Brazil's University Ranking: a Prediction Study with
Machine Learning
Sérgio Nicolau da Silva
Departamento Sistemas
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina -
IFSC
Rua 15 de Julho, 150 - Coqueiros. Florianópolis, SC. Brazil. CEP
88070-010
Cleverson Tabajara Vianna *
Departamento de Saúde e Serviço
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina -
IFSC
Av. Mauro Ramos, 950 - Centro. Florianópolis, SC. Brazil. CEP 88020-
300.
Fernando Alvaro Ostuni Gauthier
Departamento de Engenharia e Gestão do Conhecimento
Universidade Federal de Santa Catarina - UFSC
Campus Reitor João David Ferreira Lima, s/n - Trindade, Florianópolis
- SC. Brazil, CEP 88040-900
Antônio Pereira Cândido
Departamento de Saúde e Serviço
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina -
IFSC
Av. Mauro Ramos, 950 - Centro. Florianópolis, SC. Brazil. CEP 88020-
300
* Corresponding author
Structured Abstract
How can one distinguish the best from the worst institutions of higher education? This
question permeates the minds and hearts of parents, students, and teachers, because
education is an investment in both the individual's and the nation's future. One source of
information for answering it is the Ranking Universitário da Folha (RUF), Folha's
University Ranking. Known for its traditional evaluation, Folha's ranking is considered an
independent evaluation tool and provides a ranking of the best Brazilian universities. 74%
of its data relate to research areas and postgraduate programs. The body that regulates
and supervises the postgraduate
Proceedings IFKAD 2018
Delft, Netherlands, 4-6 July 2018
ISBN 978-88-96687-11-6
ISSN 2280787X
programs in Brazil is CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível
Superior), which authorizes or denies each program and assigns it a score from 1 to 7,
with 7 being the best. The data for this evaluation are published. In this article, we use
machine learning techniques based on the Naïve Bayes algorithm. CAPES data and Folha's
rankings from previous years are used as the training set for the Naïve Bayes algorithm.
After training, CAPES data from 2015 were applied to predict the 2016 ranking, with a hit
rate of 61.5%. A hit rate above 60% suggests that, with a more detailed study and analysis
of the techniques, the ranking can be predicted with a certain degree of confidence. It
should be noted that, according to the rules of Folha's ranking, Scientific Research
(mostly postgraduate) carries a weight of 42% in the ranking.
Purpose – To use Machine Learning techniques to predict the Ranking Universitário da
Folha (RUF), using the history of previous years to train the Naïve Bayes algorithm.
Design/methodology/approach – Applied research with descriptive and exploratory
objectives and a qualitative and quantitative approach. Data were extracted from the RUF
and CAPES and homogenized, and engineering methods were applied using several tools
(WEKA, KDD, Data Mining, Postgres, Pentaho ETL).
Originality/value – Use of machine learning techniques to gauge and predict the quality of
Higher Education, an index embedded in a complex and interdisciplinary context.
Practical implications – Proposes a statistics-based model to determine the quality of an
educational institution.
Keywords – Brazil, University, Ranking, Naïve Bayes, Machine Learning.
Paper Type: Academic Research Paper
1 Introduction
The quality of Higher Education, its assessment, and the ranking of the best Universities
are topics addressed in this article. Despite the relevance of the theme, the central
point explored is the use of data mining algorithms to predict this ranking. Even so, it
is necessary to contextualize the emergence and importance of these rankings.
The preliminary topic, which highlights the relevance of the theme, is the quality of
Higher Education.
The quality of education in Brazilian Universities has become a central theme for the
country. In the educational area, the term is not consolidated and there is no common
ground on its meaning. For all practical purposes, however, the lack of a shared
understanding of the concept is not treated as a problem; the idea of quality is not even
brought into the focus of discussion. Together with the quality theme, questions about
quality assurance and accreditation arise (Sobrinho, 2008).
Since the 1990s, most Latin American countries have set up bodies to evaluate the quality
of university education (Sobrinho, 2006). In Brazil, accreditation, which ultimately means
"operating authorization", is a governmental assignment regulated by the Sistema Nacional
de Avaliação da Educação Superior (SINAES), the National System for the Evaluation of
Higher Education (Sobrinho and Vessuri, 2006; Rish, 2001).
Since all Universities in Brazil must have accreditation, or government authorization to
operate, how can one distinguish the best from the worst? This question pervades the minds
and hearts of parents, students, and teachers, as education is an investment in both the
individual's and the nation's future. Some resources are available but, being
governmental, they share the same origin and do not offer the independence of an
evaluation that originates in society itself.
Precisely in this "information vacuum", the Ranking Universitário da Folha (Folha's
University Ranking), or RUF, appears. Known for its traditional evaluation, Folha's
ranking is considered an independent evaluation tool and provides a ranking of the best
Brazilian universities. The RUF is developed under the responsibility of Folha de São
Paulo (founded in 1921) and uses several mechanisms to rank the 195 best universities in
the country, public or private. Datafolha is in charge of its execution.
According to Folha de São Paulo (2016), on its own website, the RUF evaluates the 195
Brazilian universities based on 5 indicators: Scientific Research; Quality of Teaching;
Internationalization; Labour Market; Innovation.
Data are obtained from a variety of sources, including two annual surveys encompassing
thousands of respondents. The sources are:
a. Inep-MEC
b. Web of Science
c. SciELO
d. Inpi
e. FAPs
f. CNPq
g. Capes
h. Two Datafolha surveys conducted annually
The question that motivated this research is:
With what degree of certainty can we predict whether or not a university will be in the
RUF by analysing only the data provided by CAPES on the Universities' graduate programs?
To answer it, we use Knowledge Engineering to establish this ranking, applying tools and
techniques from Data Mining, Classification, Machine Learning, and Recommendation,
Prediction, and Probability Algorithms.
2 Theoretical Construction
Research in universities is usually associated with research groups led, in most cases, by
Ph.D. holders. Thus, it is plausible to hypothesize that the structure and functioning of
postgraduate programs have a strong influence on the RUF, all the more so because research
and teaching quality are relevant parts of the RUF, as shown by the analysis of its
construction and structure. In this section, we look at how the RUF is built and at the
Data Mining tools that support the experiment.
We briefly describe what the RUF is and how it is composed.
2.1 Structure of the RUF and the open data of CAPES
An analysis of how the RUF is composed shows that 74% of its data relate directly to
Scientific Research (42%) and Quality of Education (32%); the other topics (Labour Market,
Internationalization, and Innovation) together represent the remaining 26%. Given this
predominance of research-related data, and since research is generally attributed to
postgraduate programs, we came to the view that, although the ranking is aimed at
undergraduate education, the data of the Coordenação de Aperfeiçoamento de Pessoal de
Nível Superior (Coordination for the Improvement of Higher Education Personnel), CAPES,
could carry important weight in the ranking.
According to Gonçalves (2006), several basic statistical approaches have been proposed for
prediction and machine learning that use algorithms to establish patterns; k-means
(clustering) and Bayesian methods are examples.
Bayes was an 18th century English philosopher who expounded his theory of
probability in 1763. The rule that bears his name has been a cornerstone of probability
theory ever since. The difficulty with applying Bayes rule in practice is the attribution of
prior probabilities (Witten and Frank, 2011).
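For reference, Bayes' rule and the form in which a Naïve Bayes classifier typically applies it can be written as below. The notation (a class $c$ and attribute values $x_1,\dots,x_n$) is introduced here for illustration and is not taken from the cited sources; the second expression relies on the "naïve" assumption that attributes are conditionally independent given the class.

```latex
P(c \mid x_1,\dots,x_n) \;=\; \frac{P(c)\,P(x_1,\dots,x_n \mid c)}{P(x_1,\dots,x_n)},
\qquad
\hat{c} \;=\; \arg\max_{c}\; P(c)\prod_{i=1}^{n} P(x_i \mid c)
```

Here $P(c)$ is the prior probability whose attribution the paragraph above describes as the practical difficulty; in a classifier it is commonly estimated from class frequencies in the training data.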
In this research, the Naïve Bayes algorithm was used with a Supervised Learning approach,
which is based on probabilistic methods (Fulmari and Chandak, 2014). The CAPES open data
of 2014 and the RUF 2015 were used for training. Based on this, the RUF 2016 was predicted
using the CAPES open data of 2015. The RUF prediction (by the algorithm) for 2016 was then
compared with the results
published by the RUF. We also attempted to build a decision tree with the J48 algorithm,
but the unpruned tree had too many branches to be practical, and after pruning it became
uninformative. We used WEKA as the tool.
2.2 Discovery of Knowledge
Knowledge discovery is a process applied to structured, semi-structured, and unstructured
data with the purpose of verifying users' hypotheses or discovering new patterns. It can
be further subdivided into two objectives: the prediction of future behavior based on the
analysis of historical data, and the presentation of patterns identified in the data
analysis (Fayyad, Piatetsky-Shapiro and Smyth, 1996).
Knowledge Discovery in Databases (KDD) is the area that provides mechanisms and techniques
for structured data analysis. KDD can be seen as a multidisciplinary activity, as it
encompasses techniques from beyond the scope of a single discipline, such as machine
learning (Fayyad, Piatetsky-Shapiro and Smyth, 1996). As part of KDD, Data Mining acts on
extracting useful information from databases.
2.2 Data mining
Nowadays, the volume of digital data stored in electronic repositories grows at a fast
pace worldwide, driving a major migration of software companies toward big data and open
data technologies, according to studies published by IDC in 2015: Worldwide Big Data
Technology and Services Forecast, 2015-2019 (IDC # 259532) and Worldwide Big Data Forecast
by Vertical Market, 2014-2019 (IDC # US40544915).
The term Data Mining has been used by statisticians, data analysts, and the management
information systems community, and is most popularly associated with databases (Fayyad,
Piatetsky-Shapiro and Smyth, 1996). The data mining process is supported by techniques
that are trained and tested on historical data and thereby recognize patterns. This method
is characteristic of machine learning techniques for pattern recognition, such as
classification and clustering, among others (Fayyad, Piatetsky-Shapiro and Smyth, 1996).
Each of these techniques has a variety of algorithms available, along with their
variations. This article addresses classification by means of the Naïve Bayes algorithm.
2.3 Classification
Classification is a process that we have carried out constantly throughout history and in
our daily lives. Some examples are classifying means of transportation as air, land, or
sea; people as adults or minors; and the population into economic classes.
The classification process consists of examining the characteristics of an object to be
classified and assigning it to one or more classes (Linoff and Berry, 2011). When the age
of a person is given, for example, applying the current age-of-majority rule makes it
possible to classify the individual as an adult or a minor.
In data mining, the objects to be classified are usually represented by records in a
database table or a file, to which a column representing the class is added. The task of
classification is characterized by a definition of distinct classes that are identified
from a training set composed of pre-classified examples (Linoff and Berry, 2011).
Classification alone is not enough for automated decision making in complex cases, but it
is an excellent guide for decision making in knowledge-intensive activities. For example,
based on past experience, a financial institution can establish the risk or confidence
level of clients applying for loans. In seeking to identify the risk that a client will
not fulfil its obligations, the technique seeks to predict the future.
"Any of the techniques used for classification and estimation can be adapted for
use in prediction by using training examples where the value of the variable to be
predicted is already known, along with historical data for those examples. The historical
data is used to build a model that explains the current observed behavior. When this
model is applied to current inputs, the result is a prediction of future behavior" (Linoff
and Berry, 2011).
Classification is applied in this sense to the case under study.
There are numerous algorithms for the classification of information. ID3 and C4.5 are
examples of classification algorithms that use a symbolic approach¹. Other algorithms,
such as Naïve Bayes and k-nearest neighbors, have a statistical² approach and several
implementations. The WEKA³ software is an example of software that implements several
algorithms related to data mining and knowledge extraction.
For the case study, which involves case-based prediction, the Naïve Bayes algorithm was
used. "Naïve Bayes is a popular technique for this application because it is very fast and
quite accurate" (Witten, Frank and Hall, 2011). The Naïve Bayes algorithm is quite
effective when applied to data sets in combination with attribute selection procedures
that eliminate redundancies.
Naïve Bayes-based algorithms, which calculate explicit probabilities for hypotheses, are
among the most practical approaches to certain types of learning problems. Research has
shown that the Naïve Bayes classifier can outperform decision tree-based algorithms and
even neural networks (Mitchell, 1997).
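To make the idea of "explicit probabilities for hypotheses" concrete, the following is a minimal sketch of a Naïve Bayes classifier for numeric attributes. It assumes each attribute follows a Gaussian distribution within each class; it is not a reproduction of WEKA's implementation or of the options used in the experiment, and the toy data and the names `train_gaussian_nb` and `predict` are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate the class prior and per-attribute mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        stats = []
        for column in zip(*members):
            mean = sum(column) / n
            # A small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior score for one row."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Log of the Gaussian density of this attribute value given the class.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (master's courses, doctoral courses) per institution.
train_rows = [(1, 0), (2, 0), (0, 1), (30, 20), (25, 18), (40, 35)]
train_labels = ["N", "N", "N", "S", "S", "S"]
model = train_gaussian_nb(train_rows, train_labels)
```

Calling `predict(model, (28, 22))` classifies a new institution from its two counts; working in log space avoids numerical underflow when many attribute likelihoods are multiplied.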
¹ Classifies based on decision trees, e.g. "if the day is sunny, then it will not rain".
² Verifies the probability of an event occurring.
³ Available at http://www.cs.waikato.ac.nz/ml/weka/
3 Methodology
The methodological classification of this research characterizes it as applied, since it
produces immediate results; however, it is also basic, in that it serves as the basis for
other research (Marconi and Lakatos, 2010). As for its objectives, it is descriptive,
insofar as it describes characteristics of a phenomenon and establishes relations between
variables. In seeking to establish limits and approaches for new research, delimiting an
unknown area, it also has an exploratory objective. It further has an explanatory
objective, since it "deepens the knowledge of reality because it explains the reason, the
why of things" (Gil, 2002). It has a qualitative approach, as the researchers attribute
meanings to the data; on the other hand, it is quantitative because it follows statistical
rigor, using not only samples but the whole universe of Universities. We used
bibliographic, documentary, and experimental procedures (Gil, 2008).
The research followed these steps:
1. Obtain the CAPES open data for the years 2014 and 2015 regarding students, teachers,
and courses. These data were processed and prepared, composing a relational database.
Next, the RUF data of 2015 and 2016 were obtained, treated, and loaded into relational
database tables.
2. Prepare a correlation and conversion table matching the University initials of both
systems (CAPES and RUF). This was an exhaustive task, which left 2 incompatibilities that
were not solved and that are part of the general analysis of the data.
3. Summarize the CAPES data for each University, including Master's and Ph.D. courses,
number of teachers, and graduated students. The predictors were each of these summarized
fields, and the decision was whether or not the University belonged to the RUF, thus
making the RUF and CAPES data compatible.
4. Following the concepts of machine learning, use the data from 2014 as the training set
to train the machine. To do so, we use the WEKA⁴ software, where we apply the Naïve Bayes
algorithm.
5. Submit the 2015 data in order to predict which Universities would be in RUF 2016, and
compare the prediction with the actual result.
6. Compare these data and build a confusion matrix, indicating both false positives and
false negatives. The results are interesting because a significant result was obtained
with open data only, without requiring surveys, interviews, or other non-open data (such
as number of publications and citations, among others).
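The summarization described in step 3 can be sketched as follows. This is only an illustration under simplifying assumptions: the record layout, the sample institutions, and the function name `summarize` are hypothetical, and only course counts are shown, whereas the study also summarized teachers and graduated students. The labels "S" and "N" stand for belonging or not belonging to the RUF.

```python
from collections import defaultdict

# Hypothetical, simplified CAPES course records: (university initials, course level).
capes_courses = [
    ("UFSC", "mestrado"), ("UFSC", "doutorado"), ("UFSC", "mestrado"),
    ("IFSC", "mestrado"),
]
# Hypothetical set of initials that appear in the RUF for the reference year.
ruf_initials = {"UFSC"}


def summarize(courses, ruf):
    """Count courses per level for each university and attach the S/N decision."""
    counts = defaultdict(lambda: {"cursos_mestrado": 0, "cursos_doutorado": 0})
    for initials, level in courses:
        counts[initials]["cursos_" + level] += 1
    return {
        initials: {**fields, "ruf": "S" if initials in ruf else "N"}
        for initials, fields in counts.items()
    }


summary = summarize(capes_courses, ruf_initials)
```

Each entry of `summary` corresponds to one training instance: the summarized fields are the predictors and `ruf` is the decision column.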
⁴ WEKA is open source software produced at the University of Waikato (NZ); it is a
collection of machine learning algorithms for data mining tasks.
4 The experiment: Open data CAPES and RUF
The first step is to collect the raw data from both the RUF and CAPES.
From the RUF website, the ranking data of 2015 and 2016 were collected and a CSV file was
generated; that file contains all the RUF data, with the reference year added.
Figure 1 - RUF as it is presented in Folha's website
Next, the data were standardized, i.e., prepared so that they could be handled by the
software tools.
The CAPES open data are provided in CSV format, which is more suitable for processing than
the RUF data, which come from a web page. Because the data are in CSV, processing them is
simpler than for the RUF.
From the CAPES open data site, the following files were downloaded for postgraduate
programs for the years 2014 and 2015: courses, teachers, and students.
With the raw data in hand, we imported them into a PostgreSQL database to facilitate
normalization and extraction in the format expected by the WEKA software. Being in CSV
format does not mean that the data are sanitized⁵. For this process of sanitization and
import into the database, we used an Extract, Transform and Load (ETL) tool, which is
employed in the KDD process, more specifically in the data preprocessing phase. The tool
chosen is Pentaho's Data-integration⁶ (Figure 2).
⁵ Process that standardizes the data while maintaining their validity.
⁶ Can be downloaded from the Pentaho Community at http://community.pentaho.com/projects/data-integration/
Source: Tools used
Figure 2 - Example ETL applied to RUF (left) and WEKA interface with CSV file import (right)
With the Data-integration tool, all data of interest to the research were imported. Even
though both data sources deal with the same domain (universities), the initials of
institutions diverge between the databases. Even after sanitization, approximately 50
universities that are part of the RUF were not located in the CAPES data. To minimize this
difference, a manual analysis of the data was required.
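One way to narrow down the institutions that need manual analysis is to normalize the initials before comparing the two sources. The sketch below is an assumption about how such a step could look, not the procedure actually used in the study; the function names and the sample spellings are hypothetical.

```python
import unicodedata


def normalize_initials(raw):
    """Uppercase, strip accents, and drop anything that is not a letter or digit."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in without_accents.upper() if ch.isalnum())


def unmatched(ruf_names, capes_names):
    """Return the RUF initials that have no CAPES counterpart after normalization."""
    capes_keys = {normalize_initials(name) for name in capes_names}
    return sorted(name for name in ruf_names if normalize_initials(name) not in capes_keys)


# Hypothetical spellings of the same institutions in the two sources.
ruf = ["UFSC", "PUC-Rio", "Unesp", "UFABC"]
capes = ["ufsc", "PUC/RIO", "UNESP"]
leftover = unmatched(ruf, capes)
```

Only the entries left in `leftover` would then need to be checked by hand, for instance when one source uses an entirely different acronym for the same institution.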
What was provided for machine learning in the case under study is precisely the union of
the CAPES 2014 data and the RUF 2015, called the "training set". The new CAPES data are
later passed to the already "trained" algorithm, which classifies them and makes
predictions based on the knowledge acquired in training. The data used were from CAPES
2014, with the state (UF) and university initials made compatible with those of the RUF.
The training file was then imported into WEKA via its graphical interface in order to
apply the Naïve Bayes algorithm and analyze its accuracy (Figure 2).
Several analyses and tests were performed to identify the configuration with the best
possible result. This is an extremely important activity in the process of knowledge
extraction, and it depends on the required interdisciplinarity, with a subject-matter
expert contributing to these adjustments.
Applying the algorithm to the training data yielded a 78.95% success rate. The confusion
matrix generated is as follows:
   a    b   <- classified as
 191   33 | a = N
  51  124 | b = S
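The success rate can be checked directly from the confusion matrix. The short sketch below assumes the usual WEKA layout, in which rows are actual classes and columns are predicted classes; the precision and recall for class S are additional figures derived here for illustration and are not reported in the original text.

```python
# Confusion matrix as reported: rows are actual classes, columns are predicted classes.
matrix = {
    "N": {"N": 191, "S": 33},
    "S": {"N": 51, "S": 124},
}

total = sum(sum(row.values()) for row in matrix.values())
correct = sum(matrix[label][label] for label in matrix)
accuracy = correct / total

# Precision and recall for class S (the institution is in the RUF).
predicted_s = sum(matrix[actual]["S"] for actual in matrix)
precision_s = matrix["S"]["S"] / predicted_s
recall_s = matrix["S"]["S"] / sum(matrix["S"].values())

print(f"accuracy={accuracy:.4f} precision_S={precision_s:.4f} recall_S={recall_s:.4f}")
```

The accuracy is 315 correct instances out of 399, which matches the 78.95% reported above.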
With this level of accuracy, the training data were exported via WEKA to the ARFF format.
After training, the next step is to predict the RUF 2016. For this, an ARFF file was
generated with the 2015 CAPES data, to be classified by the algorithm learned by the
machine. Again, the ARFF file contains the same columns; however, the decision column
contains "?", indicating to the algorithm that this value is to be predicted.
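A file of this kind could be produced as sketched below. The attribute names follow the columns shown in the WEKA output of this paper; the relation name, the class attribute name `ruf`, its position as the last column, and the sample rows are assumptions made for illustration, so the layout may differ from the file actually used.

```python
# Attribute names follow the columns listed in the WEKA output of this paper.
ATTRIBUTES = [
    "cursos_mestrado", "cursos_doutorado", "formados_mestrado",
    "formados_doutorado", "docentes_mestres", "docentes_doutores",
]


def to_arff(rows, relation="capes_2015"):
    """Render numeric rows as ARFF text with an unknown ('?') class value."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in ATTRIBUTES]
    # Nominal class attribute: N = not in the RUF, S = in the RUF (name is hypothetical).
    lines += ["@attribute ruf {N,S}", "", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row) + ",?")
    return "\n".join(lines) + "\n"


arff_text = to_arff([(0, 2, 0, 88, 22, 2), (1, 0, 0, 0, 40, 0)])
```

In ARFF, a question mark denotes a missing value, which is why it can stand in for the class that the trained model is asked to predict.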
Using a terminal⁷ in the OS X operating system, the following command was executed to
instruct WEKA to perform the classification based on the training data:
java -cp weka.jar weka.classifiers.bayes.NaiveBayes -t training.arff -T sort.arff -p 3-8 -D
As a result, WEKA presents the classification performed for each instance of the file to
be classified. Figure 3 shows the partial result of WEKA processing.
=== Predictions on test data ===
inst#  actual  predicted  error  prediction  (cursos_mestrado, cursos_doutorado, formados_mestrado, formados_doutorado, docentes_mestres, docentes_doutores)
01     1:?     1:N               0.996       (0,2,0,88,22,2)
02     1:?     1:N               0.994       (0,2,0,169,28,0)
03     1:?     1:N               0.988       (0,3,0,75,95,0)
…      …       …                 …           …
23     1:?     1:N               0.998       (1,0,0,0,40,0)
Source: WEKA output
Figure 3 - Data obtained from Weka
The predicted column presents the prediction for each entry (data in parentheses),
indicating whether an institution with those characteristics of courses, teachers, and
students tends to be part of the RUF ranking. The result of the prediction was normalized
and imported into the PostgreSQL database. After that, the RUF 2016 was compared with the
predictions, with the following result: of the 195 institutions that make up the RUF 2016,
120 were predicted by the Naïve Bayes process, a 61.5% success rate.
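The comparison amounts to counting how many members of the actual ranking were also predicted. The sketch below illustrates this with a hypothetical toy example and then recomputes the reported figure; the function name `hit_rate` and the toy sets are assumptions introduced here.

```python
def hit_rate(predicted, actual):
    """Fraction of the actual ranking members that were also predicted."""
    actual = set(actual)
    return len(actual & set(predicted)) / len(actual)


# Hypothetical toy example using institution initials.
toy = hit_rate(predicted={"UFSC", "USP", "UFRJ"}, actual={"UFSC", "USP", "UNICAMP", "UFMG"})

# Figures reported in the paper: 120 of the 195 RUF 2016 institutions were predicted.
reported = 120 / 195
print(f"{reported:.1%}")
```

Note that this measure only looks at institutions that are in the ranking; it does not account for institutions predicted to be in the RUF that were not.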
⁷ Also known as command line or shell; it allows the user to ask the operating system to
perform actions such as listing files, creating directories, and running an application,
among others.
5 Conclusions
We emphasize the importance of objective criteria for institutional evaluation: criteria
and procedures that prioritize quantitative and comparable aspects are required (Sobrinho
and Ristoff, 2005).
The publication of linked open data expands this comparison process, allowing human and
non-human agents to process and analyze information. Berners-Lee (1989) suggests that data
be open, especially data that can eventually be classified with 5 stars.
Classification   Description
★                Available on the web (whatever format) but with an open licence, to be Open Data
★★               Available as machine-readable structured data (e.g. Excel instead of image scan of a table)
★★★              As (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★             All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★            All the above, plus: link your data to other people's data to provide context
References: Berners-Lee (1989, 2006)
The act of measuring, although it is part of society's evaluation of Universities, cannot
be considered in isolation (Vianna, 2014):
The evaluation will express the actions, attitudes, and values of both individuals and
communities or the science itself; if possible it should contemplate its multiple
dimensions and interrelationships. It will always produce effects over time, be they
political or pedagogical. An important part of the evaluation refers to the tests
applied, the questionnaires to be answered and the results obtained - this is what is
called the technical part of the evaluation; therefore, measurement is part of the
evaluation, but the evaluation is not exhausted in the measurement. This means that it
is not enough to assign grades, weights, and ratings.
A hit rate above 60% for the RUF ranking shows that it is possible, with a more detailed
study and analysis of the techniques, to predict the ranking with a certain degree of
confidence. It should be noted that, according to the RUF, Scientific Research (mostly
postgraduate) carries a 42% weight in the ranking.
Another hypothesis is to make a cut, selecting the first 60 universities. An algorithm to
predict the 40, 50, or 60 best Brazilian universities, based strictly on open CAPES data,
may offer a higher degree of confidence.
It is also observed that the CAPES processes have positive effects (above 60%) on the
quality management of the Postgraduate Programs of Universities, which is intrinsically
linked to the quality of higher education.
References
Berners-Lee, T. (1989) Information management: A proposal.
Witten, I. H. and Frank, E. (2011) Data Mining: Practical machine learning tools and techniques, ed
Morgan Kaufmann
Sobrinho, J. D. and Vessuro, H. (2008) Quality, evaluation: from SINAES to indexes, In Avaliação da
Educação Superior Magazine.
Vianna, C. T. (2014) Avaliação institucional e o desafio da cultura da autoavaliação e cpa, In
conference's publications of regional seminar about institutional self-evaluation and
evaluations committees
Fayyad, U., Piatetsky-Shapiro, G. and Smyth P. (1996) From data mining to knowledge discovery
in databases. AI magazine, Vol. 17, No. 3, p. 37
Fulmari, A. and Chandak, M. B. (2014) An approach for word sense disambiguation using modified
naïve bayes classifier. International Journal of Innovative Research in Computer and
Communication Engineering, Vol. 2
Gil, A. C. (2002) Como elaborar projetos de pesquisa. São Paulo, Vol. 5
Gil, A. C. (2008) Métodos e técnicas de pesquisa social. In: Métodos e técnicas de pesquisa social.
Atlas
Gonçalves, A. L. (2006) Um modelo de descoberta de conhecimento baseado na correlação de
elementos textuais e expansão vetorial aplicado à engenharia e gestão do conhecimento. 196 f.
Tese (Doutorado) — Tese (Doutorado em Engenharia de Produção)-Programa de Pós-
Graduação em Engenharia de Produção, Universidade Federal de Santa Catarina, Florianópolis
Linoff, G. S. and Berry M. J. (2011) Data mining techniques: for marketing, sales, and customer
relationship management. John Wiley & Sons
Marconi, M. d. A. and Lakatos, E. M. (2010) Fundamentos de metodologia científica. In:
Fundamentos de metodologia científica. ed Atlas
Mitchell, T. M. (1997) Machine learning. New York
Rish I. (2001) An empirical study of the naive bayes classifier. In: IBM NEW YORK. IJCAI 2001
workshop on empirical methods in artificial intelligence. Vol. 3, No. 22, pp. 41–46.
Sobrinho, J. D. (2006) Acreditación de la educación superior en américa latina y el caribe. In:
TRES, J.; SANYK, B. C. (Ed.). La educación superior en el Mundo 2007. Acreditación para la
garantía de la calidad: ¿Qué está en juego? Global University Network for Innovation
Sobrinho, J. D. and Ristoff, D. I. (2005) Avaliação como instrumento da formação cidadã e do
desenvolvimento da sociedade democrática: por uma ético-epistemologia da avaliação. Ristoff,
Dilvo & Almeida JR, Vicente (organizadores). Avaliação Participativa, Perspectivas e
Debates, série Educação Superior em Debate, No. 1, pp. 15–38
Sobrinho, J. D. and Vessuri, H. (2006) Paradigmas e políticas de avaliação da educação superior.
autonomia e heteronomia. Universidad e investigación científica: convergências y tensiones.
Vessuri H, org. Buenos Aires: CLACSO, Consejo Latinoamericano de Ciencias Sociales, pp.
169–191
Smartphone, PLC Control, Bluetooth, Android, Arduino. Smartphone, PLC Control, Bluetooth, Android, Arduino.
Smartphone, PLC Control, Bluetooth, Android, Arduino. ijcsit
 
Running Head DESCRIPTIVE STATISTICS COMPUTING .docx
Running Head DESCRIPTIVE STATISTICS COMPUTING                    .docxRunning Head DESCRIPTIVE STATISTICS COMPUTING                    .docx
Running Head DESCRIPTIVE STATISTICS COMPUTING .docxtodd271
 

Brazil's university ranking a prediction study with machine learning 234 ifkad2018

Brazil's University Ranking: a Prediction Study with Machine Learning

Sérgio Nicolau da Silva
Departamento Sistemas
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina - IFSC
Rua 15 de Julho, 150 - Coqueiros. Florianópolis, SC. Brazil. CEP 88070-010

Cleverson Tabajara Vianna *
Departamento de Saúde e Serviço
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina - IFSC
Av. Mauro Ramos, 950 - Centro. Florianópolis, SC. Brazil. CEP 88020-300

Fernando Alvaro Ostuni Gauthier
Departamento de Engenharia e Gestão do Conhecimento
Universidade Federal de Santa Catarina - UFSC
Campus Reitor João David Ferreira Lima, s/n - Trindade, Florianópolis - SC. Brazil, CEP 88040-900

Antônio Pereira Cândido
Departamento de Saúde e Serviço
Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina - IFSC
Av. Mauro Ramos, 950 - Centro. Florianópolis, SC. Brazil. CEP 88020-300

* Corresponding author

Structured Abstract

How can the best and the worst institutions of higher education be distinguished? This question permeates the minds and hearts of parents, students, and teachers, because education is an investment in both the personal future and the nation's future. One source of information for answering it is the Ranking Universitário da Folha (RUF), Folha's University Ranking. Known for its traditional evaluation, Folha's Ranking is considered an independent evaluation tool and provides a ranking of the best Brazilian universities. 74% of its data are related to research areas and postgraduate programs. Postgraduate programs in Brazil are regulated and supervised by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), which authorizes or denies each program and assigns it a score from 1 to 7, with 7 being the best. The data from this evaluation are published. This article applies machine learning techniques based on the Naïve Bayes algorithm. CAPES data and Folha's Ranking from previous years are used as the training set for the Naïve Bayes algorithm. After training, CAPES data from 2015 were applied to predict the 2016 ranking, with a hit rate of 61.5%. A hit rate above 60% against Folha's Ranking suggests that, with more detailed study and analysis of the techniques, the ranking can be predicted with a certain degree of confidence. It should be noted that, according to the rules of Folha's Ranking, Scientific Research (mostly postgraduate) carries a weight of 42% in the ranking.

Purpose – The use of machine learning techniques to predict the Ranking Universitário da Folha (RUF), using the history of previous years to train the Naïve Bayes algorithm.

Design/methodology/approach – Applied, descriptive research with an exploratory objective and a qualitative and quantitative approach. Data were extracted from RUF and CAPES and homogenized, and engineering methods were applied using several tools (WEKA, KDD, Data Mining, Postgres, ETL Pentaho).

Originality/value – Use of machine learning techniques to gauge and predict the quality of higher education, an index that sits in a complex and interdisciplinary context.

Practical implications – Proposes a statistics-based model to determine the quality of an educational institution.

Keywords – Brazil, University, Ranking, Naïve Bayes, Machine Learning.

Paper Type: Academic Research Paper

Proceedings IFKAD 2018, Delft, Netherlands, 4-6 July 2018. ISBN 978-88-96687-11-6, ISSN 2280787X

1 Introduction

The quality of higher education, its assessment, and the ranking of the best universities are all topics addressed in this article. Despite the relevance of that theme, the central point explored here is the use of data mining algorithms to predict this ranking.
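To make the prediction idea concrete, the following is a minimal sketch of a categorical Naïve Bayes classifier with Laplace smoothing, together with a hit-rate calculation. It is not the authors' WEKA pipeline: the feature names (a CAPES score band and the previous year's ranking band), the function names, and every data row are hypothetical and invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per feature, needed for Laplace smoothing.
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (n_label + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def hit_rate(model, rows, labels):
    """Fraction of rows whose predicted class matches the true class."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy data: (CAPES score band, previous-year ranking band).
train_rows = [
    ("high", "top"), ("high", "top"), ("mid", "middle"), ("mid", "middle"),
    ("low", "bottom"), ("low", "bottom"), ("high", "middle"), ("mid", "bottom"),
]
train_labels = ["top", "top", "middle", "middle",
                "bottom", "bottom", "top", "middle"]

# Hypothetical "next year" rows used to measure the hit rate.
test_rows = [("high", "top"), ("low", "bottom"),
             ("mid", "middle"), ("high", "middle")]
test_labels = ["top", "bottom", "middle", "middle"]

model = train_naive_bayes(train_rows, train_labels)
rate = hit_rate(model, test_rows, test_labels)
print(f"hit rate: {rate:.1%}")
```

The hit rate here is simply the share of institutions whose predicted band matches the observed one, which is the sense in which the 61.5% figure in the abstract is read in this sketch.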
However, it is necessary to contextualize the emergence and importance of these rankings. The preliminary topic, which highlights the relevance of the theme, is the quality of higher education. The quality of education in Brazilian universities has become a central theme for the country. In the educational area, the term is not consolidated and there is no common ground on its meaning. For all practical purposes, however, the lack of a shared understanding of the concept is not treated as a problem; moreover, the idea of quality is not even put into the focus of discussion. Together with the quality theme, questions about quality assurance and accreditation arise (Sobrinho, 2008).
  • 3. Since the 1990s, most Latin American countries have set up their own bodies for evaluating the quality of university education (Sobrinho, 2006). In Brazil, accreditation, which ultimately means "operating authorization", is a governmental assignment regulated by the Sistema Nacional de Avaliação da Educação Superior (SINAES), the National System for the Evaluation of Higher Education (Sobrinho and Vessuri, 2006; Rish, 2001). Since all universities in Brazil must hold an accreditation, or government authorization to operate, how can the best be distinguished from the worst? This question pervades the minds and hearts of parents, students, and teachers, as education is an investment in the personal future and in the future of the nation. Some resources are available, but, being governmental, they share the same origin and do not show the independence of an evaluation that originates in society itself. Precisely in this "information vacuum", the Ranking Universitário da Folha (Folha's University Ranking, RUF) appears. Known for its traditional evaluation, the Folha's Ranking is considered an independent evaluation tool and provides a ranking of the best Brazilian universities. The RUF is developed under the responsibility of Folha de São Paulo (started in 1921) and uses several mechanisms to rank the 195 best universities in the country, public or private. Its execution is in charge of Datafolha. According to Folha de São Paulo (2016), on its own website: The RUF evaluates the 195 Brazilian universities based on 5 indicators: Scientific research; Quality of Teaching; Internationalization; Labour market; Innovation. Data are obtained from a variety of sources, including two annual surveys encompassing thousands of respondents. The sources are:
a. Inep-MEC
b. Web of Science
c. SciELO
d. Inpi
e. FAPs
f. CNPq
g. Capes
h. Two Datafolha surveys done annually
  • 4. The question that motivated this research is: with what degree of certainty, by analysing only the data provided by CAPES on the graduate programs of universities, can we predict whether or not a university will be in the RUF? To answer it, we use Knowledge Engineering, with tools and techniques from Data Mining, Classification, Machine Learning, and recommendation, prediction, and probability algorithms. 2 Theoretical Construction Research in universities is usually associated with research groups, most of them led by Ph.D.s. Thus, it is plausible to hypothesize that the structure and functioning of postgraduate programs have a high influence on the RUF, all the more so because research and teaching quality are relevant parts of the RUF, as the analysis of its construction and structure shows. In this section, we look at how the RUF is built and at the Data Mining tools that support the experiment. 2.1 Structure of the RUF and the open data of CAPES In the structure of the RUF, 74% of the weight goes directly to Scientific Research (42%) and Quality of Education (32%); the other topics (Labour Market, Internationalization, and Innovation) together represent the remaining 26%. Given this predominance of research-related data, and since research is generally attributed to postgraduate programs, we came to the perception that, although the ranking is aimed at undergraduate education, the data of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Coordination for the Improvement of Higher Education Personnel, CAPES) could carry important weight in the ranking. According to Gonçalves (2006), several basic statistical approaches have been proposed for machine prediction and learning, which use algorithms to establish patterns; k-means and Bayesian methods are examples. 
Bayes was an 18th-century English philosopher whose theory of probability was published in 1763. The rule that bears his name has been a cornerstone of probability theory ever since. The difficulty with applying Bayes' rule in practice is the attribution of prior probabilities (Witten and Frank, 2011). In this research, the Naïve Bayes algorithm was used with the supervised learning approach, which is based on probabilistic methods (Fulmari and Chandak, 2014). The CAPES open data of 2014 and the RUF of 2015 were used for training. Based on this, the RUF 2016 was predicted using the CAPES open data of 2015. The RUF prediction (by the algorithm) for 2016 was then compared with the results
  • 5. published by the RUF. We also tried to build a decision tree with the J48 algorithm, but the high number of branches made it unworkable, and after "pruning" it became insignificant. As a tool, we use WEKA. 2.2 Discovery of Knowledge Knowledge discovery is the process applied to structured, semi-structured and unstructured data with the purpose of verifying users' hypotheses or discovering new patterns. It can be further subdivided into two objectives: the prediction of future behavior based on the analysis of historical data, and the presentation of patterns identified in the data analysis (Fayyad, Piatetsky-Shapiro and Smyth, 1996). Knowledge Discovery in Databases (KDD) is the area that provides mechanisms and techniques for structured data analysis. KDD can be seen as a multidisciplinary activity, as it encompasses techniques beyond the scope of a single discipline, such as machine learning (Fayyad, Piatetsky-Shapiro and Smyth, 1996). As part of KDD, Data Mining acts on extracting useful information from databases. 2.2 Data mining Nowadays, the worldwide volume of digital data stored in electronic repositories grows at a fast pace, driving a major migration of software companies towards big data and open data technologies, according to studies published by IDC in 2015: Worldwide Technology Big Data and Forecast Services, 2015-2019 (IDC # 259532) and Worldwide Big Data Forecast by Vertical Market, 2014-2019 (IDC # US40544915). The term Data Mining has been used by statisticians, data analysts, and the management information systems community, and is most popularly associated with databases (Fayyad, Piatetsky-Shapiro and Smyth, 1996). The data mining process is supported by techniques that train and test on historical data, thereby recognizing patterns. 
This method is characteristic of machine-learning techniques for pattern recognition, such as classification and clustering, among others (Fayyad, Piatetsky-Shapiro and Smyth, 1996). Each technique has a variety of available algorithms and variations. This article addresses classification by means of the Naïve Bayes algorithm. 2.3 Classification Classification is a process that we have carried out constantly throughout history and in our daily lives. Classifying means of transportation as air, land, or sea, people as adults or minors, and the population into economic classes are some examples.
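Classification with Naïve Bayes, as adopted in this article, can be illustrated with a minimal sketch. It is not the WEKA implementation: the Gaussian likelihood for numeric attributes, the function names `fit` and `predict`, and the toy data are all assumptions made for illustration. Each class receives a prior and a per-attribute mean and variance, and a new instance goes to the class with the highest posterior, treating attributes as conditionally independent.

```python
import math
from collections import defaultdict

def fit(rows, labels):
    """Estimate class priors and per-attribute mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        prior = len(members) / len(rows)
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (prior, stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior (naive independence)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            # Log of the Gaussian density for this attribute value.
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (master's courses, doctoral courses) per institution,
# labelled S (in the ranking) or N (not in the ranking).
train_rows = [(1, 0), (2, 0), (0, 1), (20, 15), (25, 18), (30, 22)]
train_labels = ["N", "N", "N", "S", "S", "S"]
model = fit(train_rows, train_labels)
```

Calling `predict(model, (24, 17))` on this toy model assigns the instance to class S, because its attribute values lie close to the means estimated for that class.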
  • 6. The classification process consists of examining the characteristics of an object and assigning it to one or more classes (Linoff and Berry, 2011). Given a person's age, for example, applying the current age-of-majority rule classifies the individual as an adult or a minor. In data mining, the objects to be classified are usually represented by records in a database table or a file, to which a column representing the class is added. The classification task is characterized by a definition of distinct classes identified from a training set composed of pre-classified examples (Linoff and Berry, 2011). Classification alone is not enough for automated decision making in complex cases, but it is an excellent guideline for decision-making in knowledge-intensive activities. For example, based on past experience, a financial institution can establish the risk or confidence level of each client applying for a loan; in seeking to identify the client's risk of not fulfilling its obligations, the technique seeks to predict the future. "Any of the techniques used for classification and estimation can be adapted for use in prediction by using training examples where the value of the variable to be predicted is already known, along with historical data for those examples. The historical data is used to build a model that explains the current observed behavior. When this model is applied to current inputs, the result is a prediction of future behavior" (Linoff and Berry, 2011). Classification is applied in this sense to the case under study. There are numerous classification algorithms. ID3 and C4.5 are examples that use a symbolic approach1. Other algorithms, like Naïve Bayes and k-Nearest Neighbors, take a statistical2 approach and have several implementations. The WEKA3 software is an example of software that implements several algorithms related to data mining and knowledge extraction. For the case study, which involves case-based prediction, the Naïve Bayes algorithm was used. "Naïve Bayes is a popular technique for this application because it is very fast and quite accurate" (Witten, Frank and Hall, 2011). The Naïve Bayes algorithm is quite effective when applied to data sets in combination with attribute selection procedures that eliminate redundancies. Naïve Bayes-based algorithms, which calculate explicit probabilities for hypotheses, are among the most practical approaches to certain types of learning problems. Research has shown that the Naïve Bayes classifier can outperform decision-tree-based algorithms and even neural networks (Mitchell, 1997).
1 classifies based on decision-tree rules such as "if the day is sunny, then it will not rain"
2 verifies the probability of an event occurring
3 available at http://www.cs.waikato.ac.nz/ml/weka/
  • 7. 3 Methodology The methodological classification of this research characterizes it as applied, since it produces immediate results; however, it is also basic, in that it can serve as the basis for other research (Marconi and Lakatos, 2010). As for its objectives, it is descriptive, insofar as it describes characteristics of a phenomenon and establishes relations between variables. In seeking to establish limits and approaches for new research, delimiting an unknown area, it also has an exploratory objective. It also has an explanatory objective, since it "deepens the knowledge of reality because it explains the reason, the why of things" (Gil, 2002). It has a qualitative approach, as the researchers attribute meanings to the data; on the other hand, it is quantitative because it follows statistical rigor, using not just samples but the whole universe of universities. We used bibliographic, documentary and experimental procedures (Gil, 2008). The research followed these steps:
1. The CAPES open data for the years 2014 and 2015, regarding students, teachers, and courses, were obtained. These data were processed and prepared, composing a relational database. Next, the RUF data of 2015 and 2016 were obtained, treated, and loaded into relational database tables.
2. The data were then mined, preparing a correlation and conversion table that matches the university initials of both systems (CAPES and RUF). This was an exhaustive task that even presented 2 incompatibilities that were not solved and that are part of the general analysis of the data.
3. The CAPES data were then summarized, including the Masters and Ph.D. courses of each university, the number of teachers, and the number of graduating students. The predictors were each of these summarized fields, and the decision attribute was whether or not the university belonged to the RUF, thus making the RUF and CAPES data compatible.
4. Following the concepts of machine learning, we use the data from 2014 as the training set, training the machine. To do so, we use the WEKA4 software, where we apply the Naïve Bayes algorithm.
5. Next, we submit the 2015 data to predict which universities would be in RUF 2016 and compare the prediction with the actual result.
6. These data were then compared, and a confusion matrix was established, indicating both false positives and false negatives.
The results are interesting because a significant result was obtained with open data only, without requiring surveys, interviews, or other data that are not open (such as the number of publications and citations, among others).
4 WEKA is open source software produced at the University of Waikato (NZ); it is a collection of machine learning algorithms for data mining tasks.
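Step 3 above, summarizing CAPES records per university and attaching the RUF decision, might look like the following sketch. The record layout, the acronyms `UNI_A` and `UNI_B`, the numbers, and the label name `ruf` are hypothetical; only the feature names `cursos_*` and `formados_*` follow the column names that appear in the WEKA output. Teacher counts are left out to keep the example short.

```python
from collections import defaultdict

FIELDS = ("cursos_mestrado", "cursos_doutorado",
          "formados_mestrado", "formados_doutorado")

# Hypothetical CAPES-style records: (institution acronym, programme level, graduates).
capes_rows = [
    ("UNI_A", "mestrado", 40),
    ("UNI_A", "doutorado", 25),
    ("UNI_A", "mestrado", 30),
    ("UNI_B", "mestrado", 5),
]
# Hypothetical set of acronyms that appear in the following year's ranking.
ruf_members = {"UNI_A"}

def summarize(rows, members):
    """Build one feature row per institution plus an S/N decision label."""
    totals = defaultdict(lambda: dict.fromkeys(FIELDS, 0))
    for acronym, level, graduates in rows:
        totals[acronym][f"cursos_{level}"] += 1          # one course per record
        totals[acronym][f"formados_{level}"] += graduates
    return {
        acronym: {**features, "ruf": "S" if acronym in members else "N"}
        for acronym, features in totals.items()
    }

table = summarize(capes_rows, ruf_members)
```

Each entry of `table` corresponds to one training instance: the summarized fields are the predictors and `ruf` is the decision to be learned.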
  • 8. 4 The experiment: Open data CAPES and RUF The first step is to collect the raw data from both the RUF and CAPES. From the RUF website, the 2015 and 2016 ranking data were collected and a CSV file was generated containing all the RUF data plus the reference year. Figure 1 - RUF as it is presented in Folha's website Next, the data were standardized, i.e. prepared so that they could be handled by the software tools. The CAPES open data are provided in CSV format, which is more suitable for processing than the RUF, which is a web page, so processing them is simpler. From the CAPES open data site, the following files on postgraduate programs were downloaded for the years 2014 and 2015: courses, teachers, and students. The raw data were imported into a PostgreSQL database to facilitate normalization and extraction in the format expected by the WEKA software. Being in standard CSV does not mean that the data are sanitized5. For sanitization and import into the database, we used an Extract, Transform and Load (ETL) tool, which belongs to the KDD process, more specifically to the data preprocessing phase. The tool chosen is Pentaho's Data-integration6 (Figure 2).
5 process that standardizes the data, maintaining its validity
6 can be downloaded from the Pentaho Community at http://community.pentaho.com/projects/data-integration/
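The kind of sanitization described above can be illustrated with a small sketch. It stands in for the Pentaho transformation rather than reproducing it: the column names `sigla` and `docentes`, the sample rows, and the cleaning rules (trim spaces, drop accents, upper-case the acronym, treat an empty count as zero) are assumptions chosen for illustration.

```python
import csv
import io
import unicodedata

def clean_acronym(raw):
    """Remove accents, surrounding spaces and case differences from an acronym."""
    decomposed = unicodedata.normalize("NFKD", raw)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().upper()

def load_rows(csv_text):
    """Parse CSV text into rows with a cleaned acronym and an integer count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "sigla": clean_acronym(row["sigla"]),
            # An empty count is treated as zero (an assumption of this sketch).
            "docentes": int(row["docentes"].strip() or 0),
        }
        for row in reader
    ]

# Hypothetical raw export with inconsistent spacing, case and accents.
raw = "sigla,docentes\n uni-á ,12\nUNI-B, 7\nuni-c,\n"
rows = load_rows(raw)
```

After this step the acronyms from both sources share one spelling convention, which makes the later matching between CAPES and RUF records less error-prone, although it does not remove the need for manual review.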
  • 9. Source: Tools utilized Figure 2 - Example ETL applied to RUF (left) and WEKA interface with CSV file import (right) With the Data-integration tool, all data of interest to the research were imported. Even though both data sources deal with the same domain (universities), the institutions' initials diverge between the databases. Even after sanitization, approximately 50 universities that are part of the RUF were not located in the CAPES data. To minimize this difference, a manual analysis of the data was required. What was provided for machine learning in the case under study is precisely the union of the CAPES data of 2014 and the RUF 2015, called the "training mass"; later, the new CAPES data are passed to the already "trained" algorithm, which classifies them and makes predictions based on the knowledge acquired in training. The data used were from CAPES 2014, with UF (state) and university initials made compatible with the RUF. The training file was then imported into WEKA, via its graphical interface, to apply the Naïve Bayes algorithm and analyse its precision (Figure 2). Several analyses and tests were performed to identify the configuration with the best possible result. This is an extremely important activity for the process of knowledge extraction and is linked to the required interdisciplinarity, in which a subject expert contributes to these adjustments. Applying the algorithm to the training data, the result was a 78.95% success rate. The confusion matrix generated is as follows:
   a    b   <- Classified as
 191   33   a = N
  51  124   b = S
With this level of precision, the training data were exported via WEKA to the ARFF format.
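The 78.95% figure can be checked directly from the confusion matrix. The short sketch below assumes the usual WEKA layout, in which rows are the actual classes and columns the predicted ones; the helper names are illustrative, and precision and recall for class S are derived values that the text does not report.

```python
# Confusion matrix from the text, keyed by (actual class, predicted class).
matrix = {("N", "N"): 191, ("N", "S"): 33, ("S", "N"): 51, ("S", "S"): 124}

def accuracy(m):
    """Share of instances whose predicted class equals the actual class."""
    correct = sum(n for (actual, predicted), n in m.items() if actual == predicted)
    return correct / sum(m.values())

def precision(m, cls):
    """Of the instances predicted as cls, the share that really are cls."""
    predicted_as_cls = sum(n for (_, predicted), n in m.items() if predicted == cls)
    return m[(cls, cls)] / predicted_as_cls

def recall(m, cls):
    """Of the instances that really are cls, the share predicted as cls."""
    actual_cls = sum(n for (actual, _), n in m.items() if actual == cls)
    return m[(cls, cls)] / actual_cls

print(f"accuracy    = {accuracy(matrix):.4f}")        # (191 + 124) / 399
print(f"precision S = {precision(matrix, 'S'):.4f}")  # 124 / (124 + 33)
print(f"recall S    = {recall(matrix, 'S'):.4f}")     # 124 / (124 + 51)
```

Of the 399 training instances, 315 lie on the diagonal, which gives the reported success rate; the 51 universities of class S classified as N are the false negatives.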
  • 10. After the training, we then have to predict the RUF for 2016. For this, an ARFF file was generated with the 2015 CAPES data, to be classified by the model learned by the machine. The ARFF contains the same columns; however, the decision column contains "?", indicating to the algorithm that it must be predicted. Using a terminal7 in the OS X operating system, the following command was executed to have WEKA perform the classification based on the training data:
java -cp weka.jar weka.classifiers.bayes.NaiveBayes -t training.arff -T sort.arff -p 3-8 -D
As a result, WEKA presents the classification performed for each instance of the file to be classified. Figure 3 shows the partial result of WEKA processing.
=== Predictions on test data ===
inst#   actual   predicted   error   prediction   (cursos_mestrado, cursos_doutorado, formados_mestrado, formados_doutorado, docentes_mestres, docentes_doutores)
01      1:?      1:N                 0.996        (0,2,0,88,22,2)
02      1:?      1:N                 0.994        (0,2,0,169,28,0)
03      1:?      1:N                 0.988        (0,3,0,75,95,0)
…       …        …                   …            …
23      1:?      1:N                 0.998        (1,0,0,0,40,0)
Source: WEKA output Figure 3 - Data obtained from Weka The predicted column presents the prediction for each entry (data in parentheses), indicating whether an institution with those characteristics of courses, teachers, and students tends to be part of the RUF ranking. The result of the prediction was normalized and imported into the PostgreSQL database. After that, RUF 2016 was compared with the predictions, reaching the following result: of the 195 institutions that make up the RUF 2016, 120 were correctly predicted by the Naïve Bayes process, a 61.5% success rate.
7 also known as command line or shell, allows the user to ask the operating system to perform some actions such as listing files, creating directories, running an application, among others
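An ARFF file with an unknown decision column, as described above, could be produced along the following lines. This is a sketch under stated assumptions: the relation name `capes_2015`, the class attribute name `ruf`, and the helper `to_arff` are hypothetical, and the file omits any identifying columns (such as state or acronym) that the original file may have contained. The attribute names and the two sample rows are taken from the WEKA output shown above.

```python
ATTRIBUTES = ("cursos_mestrado", "cursos_doutorado", "formados_mestrado",
              "formados_doutorado", "docentes_mestres", "docentes_doutores")

def to_arff(relation, attributes, rows, classes=("N", "S")):
    """Render numeric rows as ARFF text with the class value left as '?'."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attributes]
    lines.append(f"@attribute ruf {{{','.join(classes)}}}")  # nominal class
    lines += ["", "@data"]
    for row in rows:
        # '?' is the ARFF marker for a missing value, here the class to predict.
        lines.append(",".join(str(value) for value in row) + ",?")
    return "\n".join(lines) + "\n"

rows_2015 = [(0, 2, 0, 88, 22, 2), (0, 2, 0, 169, 28, 0)]
arff_text = to_arff("capes_2015", ATTRIBUTES, rows_2015)

# Hit rate reported in the text: 120 of the 195 RUF 2016 institutions.
hit_rate = 120 / 195
print(f"hit rate = {hit_rate:.1%}")
```

Writing `arff_text` to a file gives an input of the kind passed to WEKA with the `-T` option, and the final lines reproduce the arithmetic behind the reported 61.5%.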
  • 11. 5 Conclusions We emphasize the importance of objective criteria for institutional evaluation: objective criteria and procedures that prioritize quantitative and comparable aspects are required (Sobrinho and Ristoff, 2005). The publication of linked open data expands this comparison process, allowing human and non-human agents to process and analyze information. Berners-Lee (1989) suggests that data be open, especially data that can be classified with 5 stars in the future.
Classification   Description
1 star           Available on the web (whatever format) but with an open licence, to be Open Data
2 stars          Available as machine-readable structured data (e.g. excel instead of image scan of a table)
3 stars          as (2) plus non-proprietary format (e.g. CSV instead of excel)
4 stars          All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
5 stars          All the above, plus: link your data to other people's data to provide context
References: Berners-Lee (1989, 2006)
The act of measuring, although it is part of society's evaluation process of universities, cannot be considered in isolation (Vianna, 2014): The evaluation will express the actions, attitudes, and values of both individuals and communities or the science itself; if possible it should contemplate its multiple dimensions and interrelationships. It will always produce effects over time, be they political or pedagogical. An important part of the evaluation refers to the tests applied, the questionnaires to be answered and the results obtained - this is what is called the technical part of the evaluation; therefore, measurement is part of the evaluation, but the evaluation is not exhausted in the measurement. This means that it is not enough to assign grades, weights, and ratings. A hit rate above 60% for the RUF ranking shows that it is possible, with a more detailed study and analysis of the techniques, to predict it with a certain degree of confidence. 
It should be noted that, according to the RUF, Scientific Research (mostly postgraduate) corresponds to a 42% weight in the ranking. Another hypothesis is to make a cut, selecting only the first 60 universities. An algorithm to predict the 40, 50 or 60 best Brazilian universities, based strictly on open CAPES data, may present a higher degree of confidence. It is also observed that the CAPES processes have positive effects (above 60%) on the quality management of the postgraduate programs of universities, which is intrinsically linked to the quality of higher education.
  • 12. References
Berners-Lee, T. (1989) Information management: A proposal.
Witten, I. H. and Frank, E. (2011) Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann
Sobrinho, J. D. and Vessuri, H. (2008) Quality, evaluation: from SINAES to indexes. In: Avaliação da Educação Superior Magazine.
Vianna, C. T. (2014) Avaliação institucional e o desafio da cultura da autoavaliação e cpa. In: conference's publications of regional seminar about institutional self-evaluation and evaluations committees
Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996) From data mining to knowledge discovery in databases. AI Magazine, Vol. 17, No. 3, p. 37
Fulmari, A. and Chandak, M. B. (2014) An approach for word sense disambiguation using modified naïve bayes classifier. International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2
Gil, A. C. (2002) Como elaborar projetos de pesquisa. São Paulo, Vol. 5
Gil, A. C. (2008) Métodos e técnicas de pesquisa social. In: Métodos e técnicas de pesquisa social. Atlas
Gonçalves, A. L. (2006) Um modelo de descoberta de conhecimento baseado na correlação de elementos textuais e expansão vetorial aplicado à engenharia e gestão do conhecimento. 196 f. Tese (Doutorado em Engenharia de Produção), Programa de Pós-Graduação em Engenharia de Produção, Universidade Federal de Santa Catarina, Florianópolis
Linoff, G. S. and Berry, M. J. (2011) Data mining techniques: for marketing, sales, and customer relationship management. John Wiley & Sons
Marconi, M. d. A. and Lakatos, E. M. (2010) Fundamentos de metodologia científica. In: Fundamentos de metodologia científica. Atlas
Mitchell, T. M. (1997) Machine learning. New York
Rish, I. (2001) An empirical study of the naive bayes classifier. In: IBM New York. IJCAI 2001 workshop on empirical methods in artificial intelligence. Vol. 3, No. 22, pp. 41–46.
Sobrinho, J. D. (2006) Acreditación de la educación superior en américa latina y el caribe. In: TRES, J.; SANYK, B. C. (Ed.). La educación superior en el Mundo 2007. Acreditación para la garantía de la calidad: ¿Qué está en juego? Global University Network for Innovation
Sobrinho, J. D. and Ristoff, D. I. (2005) Avaliação como instrumento da formação cidadã e do desenvolvimento da sociedade democrática: por uma ético-epistemologia da avaliação. Ristoff, Dilvo & Almeida JR, Vicente (organizadores). Avaliação Participativa, Perspectivas e Debates, série Educação Superior em Debate, No. 1, pp. 15–38
Sobrinho, J. D. and Vessuri, H. (2006) Paradigmas e políticas de avaliação da educação superior. autonomia e heteronomia. Universidad e investigación científica: convergências y tensiones. Vessuri H, org. Buenos Aires: CLACSO, Consejo Latinoamericano de Ciencias Sociales, pp. 169–191