

Gurney · SlidesCarnival.pptx

  2. Introduction Today's Internet is an important place for exchanging data such as text, images, audio, and video, and for sharing information, mostly in digital form. Using the Internet means accessing a huge amount of data, which may be structured, semi-structured, or unstructured. Storing and processing data of this volume and complexity [2] calls for efficient, advanced tools and techniques. Analyzing and processing data yields useful information and knowledge about it. The term “data mining” appeared in the 1990s [3]; it refers to the discovery of knowledge in data [4]. Mining is important because it reveals what the data can teach us about diverse aspects of life [5].
  3. Introduction Data mining is the process of discovering meaningful correlations, patterns, and trends by sifting through large amounts of data stored in warehouses, using pattern recognition techniques as well as statistical and mathematical techniques [3]. We have a large amount of data available but little knowledge about it, and data mining offers a way to extract knowledge from that data. Data mining refers to filtering, sorting, and categorizing data from larger data sets to reveal subtle patterns and relationships, which helps organizations identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to predict future market trends and make critical business decisions at the right time [6].
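The filtering, sorting, and categorizing steps mentioned above can be illustrated with a minimal Python sketch. The transaction records, the `summarize` function, and the `min_amount` threshold are hypothetical and chosen only for illustration; they do not come from any of the tools surveyed here.

```python
from collections import defaultdict

# Hypothetical transactions: (customer, product category, amount spent).
transactions = [
    ("alice", "books", 12.0),
    ("bob", "games", 55.0),
    ("alice", "games", 40.0),
    ("carol", "books", 8.5),
    ("bob", "books", 20.0),
]

def summarize(records, min_amount=10.0):
    """Filter out small purchases, total the rest per category, sort by total."""
    # Filtering: keep only purchases at or above the threshold.
    kept = [r for r in records if r[2] >= min_amount]
    # Categorizing: accumulate the amount spent per product category.
    totals = defaultdict(float)
    for _customer, category, amount in kept:
        totals[category] += amount
    # Sorting: largest category first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

summary = summarize(transactions)
print(summary)
```

Even on this toy data, the summary surfaces a simple pattern (which category attracts the most spending) that is not obvious from the raw records.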
  4. Methodology: collect literature in the domain and visit sites; select tools; determine criteria for comparison.
  5. Background The main objective of the research is to provide an overview of the 10 best data mining tools and to compare them by whether they are open source or proprietary, their data integration, their ease of use, and the programming language used. The tools were chosen based on the following 10 sites: • Spiceworks [9] • Javatpoint [8] • Upwork [6] • MonkeyLearn [10] • Hevo [7] • Software Testing Help [15] • SelectHub [11] • CareerFoundry [14] • Imaginary Cloud [13] • Guru99 [12]
  6. Background Ten data mining tools were selected based on the sites above, in the following order: 1. RapidMiner 2. SAS Enterprise Miner 3. KNIME 4. IBM SPSS Modeler 5. Weka 6. Orange 7. Oracle Data Mining (ODM) 8. Rattle 9. Apache Mahout 10. Teradata
  7. Criteria for Selecting Data Mining Tools: data integration; security; open source or proprietary; programming language; functions and methodologies; ease of use.
  8. RapidMiner 1
  9. RapidMiner is an open source data mining tool written in Java that integrates seamlessly with R and Python, as well as with WEKA. It is a data science software platform that provides an integrated environment for the various phases of data modeling, including data preparation, data cleansing, exploratory data analysis, visualization, and more. It supports machine learning, deep learning, text mining, and predictive analytics. Easy-to-use tools and a graphical user interface guide you through the modeling process. The tool can be used for a wide range of applications, including corporate and commercial applications, research, education and training, application development, and machine learning. It is built on a client/server model.
  10. SAS Enterprise Miner 2
  11. SAS stands for Statistical Analysis System. It is a product of the SAS Institute that was created to manage analytics and data. SAS can extract and alter data, manage information from different sources, and perform statistical analysis, allowing users to analyze big data and obtain accurate insight for timely decision-making. SAS has a highly scalable distributed memory processing architecture. It is suitable for data mining, optimization, and text mining. Its data mining features include exploratory and preparatory analyses of vital data, along with accurate reports or summaries of your findings. SAS Enterprise Miner is well suited for companies large and small that intend to implement fraud detection applications or applications that improve targeted customer response rates through marketing campaigns. SAS Enterprise Miner has benefits that you may not get from open source data mining tools, such as secure cloud integration and code logging (which ensures that your code is clean and free of potentially expensive bugs). On the downside, its GUI is functional but a bit outdated, which for an enterprise tool might seem a bit below
  12. KNIME 3
  13. KNIME (short for Konstanz Information Miner) is another open source data integration and data mining tool. It incorporates machine learning and data mining mechanisms. KNIME is used for a full range of data mining activities, including classification, regression, and dimensionality reduction (simplifying complex data while retaining the meaningful properties of the original dataset). You can also apply other machine learning algorithms such as decision trees, logistic regression, and k-means clustering. Other useful functions of KNIME range from data cleaning to analysis and reporting, which means that it is much more than just a data mining tool. Finally, although KNIME is implemented in Java, it integrates with Ruby, Python, and R (as well as other coded packages).
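To make k-means clustering, one of the algorithms named above, more concrete, here is a minimal sketch in plain Python. It is not KNIME's implementation or API; the `kmeans` function and the sample points are hypothetical, and the centroids are seeded with the first k points (a simplification of the usual random initialization) so that the result is reproducible.

```python
import math

def kmeans(points, k, iterations=10):
    """Cluster 2-D points into k groups with Lloyd's algorithm."""
    # Simplification: seed centroids with the first k points, not random picks.
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignments

# Hypothetical data: two visibly separated groups of points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In a visual tool the same two steps (assign, then update) run behind a single clustering node; the sketch only shows what that node computes.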
  14. [3]
  15. IBM SPSS Modeler 4
  16. SPSS is one of the most popular statistical software platforms. IBM SPSS Modeler is known for streamlining the data mining process and visualizing the processed data. The tool allows importing large amounts of data from many disparate sources to reveal hidden patterns and trends. The basic version of the tool works with spreadsheets and relational databases, while text analytics features are available in the premium version. The tool helps organizations easily leverage data assets and applications. One of the advantages of proprietary software is its ability to meet the robust security and governance requirements of an enterprise. The advanced capabilities of the program include an extensive library of machine learning algorithms, statistical analysis (descriptive, regression, clustering, etc.), text analysis, integration with big data, and so on. Furthermore, SPSS allows the user to enhance SPSS Syntax with Python and R using specialized extensions.
  17. [4]
  18. Weka 5
  19. Weka, short for Waikato Environment for Knowledge Analysis, is open source machine learning software developed at the University of Waikato in New Zealand. It is best suited for data analysis and predictive modeling and contains a large set of algorithms for data mining. It is written in the Java programming language and has a graphical user interface that gives easy access to all of its features. Weka supports major data mining tasks, including data preprocessing, visualization, and regression. It operates on the assumption that the data is available in the form of a flat file. Weka can also access SQL databases through a database connection and process the results returned by a query.
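The two data paths described above, a flat file and a SQL query, can be sketched as follows. This is a Python illustration using only the standard library, not Weka's Java API or its file format; the weather records, the `load_flat_file` and `load_from_database` helpers, and the in-memory SQLite database are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: one record per line, attributes separated by commas.
FLAT_FILE = """outlook,temperature,play
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

def load_flat_file(text):
    """Parse comma-separated text into a list of attribute dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def load_from_database(connection, query):
    """Run a SQL query and return the rows in the same dictionary shape."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

# An in-memory SQLite database stands in for a real SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, temperature INTEGER, play TEXT)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("sunny", 85, "no"), ("overcast", 83, "yes"), ("rainy", 70, "yes")],
)

file_rows = load_flat_file(FLAT_FILE)
db_rows = load_from_database(
    conn, "SELECT outlook, play FROM weather WHERE play = 'yes' ORDER BY outlook"
)
conn.close()
print(len(file_rows), db_rows)
```

Both loaders return the same row shape, so any later mining step can work on the records without knowing whether they came from a file or from a query.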
  20. [5]
  21. Orange 6
  22. Orange is a free and open source data science toolkit for developing, testing, and visualizing data mining workflows. It uses Python scripting and visual programming, featuring interactive data analysis and component-based assembly of data mining systems. Orange offers a broader range of features than most other Python-based machine learning and data mining tools. It has been in development and active use for more than 15 years. Orange also offers a visual programming platform with a GUI for interactive data visualization. It is component-based software, with a wealth of pre-built machine learning algorithms and text mining add-ons.
  23. [6]
  24. Oracle Data Mining (ODM) 7
  25. Oracle Data Mining is a component of Oracle Advanced Analytics that enables data analysts to build and implement predictive models. It has many data mining algorithms for tasks like classification, regression, deviation detection, prediction, and more. With Oracle Data Mining, you can create models that help you predict customer behavior, segment customer profiles, detect fraud, and determine the best prospects to target. Developers can use the Java API to integrate these models into business intelligence applications to help them discover new trends and patterns. The software is proprietary, and Oracle's technical team supports your business in building a robust enterprise-wide data mining infrastructure.
  26. [7]
  27. Apache Mahout 8
  28. Apache Mahout is an open source platform for building scalable applications using machine learning. Its goal is to help data scientists and researchers implement their own algorithms. It is a project of the Apache Software Foundation whose primary purpose is creating machine learning algorithms. It mainly focuses on clustering, classification, and collaborative filtering. It is written in Java and includes Java libraries for mathematical operations such as linear algebra and statistics. Mahout keeps growing as new algorithms are implemented in it. Mahout has the following main features: an extensible programming environment, pre-built algorithms, and a math experimentation environment.
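Collaborative filtering, one of the focus areas listed above, can be illustrated with a small user-based sketch in plain Python. This is not Mahout's API (Mahout is a Java library); the ratings, the `cosine_similarity` and `predict` functions, and the choice of cosine similarity over co-rated items are assumptions made for illustration.

```python
import math

# Hypothetical user ratings on a 1-5 scale.
ratings = {
    "ann": {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben": {"film_a": 4, "film_b": 5, "film_d": 5},
    "cat": {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item, data):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted, total = 0.0, 0.0
    for other, other_ratings in data.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(data[user], other_ratings)
        weighted += sim * other_ratings[item]
        total += sim
    return weighted / total if total else None

# Estimate how "ann" would rate a film she has not seen yet.
prediction = predict("ann", "film_d", ratings)
print(round(prediction, 2))
```

Because ann's ratings resemble ben's far more than cat's, the estimate lands closer to ben's rating of film_d than to cat's. A scalable system applies the same idea to millions of users, which is where a distributed platform becomes useful.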
  29. Rattle 9
  30. Rattle is a GUI-based data mining tool that uses the R statistical programming language. Rattle exposes the statistical power of R by providing extensive data mining functionality. Although Rattle has a comprehensive and sophisticated user interface, it also has a built-in log tab that records the equivalent code for every activity performed in the GUI. The data set produced by Rattle can be viewed and edited. Rattle also lets you review the generated code, reuse it for other purposes, and extend it without any restrictions.
  31. [9]
  32. Teradata 10
  33. Teradata is an open, massively parallel processing platform for developing large-scale data warehousing applications. It is a suitable mining tool for organizations that rely on multi-cloud deployment setups. Such frameworks can easily access databases, data lakes, and even external SaaS applications for an enterprise. Moreover, its no-code deployment features make it more manageable to develop and analyze business models and make informed decisions. Teradata can be deployed on any public cloud platform, such as AWS, Google, and Azure. Data miners can also deploy the tool on-premises or in a private cloud.
  34. Conclusion In this research, I have examined the need for data mining tools and explored the most popular and powerful ones. Data mining needs to extract complex data from a variety of data sources such as databases, customer relationship management, and project management tools. As mentioned earlier, most data mining tools are based on two major programming languages: R and Python. Each of these languages provides a complete set of packages and libraries for data mining and data science in general. Despite the dominance of these programming languages, integrated statistical solutions (such as SAS and SPSS) are still heavily