2. Objectives
1. Describe big data.
2. Assess knowledge discovery in data.
3. Explore data mining.
4. Compare data mining models.
3. Data Mining
• Iterative process
• Explores and models big data
• Identifies patterns
• Provides meaningful insights
4. Big Data
IBM (2013) describes big data in a way that is
easy to understand.
Every day, we create 2.5 quintillion bytes of data —
so much that 90% of the data in the world today has
been created in the last two years alone. This data
comes from everywhere: sensors used to gather
climate information, posts to social media sites,
digital pictures and videos, purchase transaction
records, and cell phone GPS signals to name a few.
This data is big data (p. 1).
5. Data Mining Focus
• Producing a solution that generates useful
forecasts through a four-phase process:
– 1. Problem identification,
– 2. Exploration of the data,
– 3. Pattern discovery, and
– 4. Knowledge deployment: applying the model to new
data to forecast or generate predictions.
6. Data Mining Facilitates
• Data exploration and the resulting
knowledge discovery foster
proactive, knowledge-driven decision
making
7. Exploratory Data Analysis (EDA)
• Sometimes known as model building or
pattern identification
• Pattern discovery is a complex phase of data
mining
• Yields a highly predictive, consistent
pattern-identifying model
8. Data Mining Known as KDD
• KDD is variously expanded as
– knowledge discovery and data mining
– knowledge discovery in data
– knowledge discovery in databases
9. KDD
• The term "knowledge discovery" is key
• Data mining looks at the data from different
vantage points, aspects, and perspectives
• Brings new insights to the data set
11. KDD and Research
• Berger and Berger (2004)
– Nurse researchers are positioned to
use data mining technologies to
transform the repositories of big data
into comprehensible knowledge that is
useful for guiding nursing practice and
facilitating interdisciplinary research.
13. Data Mining Concepts
• Bagging
• Boosting
• Data reduction
• Drill down
• EDA
• Feature selection
• Machine learning
• Meta-learning
• Predictive data mining
• Stacking
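
Bagging (bootstrap aggregating), listed above, can be sketched in a few lines. This is a minimal illustration rather than a production method: the base learner (a one-feature threshold rule), the function names, and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-feature rule: predict 1 when x >= threshold.

    rows is a list of (x, label) pairs with labels 0 or 1.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in rows}):
        # Count rows this threshold would misclassify.
        errors = sum((x >= threshold) != bool(label) for x, label in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(rows) for _ in rows]
        models.append(train_stump(bootstrap))
    return models

def bagging_predict(models, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]

# Made-up toy data: small x values carry label 0, large x values label 1.
toy_rows = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
toy_models = bagging_fit(toy_rows)
print(bagging_predict(toy_models, 9), bagging_predict(toy_models, 1))
```

Boosting and stacking also combine several models, but boosting reweights the cases earlier models got wrong and stacking trains a second model on the first models' predictions.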
14. Data Mining Techniques
• Neural networks
• Decision trees
– Chi-square automatic interaction detection (CHAID)
• Rule induction
• Algorithm
• Nearest neighbor
• Text mining
• Online analytical processing (OLAP)
• Brushing
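
Of the techniques listed, nearest neighbor is the simplest to show in code. The sketch below assumes numeric feature tuples, Euclidean distance, and a majority vote among the k closest cases; the function name, the labels "A" and "B", and the data points are invented for illustration only.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training cases.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of numbers.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training cases forming two well-separated groups.
toy_train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B"),
]
print(knn_predict(toy_train, (1.1, 1.0)))
```

In practice, features measured on different scales are usually standardized first so that no single feature dominates the distance.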
15. Data Mining Models
• CRISP-DM
– Six steps: business understanding, data
understanding, data preparation, modeling,
evaluation, and deployment
• Six Sigma
– DMAIC steps: define, measure, analyze, improve,
and control
• SEMMA
– Five steps: sample, explore, modify, model, and assess
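
To make the SEMMA steps concrete, the sketch below maps each step to one small function. It is an illustration under stated assumptions, not SAS Enterprise Miner code: the function names, the (x, y) row format, the choice to drop rows with missing values, and the straight-line model are all hypothetical.

```python
import random
import statistics

def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the rows."""
    rng = random.Random(seed)
    k = max(2, round(len(rows) * fraction))
    return rng.sample(rows, k)

def explore(rows):
    """Explore: summarize the sample, including how many rows have gaps."""
    missing = sum(1 for x, y in rows if x is None or y is None)
    return {"rows": len(rows), "missing": missing}

def modify(rows):
    """Modify: clean the data; here, drop rows with a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def model(rows):
    """Model: fit y = intercept + slope * x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def assess(params, rows):
    """Assess: mean absolute error of the fitted line on the given rows."""
    intercept, slope = params
    return statistics.fmean(abs(y - (intercept + slope * x)) for x, y in rows)
```

A run calls the functions in SEMMA order: sample, explore, modify, model, assess. CRISP-DM covers similar ground but adds business understanding at the start and deployment at the end.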
16. Benefits of KDD
• Enhance business aspects
• Help to improve patient care
17. Ethics of Data Mining
• Dependent on the use of protected health
information (PHI)
• Ensure data are de-identified and
confidentiality is maintained
• Follow changes and specific
requirements for HIPAA compliance
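
The de-identification step can be illustrated with a short sketch that drops direct identifiers and replaces the patient ID with a keyed pseudonym. The field names, the identifier list, and the helper functions are hypothetical. Code like this does not by itself satisfy HIPAA: the Privacy Rule recognizes the Safe Harbor and Expert Determination methods, and both involve more than removing a few fields.

```python
import hashlib
import hmac

# Hypothetical field names; a real data set needs its own reviewed list.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}

def pseudonym(patient_id, secret_key):
    """Return a stable keyed hash so records can be linked without the real ID."""
    digest = hmac.new(secret_key, str(patient_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def deidentify(record, secret_key):
    """Drop direct identifiers and swap patient_id for a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS and field != "patient_id"
    }
    cleaned["pseudo_id"] = pseudonym(record["patient_id"], secret_key)
    return cleaned
```

The secret key must be stored separately from the mined data; anyone holding both could recompute pseudonyms from known patient IDs.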
18. References
• Berger, A. M., & Berger, C. R. (2004). Data mining as a tool for research and
knowledge development in nursing. Comput Inform Nurs, 22(3), 123-131.
PubMed ID: 15520581
• DeGruy, K. B. (2000). Healthcare applications of knowledge discovery in
databases. J Healthc Inf Manag, 14(2), 59-69. PubMed ID: 11066649
• Fernández-Llatas, C., Garcia-Gomez, J. M., Vicente, J., Naranjo, J. C.,
Robles, M., Benedi, J. M., & Traver, V. (2011). Behaviour patterns detection
for persuasive design in Nursing Homes to help dementia patients. Conf
Proc IEEE Eng Med Biol Soc, 2011, 6413-6417. PubMed ID: 22255806
• Goodwin, L., Saville, J., Jasion, B., Turner, B., Prather, J., Dobousek, T., &
Egger, S. (1997). A collaborative international nursing informatics research
project: predicting ARDS risk in critically ill patients. Stud Health Technol
Inform, 46, 247-249. PubMed ID: 10175406
19. References
• Green, J., Paladugu, S., Shuyu, X., Stewart, B., Shyu, C.,
& Armer, J. (2013). Using temporal mining to examine
the development of lymphedema in breast cancer
survivors. Nurs Res, 62(2), 122-129. PubMed ID:
23458909
• IBM. (2013). Big data at the speed of business.
Retrieved from http://www-01.ibm.com/software/data/bigdata/
• Lee, T., Lin, K., Mills, M., & Kuo, Y. (2012). Factors
related to the prevention and management of pressure
ulcers. Comput Inform Nurs, 30(9), 489-495. PubMed
ID: 22584879
20. References
• Lee, T., Liu, C., Kuo, Y., Mills, M., Fong, J., & Hung, C.
(2011). Application of data mining to the identification
of critical factors in patient falls using a web-based
reporting system. Int J Med Inform, 80(2), 141-150.
PubMed ID: 21115393
• Madigan, E. & Curet, O. (2006). A data mining approach
in home healthcare: outcomes and service use. BMC
Health Serv Res, 6, 18. PubMed ID: 16504115
21. References
• Manyika, J., Chu, M., Brown, B., Bughin, J.,
Dobbs, R., Roxburgh, C., & Byers, A. (2011).
McKinsey Global Institute: Big data: The next
frontier for innovation, competition, and
productivity. Retrieved from
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
22. References
• SAS. (n.d.). SAS enterprise miner. Retrieved from
http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
• Tishgart, D. (2012). Why security matters for big data and
health care: Data integrity requires good data security.
Retrieved from http://soa.sys-con.com/node/2389698
• Trangenstein, P., Weiner, E., Gordon, J., & McNew, R.
(2007). Data mining results from an electronic clinical log
for nurse practitioner students. Stud Health Technol Inform,
129, 1387-1391. PubMed ID: 17911941
• Zupan, B. & Demsar, J. (2008). Open-source tools for data
mining. Clin Lab Med, 28(1), 37-54. PubMed ID: 18194717