Jane Howard

HEALTH INFORMATION MANAGEMENT
THE FOCUS OF MY REVIEW IS:
Are big data analysis and data mining
substitutes for traditional data collection and analysis?
I chose this focus because I learned a little about this topic in this subject and I
wanted to learn more.
I chose this focus because I wanted to know whether health information
management principles would change as technology changed.

Search Strategy: Boolean logic; phrase searching and proximity operators; truncation and wildcards.
Searches were limited by date, material type, English language, and full-text availability. Only advanced search was used.
A record was kept of the searches. Citation linking was used in Google Scholar.
Databases searched: PubMed/Medline, Medline via OVIDSP, Embase, Scopus, Academic Search
Premier/EBSCO, Web of Science, Current Contents Connect, Cochrane Library, CINAHL/EBSCO, Springer
Keywords used: EHR, data, data mining, hospital information system, information, knowledge, big data,
knowledge discovery, quality, quality data, research, health information technology, data management.
Total number of articles retrieved: 10
Number of articles shortlisted for review: 5
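
As an illustration of the Boolean logic and phrase searching named in the search strategy, the sketch below combines some of the listed keywords into one query string. The grouping of keywords into three concepts and the helper names (phrase, any_of, all_of) are assumptions made for this example; they are not the recorded searches. Truncation and proximity syntax differ between databases, so they are left out here.

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as an exact phrase."""
    return f'"{term}"' if " " in term else term


def any_of(*terms: str) -> str:
    """Join synonyms for one concept with OR, inside parentheses."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def all_of(*groups: str) -> str:
    """Require every concept group by joining the groups with AND."""
    return " AND ".join(groups)


# Hypothetical concept groups built from keywords in the list above.
technique = any_of("data mining", "big data", "knowledge discovery")
setting = any_of("EHR", "hospital information system", "health information technology")
quality = any_of("quality data", "data management")

query = all_of(technique, setting, quality)
print(query)
```

The resulting string would still need to be adapted to each database's own advanced-search syntax before use.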

Rationale to select the top five papers:
1. They outlined situations in my work as a health information manager that
might be the source of challenges;
2. They provided a comprehensive background for understanding current
knowledge and highlight the significance of this new methodology;
3. They flowed well and were written by multiple authors;
4. They presented many perspectives to consider but kept to the significance
and impact of data quality on further use of this new technique;
5. I performed a text mining methodology on each paper to determine which
ones had the greatest quantified text or data for each entry.

Thorat, S., Kute, S. (2014) Medical Data Mining Life Cycle and its Role in Medical Domain
International Journal of Computer Science and Information Technologies, Vol. 5 (4),
5751-5755.
Thorat & Kute (2014) argue that the technique of data mining is useful in detecting data
patterns and trends with little user effort. This detection allows predictions of unknown values.
However, there still remains a significant challenge in data quality when utilising this
technique. At the onset of data collection, for example, at admission, a patient is asked
hundreds of questions. This is one of the processes for data collection that requires scrutiny for
quality of data input that will eventually be trawled in a mechanised data-mining process.
Traditionally, datasets have contained missing information and incorrect entries, even before
the new data-mining methodologies were applied. This is still the case. Thorat & Kute (2014)
summarise by saying that “correct information can only be achieved, if quality data is available
for mining.” This is the theme that will be reiterated throughout this literature review.

Easton, J., Stephens, C., & Angelova, M. (2014). Risk factors and prediction of very short
term versus short/intermediate term post-stroke mortality: A data mining approach.
Computers In Biology And Medicine, 54, 199-210. doi:10.1016/j.compbiomed.2014.09.0038
Increased ease of access to a large amount of data has also increased the need for careful and
clear criticism of sources. Just because historical data has been previously collected electronically,
does not mean its quality is assured. Easton et al. (2014), eager to relinquish traditional
statistical modelling and grasp the new technological approach of data mining to reveal patterns
in the large volume of stroke data, criticise the fragmentation of existing data that is
keeping this new approach from maximum uptake in the medical world. However, fragmentation
is not the biggest problem. The key to successful data mining is the traditional preparation of
data: cleansing, pre-processing, and managing the data into a form appropriate for further
analysis and processing. It is a process that involves different tasks and cannot be fully
mechanised. Because preparation is routine, tedious, and time consuming, and because the aim of
the paper was not to outline the best model for predicting stroke mortality, Easton et al.
(2014) may have decided that this fact was not worth mentioning.

Bottles, K., Begoli, E., & Worley, B. (2014). Understanding the Pros and Cons of Big Data Analytics. Physician
Executive, 40(4), 6-12.
This article’s many experts agree that when you have big data and the proper mining tools, you can do things
differently. However, it is not stated what can be done differently. Bottles et al. (2014) state that with the advent of big data
analytic platforms and cloud computing, health executives are clamouring to acquire this technology rather than dealing
with the complexities and expense of managing an in-house data warehouse. Also, there is a general consensus among these
experts that the new processes of data mining cannot replace hypothesis-driven theory. Data mining enhances hypothesis-
driven theory, provided deficiencies in big data predictive analysis are addressed as they have been in traditional data
collections. An example would be inherent biases in data collection and interpretation.
The speculation that the larger the data set, the more likely that false correlations will appear is not without merit.
Therefore, as Bottles et al. (2014) argue, it would be logical to have more study of measurements that have already been
performed. “One cannot simply combine databases, crunch the numbers, and marvellously uncover useable correlations.”
Unlike Easton et al. (2014), Bottles et al. (2014) acknowledge human supervision of collection and, to a
lesser extent, “actionable correlations”. This epitomises the fact that, yes, machines are necessary, but humans add judgement
to an otherwise entirely logical process.
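
The concern that larger data sets produce more false correlations can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from Bottles et al. (2014): every variable is unrelated random noise, the sample size (20 observations) and the cut-off (|r| > 0.5) are arbitrary choices, and count_spurious_pairs is a hypothetical helper. Any pair that passes the cut-off is, by construction, a false correlation.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_vars, n_obs=20, threshold=0.5, seed=1):
    """Count pairs of pure-noise variables whose |r| exceeds the threshold."""
    rng = random.Random(seed)
    # Each column is independent noise, generated one column at a time.
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(pearson(columns[i], columns[j])) > threshold
    )


few = count_spurious_pairs(5)    # 10 possible pairs
many = count_spurious_pairs(40)  # 780 possible pairs
print(few, many)
```

Because the number of possible pairs grows roughly with the square of the number of variables, the count of chance "findings" grows with it. This is one reason a prior hypothesis and human judgement still matter.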

Kharat, A. T., Singh, A., Kulkarni, V. M., & Shah, D. (2014). Data mining in radiology. The Indian Journal of
Radiology & Imaging, 24(2), 97–102. doi:10.4103/0971-3026.134367.
Kharat et al. (2014) appreciate the effects of traditional data collection in examining the different data mining
methodologies: classes, clusters, associations, sequential patterns, classification, prediction and decision tree.
Interestingly, Kharat et al. (2014) propose the simple solution of redesigning existing systems such as Digital
Imaging and Communications in Medicine (DICOM), PACS and RIS, by merely installing complex algorithms.
Additionally, from a radiology perspective, it is pleasing to see emphasis on the need for a standard lexicon.
Another quality insight is Kharat et al.’s (2014) suggestion to structure the radiology report in such a way that
keywords are highlighted and then saved. This common sense approach should be standardised. Future searches
would be restricted “only to that part of the report” rather than the whole text. This would reduce the complexity
of the software involved, saving time and curtailing costs. These suggestions lend themselves to the traditional principle that
proper preparation will not only make the data more searchable, but will also save time in the long run, offsetting any
time taken in preparation to ensure information systems can be made to use cleansed data.
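
The suggestion to save highlighted keywords and search only that part of the report can be sketched as follows. This is a hedged illustration, not the design proposed by Kharat et al. (2014): the RadiologyReport structure, the field names, and the two sample reports are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RadiologyReport:
    report_id: str
    full_text: str
    # Keywords highlighted by the reporting radiologist, stored in lower case.
    keywords: set = field(default_factory=set)


def find_by_keyword(reports, term):
    """Search only the saved keywords, not the whole report text."""
    wanted = term.strip().lower()
    return [r.report_id for r in reports if wanted in r.keywords]


def find_in_full_text(reports, term):
    """Naive whole-text search, shown for comparison."""
    wanted = term.strip().lower()
    return [r.report_id for r in reports if wanted in r.full_text.lower()]


# Hypothetical sample data for illustration only.
reports = [
    RadiologyReport(
        "R1",
        "Right lower lobe consolidation consistent with pneumonia.",
        {"pneumonia", "consolidation"},
    ),
    RadiologyReport(
        "R2",
        "Clear lungs. Prior pneumonia has resolved.",
        {"clear lungs"},
    ),
]

print(find_by_keyword(reports, "Pneumonia"))    # ['R1']
print(find_in_full_text(reports, "Pneumonia"))  # ['R1', 'R2']
```

The keyword search returns only the report where the finding was deliberately recorded, while the whole-text search also returns a report that merely mentions the term. The result is only as good as the keywords a person chose to save.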

Holzinger A, Dehmer M, Jurisica I. (2014) Knowledge Discovery and interactive Data Mining in Bioinformatics-State-of-
the-Art, future challenges and research directions. BMC Bioinformatics. 2014;15 Suppl 6:I1. Available from:
MEDLINE, Ipswich, MA. Accessed 31 May, 2015.
Although Holzinger et al. (2014) promote proper visualisation techniques to enhance the end user’s understanding of
the data seen, as with Easton et al. (2014), this is not as important as ensuring data quality at the beginning of any data mining process.
Holzinger et al. (2014) concur that merging databases is a problem encountered in data mining, often called the
“Merge/Purge” problem.
Even though there are methods in existence to enhance the accuracy and thereby the usability of existing data, many machine
learning algorithms struggle with high-dimensional data.
Of course, there is agreement with the many other authors referenced in this presentation that data mining has the
potential to ensure the best health practices are followed as opposed to clinical trials, but only if human intervention is
valued in maintaining data quality to produce a quality analysis. This is due to the traditional quality problems of
incomplete medical data: missing data values, inconsistent value naming conventions and, as always, the
detection and removal of duplicate data entries. These central goals of data quality would not pose problems or challenges
for a data quality manager who has a basic understanding of human information-processing.
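
The three traditional quality problems named above (missing values, inconsistent value naming, and duplicate entries) can be sketched as a small cleansing step. This is a minimal illustration under assumptions, not a method from Holzinger et al. (2014): the field names, the sex coding table, and the rule that a duplicate is the same patient_id and admission_date are all hypothetical. Incomplete records are set aside for a person to review rather than filled in automatically.

```python
# Hypothetical mapping of inconsistent value names to one standard code.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
REQUIRED = ("patient_id", "sex", "admission_date")


def standardise(record):
    """Trim whitespace and map inconsistent sex values to a standard code."""
    clean = {k: (v.strip() if isinstance(v, str) else v) for k, v in record.items()}
    sex = clean.get("sex")
    if sex:
        clean["sex"] = SEX_CODES.get(sex.lower(), sex)
    return clean


def missing_fields(record):
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]


def cleanse(records):
    """Return (unique complete records, incomplete records for human review)."""
    seen, unique, for_review = set(), [], []
    for raw in records:
        rec = standardise(raw)
        gaps = missing_fields(rec)
        if gaps:
            for_review.append((rec, gaps))
            continue
        key = (rec["patient_id"], rec["admission_date"])
        if key not in seen:  # drop later duplicates of the same admission
            seen.add(key)
            unique.append(rec)
    return unique, for_review


records = [
    {"patient_id": "P001", "sex": "male", "admission_date": "2015-03-02"},
    {"patient_id": "P001", "sex": "M ", "admission_date": "2015-03-02"},
    {"patient_id": "P002", "sex": "F", "admission_date": ""},
    {"patient_id": "P003", "sex": "Female", "admission_date": "2015-03-05"},
]
unique, for_review = cleanse(records)
print([r["patient_id"] for r in unique])  # ['P001', 'P003']
print(for_review[0][1])                   # ['admission_date']
```

The second P001 entry only becomes detectable as a duplicate after its values are standardised, which is why naming conventions need attention before duplicates are purged.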

Just as they have with previous technological advances, clinicians should become familiar with the
possibilities and problems inherent in big data and the various health data mining techniques. That
knowledge should help ensure they are not blinded to the fact that technological advances do not, by
themselves, improve the process for ensuring quality data. There still needs to be a human element at
the beginning of the process.

The purpose of this literature review was to stimulate discussion
about why data quality cannot be taken for granted without the human element.
“Computers are incredibly fast, accurate, but stupid. Humans are
incredibly slow, inaccurate, but brilliant. Together they may be
powerful beyond imagination”. Albert Einstein
