SlideShare a Scribd company logo
1 of 43
Data mining
approach to internal
fraud in a project-
based organization
Mirjana Pejić Bach, Ksenija Dumičić, Berislav
Žmuk, Tamara Ćurlin, Jovana Zoraja
Introduction
01 02 03
04 05 06
Table of contents
Literature
Review
Objectives
Methodology Results Conclusions
2
01
Introduction
Internal fraud in data mining
● Internal fraud  one of the crucial and increasingly serious
problems in numerous organizations
● Internal fraud control  widely used for forecasting, detecting and
preventing possible fraudulent behaviours conducted by
organizations’ employees1
● Data mining techniques  widely used for external fraud detection
and prevention
● Project-based organizations  prone to internal fraud because of
lower level of control (result of flatter organizational structure2),
poor management practices3, complex governance procedures4
4 Introduction
Developed case
study
Dataset
Project-
based
organization
Working-hour claims:
• Client
• Expert
• Job
characteristics
chi-square
automatic
interaction
detection (CHAID)
Analysed
by
link analysis
and
Data mining
models
Aim of the study:
5
Introduction
Contributions od the study:
Contributing to the area of internal fraud detection and
prevention in project-based organizations, while most of
the previous research has been oriented towards external
fraud prevention.
Providing practical contributions, since our research
results in the form of decision tree and association rules
could enable organizations for developing their own
solutions for automatic internal fraud-detection (e.g.
using SQL code).
1
2
6
Introduction
02
Literature
Review
Fraud is a severe problem for companies, both inside and
outside the organization, with potentially significant financial
losses and damage to the company's reputation5
Internal controls are critical for detecting and preventing
fraudulent behaviour, and weak compliance with internal
controls increases the risk of fraud6
Many organizations lack an efficient internal control system, and
there are recommendations for increasing their effectiveness,
but progress can be slow due to difficulties accessing data from
previous cases1
Internal controls and fraud
8 Literature Review
Data mining is a tool used to extract significant patterns from databases
for further investigation, interpretation, and understanding7. It can achieve
three fundamental goals: description, prediction, and prescription
Challenges in data mining include performance time, management
support, and selection and execution of algorithms. The choice of data
mining technique depends on the structure of the data and the desired
results8
The main concern is the actual usage of data mining results in decision-
making, which often depends on management willingness to support its
application
Data mining
9 Literature Review
Automated fraud detection systems based on data mining models
have gained popularity, particularly in financial institutions9
Data mining has been successful in detecting and preventing fraud
in various fields, including financial crime detection, intrusion and
spam detection, and fraud analysis10
Data mining techniques can be used to decrease the probability of
internal fraud and have become an important tool for intelligent
business analytics and decision support for internal fraud
prevention. Many organizations acknowledge data mining as one
of the main technologies relevant to internal fraud prevention10
Data mining for internal fraud detection
10 Literature Review
03
Objectives
Objectives
● Case study analysis was conducted on a large project-
based company with over 300 employees who provide
monthly work reports, including hours worked, client
characteristics, work complexity, and claimed amounts
● This information is used to fill the working-hours claim
12
Research goal: Develop a model
to prevent potential fraudulent
behaviour by identifying
characteristics of suspected
working-hour claims for in-depth
fraud analysis.
Objectives
04
Methodology
Suspect working-hours claims
are defined by the company
as those that meet at least
one of two criteria: late
submission or cancellation of
already claimed hours.
The company wants to
identify characteristics of
potential fraud claims
before they are
considered suspect.
The dataset in Table 1 includes 1194 working-
hours claims, with 24.62% being suspect for
fraud and 75.38% being non-suspect.
The variable "Suspect" is used to categorize
the claims as suspect (with a value of 1) or
non-suspect (with a value of 0).
14
Data
Variable suspect Count Percent (%)
Suspect (value 1) 294 24.62
Non-suspect (value 0) 900 75.38
Total 1194 100.00
Table 1. Suspect and non-suspect working-hour claims in the sample
Methodology
● Type of customer – variable Customer
● Type of consultant – variable Consultant
● The month when the working-hours were
claimed – variable Month
● The hourly-rate – variable UnitPriceCoded
● The consultant’s level of expertise – variable
ExpertLevel
● The number of hours claimed – variable
NoHoursCoded
● The total amount claimed – variable
TotalAmountCoded
The independent
variables in the
working-hour claims
are used for
developing data
mining models:
Data
15 Methodology
• Variable "Customer" in Table 2. is divided into three categories: governmental institutions,
internal projects, and private enterprises
• Internal projects suspected in 50.68% of cases
• Chi-square test confirms a difference in the structure of suspect claims across customer
categories (p-value < 0.001)
16
Data
Customer origin Suspect (%) Not suspect (%) Chi-square P-value
Govern 4.76 95.24 77.435 <0.001
Internal 50.68 49.32
Private 22.80 77.20
Totals 24.62 75.38
Table 2. Types of the customer – variable Customer
Methodology
• Consultant variable in Table 3 indicates the country of origin of experts claiming working
hours
• Domestic consultants had 23.46% of their claims suspected, while foreign consultants had
41.56% suspected
• Chi-square test shows that domestic and foreign consultants have significantly different
structures regarding suspected and non-suspected claims
17
Data
Consultant origin Suspect (%) Not suspect (%) Chi-square P-value
Domestic 23.46 76.54 12.719 <0.001
Foreign 41.56 58.44
Totals 24.62 75.38
Table 3. Types of the consultant – variable Consultant
Methodology
• Months M1 and M12 in Table 4
have the highest percentage of
suspected working-hours claims
(65.77% and 30.59% respectively),
likely related to the beginning and
end of the fiscal year
• Months M10 and M4 have the
highest percentage of non-
suspected working-hours claims
(88.42% and 86.40% respectively)
• The chi-square test confirms a
statistically significant difference
in suspected working-hours claims
across different months (chi-
square=134.670, df=11, p-
value<0.001)
18
Data
The month of
the claim
Suspect
(%)
Not suspect
(%)
Chi-
square
P-value
M1 65.77 34.23 134.670 <0.001
M2 28.13 71.88
M3 17.31 82.69
M4 13.60 86.40
M5 16.81 83.19
M6 20.39 79.61
M7 14.29 85.71
M8 28.09 71.91
M9 25.93 74.07
M10 11.58 88.42
M11 19.28 80.72
M12 30.59 69.41
Table 4. The month when the working-hours were claimed
– variable Month
Methodology
• UnitPriceCoded categorizes consultants' cost per hour into four groups in Table 5
• Highest share of suspected claims found in 151-200 EUR cost category (30.00%)
• Chi-square test shows no significant difference between suspected and non-suspected
claims across the four cost categories at a 5% significance level
19
Data
The hourly rate Suspect (%) Not suspect (%) Chi-square P-value
1-50 EUR/h 10.81 89.19 6.278 0.099
51-100 EUR/h 25.59 74.41
101-150 EUR/h 18.75 81.25
151-200 EUR/h 30.00 70.00
Totals 24.62 75.38
Table 5. The hourly-rate – variable UnitPriceCoded
Methodology
• Expert Level variable has five levels coded from L4 to L8, as shown in Table 6
• The highest share of suspected claims is present at the expert level L5
• The chi-square test confirms that there is a statistically significant difference between
expert levels regarding the shares of suspected and non-suspected working-hours claims
20
Data
Consultant Suspect (%) Not suspect (%) Chi-square P-value
L6 24.69 75.31 33.147 <0.001
L5 77.78 22.22
L8 30.00 70.00
L4 10.81 89.19
L7 18.75 81.25
Totals 24.62 75.38
Table 6. The consultant’s level of expertise – variable ExpertLevel
• Table 7 shows the number of
weekly working hours classified
into eight groups ranging from 1-5
to 36-55
• An additional category was added
to account for negative weekly
working hours, which are always
treated as suspected
• The Chi-square test showed that at
least one weekly working-hours
category has a statistically
significant difference in the share
of suspected working-hours claims
compared to other categories at
the significance level of 1%
21
Data
The number
of hours
Suspect
(%)
Not suspect
(%)
Chi-
square
P-value
Negative hours 100.00% 0.00% 53.859 <0.001
1-5 hours 22.63% 77.37%
6-10 hours 26.06% 73.94%
11-15 hours 26.90% 73.10%
16-20 hours 23.39% 76.61%
21-25 hours 17.65% 82.35%
25-30 hours 9.68% 90.32%
31-35 hours 23.08% 76.92%
36-55 hours 21.05% 78.95%
Totals 24.62% 75.38%
Table 7. The number of hours claimed – variable
NoHoursCoded
Methodology
• Variable
TotalAmountCoded
categorizes the costs
of consultants’
working-hours into 19
cost categories.
• Negative amount
claimed for working-
hour claims with
negative hours
• Chi-square test
showed at least one
total cost per
consultant category
with statistically
significantly different
shares of suspected
working-hours claims
22
Data
The number amount claimed Suspect (%) Not suspect (%) Chi-square P-value
1-100 EUR 25.30% 74.70% 80.068 <0.001
101-200 EUR 16.94% 83.06%
201-300 EUR 24.56% 75.44%
301-400 EUR 24.17% 75.83%
401-500 EUR 24.21% 75.79%
501-600 EUR 24.51% 75.49%
601-700 EUR 31.52% 68.48%
701-800 EUR 16.67% 83.33%
801-900 EUR 13.33% 86.67%
901-1000 EUR 42.42% 57.58%
1001-1100 EUR 35.19% 64.81%
1101-1300 EUR 27.87% 72.13%
1301-1500 EUR 29.33% 70.67%
1501-1600 EUR 5.56% 94.44%
1601-1700 EUR 15.38% 84.62%
1701-2000 EUR 22.22% 77.78%
2001-3000 EUR 18.97% 81.03%
3001-4000 EUR 13.79% 86.21%
Negative 100.00% 0.00%
Totals 24.62% 75.38%
Table 8. The total amount claimed – variable TotalAmountCoded
Methodology
CHAID decision tree
● Decision tree created with CHAID
algorithm was developed to
understand the relationship between
working hour claim fraud and various
characteristics
● CHAID algorithm uses the chi-square
test to select the best split at each
step
● Dependent variable is Suspect and
all other observed variables are
independent variables
● Decision tree depth goes up to the
third level11, with main node having
at least 100 cases and child nodes
having at least 50 cases
23 Methodology
Link analysis
Description
Formulation
24
By using link analysis, the association rules are
extracted in order to detect significant relationships
between suspect working-hours claims and various
characteristics of customers, consultants, and
projects. Association rules can be described as:
If A=1 and B=1 then C=1 with probability p
Link analysis is a data analysis technique used
to identify and evaluate relationships between
items represented as "nodes." It can be used for
the detection of potentially suspect working-
hours claims based on characteristics of
clients, consultants, and projects.
where A, B, and C are binary variables, p is a conditional
probability defined as p = p(C = 1|A = 1, B = 1)
Methodology
Link analysis
● All eight variables are analysed, including Suspect, Customer, Consultant,
Month, UnitPriceCoded, ExpertLevel, NoHoursCoded, and TotalAmountCoded
● The non-sequential association analysis approach is used, and Statistica Data
Miner software ver. 13.5 is employed for link analysis
● The minimum support value, which shows how frequently an itemset appears
in the dataset, has been set to value 0.2 whereas the maximum value was set
to 1.0. Support is calculated as:
"Support" (A ⇒ B) = p(A ∪ B)
25
Methodology
Link analysis
● Minimum support and confidence values were set to exclude items from the
analysis
● Confidence measures how often a rule is true and is calculated using an equation
● Additional, it has been defined that the maximum number of items in an item set is
10
● No strict rules exist for selecting minimum support value, minimum confidence
value, or maximum number of items in an item set12
● The limits used in this study were chosen because experiments showed they
resulted in interesting rules
● Other authors may use subjective criteria13,14
Confidence (A ⇒ B) = p(B│A) = Support(A, B) / Support (A)
26
Methodology
05
Results
Decision tree
• According to defined settings, the
CHAID decision tree is developed,
displayed on Figure 1
• The resulting CHAID decision tree
has 3 levels and overall 11 nodes out
of which seven are considered as a
terminal
• Figure 1 also reveals that variables
Month, Customer and ExpertLevel
had the highest level of statistical
significance and therefore they are
used in building the classification
tree
28 Results Figure 1. CHAID decision tree
01 02 03
Decision tree
The variable used for
branching on the first level is
the variable Month, which
turned out to be statistically
significant at the level of 1%,
resulting in three new nodes
with varying percentages of
suspected and non-
suspected working-hour
claims.
Branching on the second level
using the variable Customer
resulted in five new nodes, three
of which came out from Node 1
and two from Node 2, with highly
statistically significant chi-square
values at 1%, and the largest
node, Node 6, consisted of
customers of private enterprises
with 83.9% connected to non-
suspected working-hours claims
and 16.1% connected to
suspected working-hours claims.
29 Results
The third level branching variable
is the variable ExpertLevel, and
this branching process is
statistically significant at the 5%
level. Node 9 is considerably
larger than Node 10 and includes
431 or 85.5% non-suspected
working-hours claims and 73 or
14.5% suspected working-hours
claims.
Most important notes
• Table 9 shows the classification matrix comparing observed and predicted status of
working-hours claims
• The algorithm was correct in 93.3% of cases for non-suspect working-hour claims
• For suspected working-hour claims, the algorithm correctly classified only 36.1% of them out
of 294
Decision tree
Observed
classification
Non suspect Predicted classification
Suspect
Percent
correct
Non-suspect 840 60 93.3%
Suspect 188 106 36.1%
Overall percentage 86.1% 13.9% 79.2%
Table 9. The number of hours claimed – variable NoHoursCoded
30 Results
• Association rules were
developed using selected
metrics, as shown in Table 10
• The item Suspect alone
appeared in 27.71% of item sets
with a frequency of 235
• Suspected working-hour claims
are closely related to private
enterprises, domestic
consultants, cost per hour
between 51 and 100 EUR, and
expert level L6
Link analysis Table 10. Frequent item sets that contain Suspect item
31 Results
Frequent items sets Number
of items
Frequency Support (%)
Suspect 1 235 27.712
51-100, Suspect 2 225 26.533
51-100, L6, Suspect 3 221 26.061
L6, Suspect 2 221 26.061
Domestic, Suspect 2 220 25.943
51-100, Domestic, Suspect 3 210 24.764
51-100, Domestic, L6, Suspect 4 206 24.292
Domestic, L6, Suspect 3 206 24.292
Private, Suspect 2 193 22.759
Domestic, Private, Suspect 3 185 21.816
51-100, Private, Suspect 3 183 21.580
L6, Private, Suspect 3 180 21.226
51-100, L6, Private, Suspect 4 180 21.226
51-100, Domestic, Private, Suspect 4 175 20.636
51-100, Domestic, L6, Private, Suspect 5 172 20.283
Domestic, L6, Private, Suspect 4 171 20.283
Link analysis
32 Results
• Figure 2 is a Web graph showing
important nodes related to
domestic experts, non-suspected
claims, L6 expertise level, private
customers, and low hourly paid
rates (51-100 EUR)
• The strongest joint support is for
non-suspected claims and the
above-mentioned nodes
• The darkest line represents the
strong relationship between L6
expertise level and low hourly paid
rates (51-100 EUR)
Figure 2. Web graph of items generated by link analysis
Link analysis
33 Results
• Figure 3 shows a rule graph of
items generated by link analysis
• Node size represents the relative
support of each item, and colour
darkness represents relative
confidence
• The rule with the highest
confidence and support is the
relationship between the lowest
level of expertise (L6) and one of
the low levels of hourly paid rate
(51-100 EUR), and rules containing
the item Suspect are presented
with small node sizes
Figure 3. Rule graph of items generated by link analysis
Link analysis
34 Results
• Table 11 shows association
rules with the item Suspect
in the body
• The first rule indicates that
26.53% of working-hours
claims with cost per hour
between 51 and 100 EUR
are suspected
• The second and third rules
have the same support and
confidence levels
• Note: Body for each row
equales to Suspect
Table 11. Frequent association rules with the item Suspect in the body
Head Support (%) Confidence (%) Lift (%)
51-100 26.533 95.745 1.052
51-100, L6 26.061 94.043 1.042
L6 26.061 94.043 1.042
Domestic 25.943 93.617 0.985
51-100, Domestic 24.764 89.362 1.031
51-100, Domestic, L6 24.292 87.660 1.021
Domestic, L6 24.292 87.660 1.021
Private 22.759 82.128 0.992
Domestic, Private 21.816 78.723 0.995
51-100, Private 21.580 77.872 1.025
51-100, L6, Private 21.226 76.596 1.016
L6, Private 21.226 76.596 1.016
51-100, Domestic, Private 20.637 74.468 1.027
51-100, Domestic, L6, Private 20.283 73.191 1.017
Domestic, L6, Private 20.283 73.191 1.017
Link analysis
35 Results
• Table 12 shows association
rules with the item Suspect
and one more item in the
Body
• The strongest association
is achieved when Suspect
and Private are in the Body
with the item Domestic
• The strongest association
is achieved when Suspect
and L6 are in the Body with
the item 51-100
Table 12. Association rules with the item Suspect and one more item in the Body
Body Head Support (%) Confidence (%) Lift (%)
Private, Suspect Domestic 21.816 95.855 1.008
Private, Suspect 51-100 21.580 94.819 1.042
Private, Suspect 51-100, L6 21.226 93.264 1.034
Private, Suspect L6 21.226 93.264 1.034
Private, Suspect 51-100, Domestic 20.637 90.674 1.046
Private, Suspect 51-100, Domestic, L6 20.283 89.119 1.038
Private, Suspect Domestic, L6 20.283 89.119 1.038
L6, Suspect 51-100 26.061 100.000 1.098
L6, Suspect 51-100, Domestic 24.292 93.213 1.075
L6, Suspect Domestic 24.292 93.213 0.981
L6, Suspect 51-100, Private 21.226 81.448 1.072
L6, Suspect Private 21.226 81.448 0.984
L6, Suspect 51-100, Domestic, Private 20.283 77.828 1.073
L6, Suspect Domestic, Private 21.816 95.855 0.984
Link analysis
36 Results
• The rest of Table 12:
Table 12. Association rules with the item Suspect and one more item in the Body
Body Head Support (%) Confidence (%) Lift (%)
Domestic, Suspect 51-100 24.764 95.455 1.049
Domestic, Suspect 51-100, L6 24.292 93.636 1.038
Domestic, Suspect L6 24.292 93.636 1.038
Domestic, Suspect Private 21.816 84.091 1.016
Domestic, Suspect 51-100, Private 20.637 79.545 1.047
Domestic, Suspect 51-100, L6, Private 20.283 78.182 1.038
Domestic, Suspect L6, Private 20.283 78.182 1.038
51-100, Suspect L6 26.061 98.222 1.089
51-100, Suspect Domestic 24.764 93.333 0.982
51-100, Suspect Domestic, L6 24.292 91.556 1.066
51-100, Suspect Private 21.580 81.333 0.982
51-100, Suspect L6, Private 21.226 80.000 1.062
51-100, Suspect Domestic, Private 20.637 77.778 0.983
51-100, Suspect Domestic, L6, Private 20.283 76.444 1.063
Link analysis
37 Results
• Table 13 presents
association rules with
item Suspect and two or
more items in the Body
• Suspected working-hours
claims with customers
from private enterprises
and expert level L6 have
cost per hour between 51
and 100 EUR
• Similar conclusion
reached when items
Domestic, L6, Private, and
Suspect are associated
with item 51-100
Table 13. Association rules with the item Suspect and two or more items in the Body
Body Head Support (%) Confidence (%) Lift (%)
L6, Private, Suspect 51-100 21.226 100.000 1.098
L6, Private, Suspect 51-100, Domestic 20.283 95.556 1.102
L6, Private, Suspect Domestic 20.283 95.556 1.005
Domestic, Private,
Suspect
51-100 20.637 94.595 1.039
Domestic, Private,
Suspect
51-100, L6 20.283 92.973 1.031
Domestic, Private,
Suspect
L6 20.283 92.973 1.031
Domestic, L6,
Suspect
51-100 24.292 100.000 1.098
Domestic, L6,
Suspect
51-100, Private 20.283 83.495 1.099
Domestic, L6,
Suspect
Private 20.283 83.495 1.009
Domestic, L6,
Private, Suspect
51-100 20.283 100.000 1.098
51-100, Private,
Suspect
L6 21.226 98.361 1.090
Link analysis
38 Results
• The rest of Table 13:
Table 13. Association rules with the item Suspect and two or more items in the Body
Body Head Support (%) Confidence (%) Lift (%)
51-100, Private,
Suspect
Domestic 20.637 95.628 1.006
51-100, Private,
Suspect
Domestic, L6 20.283 93.989 1.095
51-100, L6, Suspect Domestic 24.292 93.213 0.981
51-100, L6, Suspect Private 21.226 81.448 0.984
51-100, L6, Suspect Domestic, Private 20.283 77.828 0.984
51-100, L6, Private,
Suspect
Domestic 20.283 95.556 1.005
51-100, Domestic,
Suspect
L6 24.292 98.095 1.087
51-100, Domestic,
Suspect
Private 20.637 83.333 1.007
51-100, Domestic,
Suspect
L6, Private 20.283 81.905 1.087
51-100, Domestic,
Private, Suspect
L6 20.283 98.286 1.089
51-100, Domestic,
L6, Suspect
Private 20.283 83.495 1.009
06
Conclusions
01 02 03
Conclusions
A case study analysis was
conducted on suspected working-
hour claims in a project-based
company
Two data mining models were
developed to identify
characteristics of fraudulent
working-hour claims
Results show that suspected claims are related
to private enterprise customers, domestic
consultants, low hourly rates, and lowest
expertise level.
04 05 06
40 Conclusions
The paper demonstrates the use
of data mining methodology to
detect internal fraud in project-
based organizations, focusing on
working-hour claims.
The research has significant practical implications
for organizations to expand usage and potentials
of different data mining techniques, which could
help them to be more effective and efficient in
investigating and preventing internal fraud.
However, future research is needed to test the
general applicability of the results.
The decision tree and link analysis
are recommended for use as a
supportive instrument for the
detection of suspect working-hour
claims, in combination with other
human-based and machine-based
methods.
Acknowledgements
This paper extends the research on the internal fraud using the CHAID
decision tree that was presented on CENTERIS - Conference on
ENTERprise Information Systems (Pejić Bach et al., 2018) [11]. This
research has been fully supported by the Croatian Science Foundation
under the PROSPER (Process and Business Intelligence for Business
Performance) project (IP-2014-09-3729).
References
1 ACFE (2018). Report to the Nations: 2018 Global Study on Occupational Fraud and Abuse [Online]. Available:
https://www.acfe.com/uploadedFiles/ACFE_Website/Content/rttn/2018/RTTN-Government-Edition.pdf
2 C. Smith, Making Sense of Project Realities: Theory, Practice and the Pursuit of Performance. Abingdon, UK: Routledge, 2017.
3 S. Bathallath, Å. Smedberg and H. Kjellin, “Managing project interdependencies in IT/IS project portfolios: A review of
managerial issues,” International Journal of Information Systems and Project Management, vol. 4, no. 1, pp. 67-82, 2016.
4 D. R. Chandra and J. van Hillegersberg, “Governance of inter-organizational systems: A longitudinal case study of Rotterdam’s
Port Community System,” International Journal of Information Systems and Project Management, vol. 6, no. 2, pp. 47-68, 2018.
5 P. Gottschalk, “Categories of financial crime,” Journal of Financial Crime, vol. 17, no. 4, pp. 441-458, 2010.
6 M. DeLiema, G. Mottola and M. Deevy. (2017, February). Findings from a pilot study to measure financial fraud in the United
States [Online]. Available: https://ssrn.com/abstract=2914560
7 J. Leskovec, A. Rajaraman and J. D. Ullman, Mining of Massive Data Sets. Cambridge, UK: Cambridge University Press, 2019.
8 C. K. S. Leung, “Big data analysis and mining,” in Advanced Methodologies and Technologies in Network Architecture, Mobile
Computing, and Data Analytics, D.B.A. Mehdi Khosrow-Pour, Ed. Pennsylvania, USA: IGI Global, 2019, pp. 15-27.
9 E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen and X. Sun, “The application of data mining techniques in financial fraud detection: A
classification framework and an academic review of literature,“ Decision Support Systems, vol. 50, no. 3, pp. 559-569, 2011.
10 M. J. Kranacher and R. Riley, Forensic Accounting and Fraud Examination. Hoboken, NJ: John Wiley & Sons, 2019.
11 D. Bertsimas and J. Dunn, “Optimal classification trees,” Machine Learning, vol. 106, no. 7, pp. 1039-1082, 2017.
12 P. Lenca, P. Meyer, B. Vaillant and S. Lallich, “On selecting interestingness measures for association rules: User oriented
description and multiple criteria decision aid,” European Journal of Operational Research, vol. 184, no. 2, pp. 610-626, 2008.
13 T. Brijs, G. Swinnen, K. Vanhoof and G. Wets, “Building an association rules framework to improve product assortment
decisions,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 7-23, 2004.
14 Y. Boztuğ and T. Reutterer, “A combined approach for segment-specific market basket analysis,” European Journal of
Operational Research, vol. 187, no. 1, pp. 294-312, 2008.
CREDITS: This presentation template was created by Slidesgo,
including icons by Flaticon and infographics & images by Freepik
Thank you for your
attention!
Do you have any questions?
Mirjana Pejić Bach
University of Zagreb, Faculty of Economics and Business
Trg J. F. Kennedyja 6, 10000 Zagreb
Croatia
http://www.shortbio.net/mpejic@efzg.hr

More Related Content

Similar to [DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a project-based organization.pptx

Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model finalRitu Sarkar
 
Research Methods in Marketing
Research Methods in MarketingResearch Methods in Marketing
Research Methods in MarketingVartika Kundu
 
Partnerships in Clinical Trials 2014
Partnerships in Clinical Trials 2014Partnerships in Clinical Trials 2014
Partnerships in Clinical Trials 2014The Avoca Group
 
A literary study on the bonding of the Six Sigma with the Service Quality for...
A literary study on the bonding of the Six Sigma with the Service Quality for...A literary study on the bonding of the Six Sigma with the Service Quality for...
A literary study on the bonding of the Six Sigma with the Service Quality for...IJERA Editor
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSpartan60
 
Quantitative Techniques: Introduction
Quantitative Techniques: IntroductionQuantitative Techniques: Introduction
Quantitative Techniques: IntroductionDayanand Huded
 
Adoption of New Service Development Tools in the Financial Service Industry
Adoption of New Service Development Tools in the Financial Service IndustryAdoption of New Service Development Tools in the Financial Service Industry
Adoption of New Service Development Tools in the Financial Service IndustryDayu Tony Jin
 
The Staffing Organizations ModelThe Staffing Organizations Mod.docx
The Staffing Organizations ModelThe Staffing Organizations Mod.docxThe Staffing Organizations ModelThe Staffing Organizations Mod.docx
The Staffing Organizations ModelThe Staffing Organizations Mod.docxssusera34210
 
The use of accounting information systems in analytical procedures for the au...
The use of accounting information systems in analytical procedures for the au...The use of accounting information systems in analytical procedures for the au...
The use of accounting information systems in analytical procedures for the au...Alexander Decker
 
Accounting information system
Accounting information systemAccounting information system
Accounting information systemAlexander Decker
 
Lean Six Sigma Overview (print version)
Lean Six Sigma Overview (print version)Lean Six Sigma Overview (print version)
Lean Six Sigma Overview (print version)Corey Campbell
 
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...Adikesavaperumal
 
Key considerations for your internal audit plan
Key considerations for your internal audit planKey considerations for your internal audit plan
Key considerations for your internal audit planessbaih
 
Lean Six Sigma Overview (presentation version)
Lean Six Sigma Overview (presentation version)Lean Six Sigma Overview (presentation version)
Lean Six Sigma Overview (presentation version)Corey Campbell
 
Alfred & Advaith Big Data Analytics in Accounting.pptx
Alfred & Advaith Big Data Analytics in Accounting.pptxAlfred & Advaith Big Data Analytics in Accounting.pptx
Alfred & Advaith Big Data Analytics in Accounting.pptxKumarasamy Dr.PK
 

Similar to [DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a project-based organization.pptx (20)

Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
 
Research Methods in Marketing
Research Methods in MarketingResearch Methods in Marketing
Research Methods in Marketing
 
Partnerships in Clinical Trials 2014
Partnerships in Clinical Trials 2014Partnerships in Clinical Trials 2014
Partnerships in Clinical Trials 2014
 
A literary study on the bonding of the Six Sigma with the Service Quality for...
A literary study on the bonding of the Six Sigma with the Service Quality for...A literary study on the bonding of the Six Sigma with the Service Quality for...
A literary study on the bonding of the Six Sigma with the Service Quality for...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Ijetr021221
Ijetr021221Ijetr021221
Ijetr021221
 
Ijetr021221
Ijetr021221Ijetr021221
Ijetr021221
 
Software metrics by Dr. B. J. Mohite
Software metrics by Dr. B. J. MohiteSoftware metrics by Dr. B. J. Mohite
Software metrics by Dr. B. J. Mohite
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
Quantitative Techniques: Introduction
Quantitative Techniques: IntroductionQuantitative Techniques: Introduction
Quantitative Techniques: Introduction
 
Adoption of New Service Development Tools in the Financial Service Industry
Adoption of New Service Development Tools in the Financial Service IndustryAdoption of New Service Development Tools in the Financial Service Industry
Adoption of New Service Development Tools in the Financial Service Industry
 
The Staffing Organizations ModelThe Staffing Organizations Mod.docx
The Staffing Organizations ModelThe Staffing Organizations Mod.docxThe Staffing Organizations ModelThe Staffing Organizations Mod.docx
The Staffing Organizations ModelThe Staffing Organizations Mod.docx
 
The use of accounting information systems in analytical procedures for the au...
The use of accounting information systems in analytical procedures for the au...The use of accounting information systems in analytical procedures for the au...
The use of accounting information systems in analytical procedures for the au...
 
Accounting information system
Accounting information systemAccounting information system
Accounting information system
 
Lean Six Sigma Overview (print version)
Lean Six Sigma Overview (print version)Lean Six Sigma Overview (print version)
Lean Six Sigma Overview (print version)
 
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...
wepik-exploring-effective-strategies-for-data-collection-navigating-the-chall...
 
Key considerations for your internal audit plan
Key considerations for your internal audit planKey considerations for your internal audit plan
Key considerations for your internal audit plan
 
Lean Six Sigma Overview (presentation version)
Lean Six Sigma Overview (presentation version)Lean Six Sigma Overview (presentation version)
Lean Six Sigma Overview (presentation version)
 
Alfred & Advaith Big Data Analytics in Accounting.pptx
Alfred & Advaith Big Data Analytics in Accounting.pptxAlfred & Advaith Big Data Analytics in Accounting.pptx
Alfred & Advaith Big Data Analytics in Accounting.pptx
 
evaluation
evaluationevaluation
evaluation
 

More from DataScienceConferenc1

[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdfDataScienceConferenc1
 
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...DataScienceConferenc1
 
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdfDataScienceConferenc1
 
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdfDataScienceConferenc1
 
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdfDataScienceConferenc1
 
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptxDataScienceConferenc1
 
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdfDataScienceConferenc1
 
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...DataScienceConferenc1
 
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdfDataScienceConferenc1
 
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...DataScienceConferenc1
 
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...DataScienceConferenc1
 
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdfDataScienceConferenc1
 
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptxDataScienceConferenc1
 
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...DataScienceConferenc1
 
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptxDataScienceConferenc1
 
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...DataScienceConferenc1
 
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...DataScienceConferenc1
 
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptxDataScienceConferenc1
 
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptxDataScienceConferenc1
 
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdfDataScienceConferenc1
 

More from DataScienceConferenc1 (20)

[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
[DSC MENA 24] Mostafa_Essa_-_Ai_and_cloud.pdf
 
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
[DSC MENA 24] Yasser_El_Bendary - How NLP & LLMs model can excel in comprehen...
 
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
[DSC MENA 24] Medhat_Kandil - Empowering Egypt's AI & Biotechnology Scenes.pdf
 
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
[DSC MENA 24] Youssef_Kamal - Data governance and quality.pdf
 
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
[DSC MENA 24] Abdelrahman_Ghallab_-_Data_Product_mgmt.pdf
 
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
[DSC MENA 24] Asmaa_Eltaher_-_Innovation_Beyond_Brainstorming.pptx
 
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
[DSC MENA 24] Muhammad_Ezzat_-_Sustianable_Growth_Empowerment.pdf
 
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
[DSC MENA 24] Basma_Rady_-_Building_a_Data_Driven_Culture_in_Your_Organizatio...
 
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
[DSC MENA 24] Ahmed_Muselhy_-_Unveiling-the-Secrets-of-AI-in-Hiring.pdf
 
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
[DSC MENA 24] Ziad_Diab_-_Data-Driven_Disruption_-_The_Role_of_Data_Strategy_...
 
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
[DSC MENA 24] Mohammad_Essam_- Leveraging Scene Graphs for Generative AI and ...
 
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
[DSC MENA 24] Ahmed_Fahmy - Navigating the Future.pdf
 
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
[DSC MENA 24] Hany_Saad_Gheit_-_Azure_OpenAI_service.pptx
 
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
[DSC MENA 24] Nezar_El_Kady_-_From_Turing_to_Transformers__Navigating_the_AI_...
 
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
[DSC MENA 24] Amira_Abdelaziz_-_AI_in_Financial_Services.pptx
 
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
[DSC MENA 24] Omar_Ossama - My Journey from the Field of Oil & Gas, to the Ex...
 
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
[DSC MENA 24] Ramy_Agieb_-_Advancements_in_Artificial_Intelligence_for_Cybers...
 
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
[DSC MENA 24] Sohaila_Diab_-_Lets_Talk_Gen_AI_Presentation.pptx
 
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
[DSC MENA 24] Amal_Elgammal_-_QUALITOP_presentation.pptx
 
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
[DSC MENA 24] Abdelrahman_Sleem_-_AI_For_Marketing_DSC.pdf
 

Recently uploaded

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 

Recently uploaded (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 

[DSC Adria 23] Mirjana Pejic Bach Data mining approach to internal fraud in a project-based organization.pptx

  • 1. Data mining approach to internal fraud in a project- based organization Mirjana Pejić Bach, Ksenija Dumičić, Berislav Žmuk, Tamara Ćurlin, Jovana Zoraja
  • 2. Introduction 01 02 03 04 05 06 Table of contents Literature Review Objectives Methodology Results Conclusions 2
  • 4. Internal fraud in data mining ● Internal fraud  one of the crucial and increasingly serious problems in numerous organizations ● Internal fraud control  widely used for forecasting, detecting and preventing possible fraudulent behaviours conducted by organizations’ employees1 ● Data mining techniques  widely used for external fraud detection and prevention ● Project-based organizations  prone to internal fraud because of lower level of control (result of flatter organizational structure2), poor management practices3, complex governance procedures4 4 Introduction
  • 5. Developed case study Dataset Project- based organization Working-hour claims: • Client • Expert • Job characteristics chi-square automatic interaction detection (CHAID) Analysed by link analysis and Data mining models Aim of the study: 5 Introduction
  • 6. Contributions od the study: Contributing to the area of internal fraud detection and prevention in project-based organizations, while most of the previous research has been oriented towards external fraud prevention. Providing practical contributions, since our research results in the form of decision tree and association rules could enable organizations for developing their own solutions for automatic internal fraud-detection (e.g. using SQL code). 1 2 6 Introduction
  • 8. Fraud is a severe problem for companies, both inside and outside the organization, with potentially significant financial losses and damage to the company's reputation5 Internal controls are critical for detecting and preventing fraudulent behaviour, and weak compliance with internal controls increases the risk of fraud6 Many organizations lack an efficient internal control system, and there are recommendations for increasing their effectiveness, but progress can be slow due to difficulties accessing data from previous cases1 Internal controls and fraud 8 Literature Review
  • 9. Data mining is a tool used to extract significant patterns from databases for further investigation, interpretation, and understanding7. It can achieve three fundamental goals: description, prediction, and prescription Challenges in data mining include performance time, management support, and selection and execution of algorithms. The choice of data mining technique depends on the structure of the data and the desired results8 The main concern is the actual usage of data mining results in decision- making, which often depends on management willingness to support its application Data mining 9 Literature Review
  • 10. Automated fraud detection systems based on data mining models have gained popularity, particularly in financial institutions9 Data mining has been successful in detecting and preventing fraud in various fields, including financial crime detection, intrusion and spam detection, and fraud analysis10 Data mining techniques can be used to decrease the probability of internal fraud and have become an important tool for intelligent business analytics and decision support for internal fraud prevention. Many organizations acknowledge data mining as one of the main technologies relevant to internal fraud prevention10 Data mining for internal fraud detection 10 Literature Review
  • 12. Objectives ● Case study analysis was conducted on a large project- based company with over 300 employees who provide monthly work reports, including hours worked, client characteristics, work complexity, and claimed amounts ● This information is used to fill the working-hours claim 12 Research goal: Develop a model to prevent potential fraudulent behaviour by identifying characteristics of suspected working-hour claims for in-depth fraud analysis. Objectives
  • 14. Suspect working-hours claims are defined by the company as those that meet at least one of two criteria: late submission or cancellation of already claimed hours. The company wants to identify characteristics of potential fraud claims before they are considered suspect. The dataset in Table 1 includes 1194 working- hours claims, with 24.62% being suspect for fraud and 75.38% being non-suspect. The variable "Suspect" is used to categorize the claims as suspect (with a value of 1) or non-suspect (with a value of 0). 14 Data Variable suspect Count Percent (%) Suspect (value 1) 294 24.62 Non-suspect (value 0) 900 75.38 Total 1194 100.00 Table 1. Suspect and non-suspect working-hour claims in the sample Methodology
  • 15. ● Type of customer – variable Customer ● Type of consultant – variable Consultant ● The month when the working-hours were claimed – variable Month ● The hourly-rate – variable UnitPriceCoded ● The consultant’s level of expertise – variable ExpertLevel ● The number of hours claimed – variable NoHoursCoded ● The total amount claimed – variable TotalAmountCoded The independent variables in the working-hour claims are used for developing data mining models: Data 15 Methodology
  • 16. • Variable "Customer" in Table 2. is divided into three categories: governmental institutions, internal projects, and private enterprises • Internal projects suspected in 50.68% of cases • Chi-square test confirms a difference in the structure of suspect claims across customer categories (p-value < 0.001) 16 Data Customer origin Suspect (%) Not suspect (%) Chi-square P-value Govern 4.76 95.24 77.435 <0.001 Internal 50.68 49.32 Private 22.80 77.20 Totals 24.62 75.38 Table 2. Types of the customer – variable Customer Methodology
  • 17. • Consultant variable in Table 3 indicates the country of origin of experts claiming working hours • Domestic consultants had 23.46% of their claims suspected, while foreign consultants had 41.56% suspected • Chi-square test shows that domestic and foreign consultants have significantly different structures regarding suspected and non-suspected claims 17 Data Consultant origin Suspect (%) Not suspect (%) Chi-square P-value Domestic 23.46 76.54 12.719 <0.001 Foreign 41.56 58.44 Totals 24.62 75.38 Table 3. Types of the consultant – variable Consultant Methodology
  • 18. • Months M1 and M12 in Table 4 have the highest percentage of suspected working-hours claims (65.77% and 30.59% respectively), likely related to the beginning and end of the fiscal year • Months M10 and M4 have the highest percentage of non- suspected working-hours claims (88.42% and 86.40% respectively) • The chi-square test confirms a statistically significant difference in suspected working-hours claims across different months (chi- square=134.670, df=11, p- value<0.001) 18 Data The month of the claim Suspect (%) Not suspect (%) Chi- square P-value M1 65.77 34.23 134.670 <0.001 M2 28.13 71.88 M3 17.31 82.69 M4 13.60 86.40 M5 16.81 83.19 M6 20.39 79.61 M7 14.29 85.71 M8 28.09 71.91 M9 25.93 74.07 M10 11.58 88.42 M11 19.28 80.72 M12 30.59 69.41 Table 4. The month when the working-hours were claimed – variable Month Methodology
  • 19. • UnitPriceCoded categorizes consultants' cost per hour into four groups in Table 5 • Highest share of suspected claims found in 151-200 EUR cost category (30.00%) • Chi-square test shows no significant difference between suspected and non-suspected claims across the four cost categories at a 5% significance level 19 Data The hourly rate Suspect (%) Not suspect (%) Chi-square P-value 1-50 EUR/h 10.81 89.19 6.278 0.099 51-100 EUR/h 25.59 74.41 101-150 EUR/h 18.75 81.25 151-200 EUR/h 30.00 70.00 Totals 24.62 75.38 Table 5. The hourly-rate – variable UnitPriceCoded Methodology
  • 20. • Expert Level variable has five levels coded from L4 to L8, as shown in Table 6 • The highest share of suspected claims is present at the expert level L5 • The chi-square test confirms that there is a statistically significant difference between expert levels regarding the shares of suspected and non-suspected working-hours claims 20 Data Consultant Suspect (%) Not suspect (%) Chi-square P-value L6 24.69 75.31 33.147 <0.001 L5 77.78 22.22 L8 30.00 70.00 L4 10.81 89.19 L7 18.75 81.25 Totals 24.62 75.38 Table 6. The consultant’s level of expertise – variable ExpertLevel
  • 21. • Table 7 shows the number of weekly working hours classified into eight groups ranging from 1-5 to 36-55 • An additional category was added to account for negative weekly working hours, which are always treated as suspected • The Chi-square test showed that at least one weekly working-hours category has a statistically significant difference in the share of suspected working-hours claims compared to other categories at the significance level of 1% 21 Data The number of hours Suspect (%) Not suspect (%) Chi- square P-value Negative hours 100.00% 0.00% 53.859 <0.001 1-5 hours 22.63% 77.37% 6-10 hours 26.06% 73.94% 11-15 hours 26.90% 73.10% 16-20 hours 23.39% 76.61% 21-25 hours 17.65% 82.35% 25-30 hours 9.68% 90.32% 31-35 hours 23.08% 76.92% 36-55 hours 21.05% 78.95% Totals 24.62% 75.38% Table 7. The number of hours claimed – variable NoHoursCoded Methodology
  • 22. • Variable TotalAmountCoded categorizes the costs of consultants’ working-hours into 19 cost categories. • Negative amount claimed for working- hour claims with negative hours • Chi-square test showed at least one total cost per consultant category with statistically significantly different shares of suspected working-hours claims 22 Data The number amount claimed Suspect (%) Not suspect (%) Chi-square P-value 1-100 EUR 25.30% 74.70% 80.068 <0.001 101-200 EUR 16.94% 83.06% 201-300 EUR 24.56% 75.44% 301-400 EUR 24.17% 75.83% 401-500 EUR 24.21% 75.79% 501-600 EUR 24.51% 75.49% 601-700 EUR 31.52% 68.48% 701-800 EUR 16.67% 83.33% 801-900 EUR 13.33% 86.67% 901-1000 EUR 42.42% 57.58% 1001-1100 EUR 35.19% 64.81% 1101-1300 EUR 27.87% 72.13% 1301-1500 EUR 29.33% 70.67% 1501-1600 EUR 5.56% 94.44% 1601-1700 EUR 15.38% 84.62% 1701-2000 EUR 22.22% 77.78% 2001-3000 EUR 18.97% 81.03% 3001-4000 EUR 13.79% 86.21% Negative 100.00% 0.00% Totals 24.62% 75.38% Table 8. The total amount claimed – variable TotalAmountCoded Methodology
  • 23. CHAID decision tree ● Decision tree created with CHAID algorithm was developed to understand the relationship between working hour claim fraud and various characteristics ● CHAID algorithm uses the chi-square test to select the best split at each step ● Dependent variable is Suspect and all other observed variables are independent variables ● Decision tree depth goes up to the third level11, with main node having at least 100 cases and child nodes having at least 50 cases 23 Methodology
  • 24. Link analysis Description Formulation 24 By using link analysis, the association rules are extracted in order to detect significant relationships between suspect working-hours claims and various characteristics of customers, consultants, and projects. Association rules can be described as: If A=1 and B=1 then C=1 with probability p Link analysis is a data analysis technique used to identify and evaluate relationships between items represented as "nodes." It can be used for the detection of potentially suspect working- hours claims based on characteristics of clients, consultants, and projects. where A, B, and C are binary variables, p is a conditional probability defined as p = p(C = 1|A = 1, B = 1) Methodology
  • 25. Link analysis ● All eight variables are analysed, including Suspect, Customer, Consultant, Month, UnitPriceCoded, ExpertLevel, NoHoursCoded, and TotalAmountCoded ● The non-sequential association analysis approach is used, and Statistica Data Miner software ver. 13.5 is employed for link analysis ● The minimum support value, which shows how frequently an itemset appears in the dataset, has been set to value 0.2 whereas the maximum value was set to 1.0. Support is calculated as: "Support" (A ⇒ B) = p(A ∪ B) 25 Methodology
  • 26. Link analysis ● Minimum support and confidence values were set to exclude items from the analysis ● Confidence measures how often a rule is true and is calculated using an equation ● Additional, it has been defined that the maximum number of items in an item set is 10 ● No strict rules exist for selecting minimum support value, minimum confidence value, or maximum number of items in an item set12 ● The limits used in this study were chosen because experiments showed they resulted in interesting rules ● Other authors may use subjective criteria13,14 Confidence (A ⇒ B) = p(B│A) = Support(A, B) / Support (A) 26 Methodology
  • 28. Decision tree • According to defined settings, the CHAID decision tree is developed, displayed on Figure 1 • The resulting CHAID decision tree has 3 levels and overall 11 nodes out of which seven are considered as a terminal • Figure 1 also reveals that variables Month, Customer and ExpertLevel had the highest level of statistical significance and therefore they are used in building the classification tree 28 Results Figure 1. CHAID decision tree
  • 29. 01 02 03 Decision tree The variable used for branching on the first level is the variable Month, which turned out to be statistically significant at the level of 1%, resulting in three new nodes with varying percentages of suspected and non- suspected working-hour claims. Branching on the second level using the variable Customer resulted in five new nodes, three of which came out from Node 1 and two from Node 2, with highly statistically significant chi-square values at 1%, and the largest node, Node 6, consisted of customers of private enterprises with 83.9% connected to non- suspected working-hours claims and 16.1% connected to suspected working-hours claims. 29 Results The third level branching variable is the variable ExpertLevel, and this branching process is statistically significant at the 5% level. Node 9 is considerably larger than Node 10 and includes 431 or 85.5% non-suspected working-hours claims and 73 or 14.5% suspected working-hours claims. Most important notes
  • 30. • Table 9 shows the classification matrix comparing observed and predicted status of working-hours claims • The algorithm was correct in 93.3% of cases for non-suspect working-hour claims • For suspected working-hour claims, the algorithm correctly classified only 36.1% of them out of 294 Decision tree Observed classification Non suspect Predicted classification Suspect Percent correct Non-suspect 840 60 93.3% Suspect 188 106 36.1% Overall percentage 86.1% 13.9% 79.2% Table 9. The number of hours claimed – variable NoHoursCoded 30 Results
  • 31. • Association rules were developed using selected metrics, as shown in Table 10 • The item Suspect alone appeared in 27.71% of item sets with a frequency of 235 • Suspected working-hour claims are closely related to private enterprises, domestic consultants, cost per hour between 51 and 100 EUR, and expert level L6 Link analysis Table 10. Frequent item sets that contain Suspect item 31 Results Frequent items sets Number of items Frequency Support (%) Suspect 1 235 27.712 51-100, Suspect 2 225 26.533 51-100, L6, Suspect 3 221 26.061 L6, Suspect 2 221 26.061 Domestic, Suspect 2 220 25.943 51-100, Domestic, Suspect 3 210 24.764 51-100, Domestic, L6, Suspect 4 206 24.292 Domestic, L6, Suspect 3 206 24.292 Private, Suspect 2 193 22.759 Domestic, Private, Suspect 3 185 21.816 51-100, Private, Suspect 3 183 21.580 L6, Private, Suspect 3 180 21.226 51-100, L6, Private, Suspect 4 180 21.226 51-100, Domestic, Private, Suspect 4 175 20.636 51-100, Domestic, L6, Private, Suspect 5 172 20.283 Domestic, L6, Private, Suspect 4 171 20.283
  • 32. Link analysis 32 Results • Figure 2 is a Web graph showing important nodes related to domestic experts, non-suspected claims, L6 expertise level, private customers, and low hourly paid rates (51-100 EUR) • The strongest joint support is for non-suspected claims and the above-mentioned nodes • The darkest line represents the strong relationship between L6 expertise level and low hourly paid rates (51-100 EUR) Figure 2. Web graph of items generated by link analysis
  • 33. Link analysis 33 Results • Figure 3 shows a rule graph of items generated by link analysis • Node size represents the relative support of each item, and colour darkness represents relative confidence • The rule with the highest confidence and support is the relationship between the lowest level of expertise (L6) and one of the low levels of hourly paid rate (51-100 EUR), and rules containing the item Suspect are presented with small node sizes Figure 3. Rule graph of items generated by link analysis
  • 34. Link analysis 34 Results • Table 11 shows association rules with the item Suspect in the body • The first rule indicates that 26.53% of working-hours claims with cost per hour between 51 and 100 EUR are suspected • The second and third rules have the same support and confidence levels • Note: Body for each row equales to Suspect Table 11. Frequent association rules with the item Suspect in the body Head Support (%) Confidence (%) Lift (%) 51-100 26.533 95.745 1.052 51-100, L6 26.061 94.043 1.042 L6 26.061 94.043 1.042 Domestic 25.943 93.617 0.985 51-100, Domestic 24.764 89.362 1.031 51-100, Domestic, L6 24.292 87.660 1.021 Domestic, L6 24.292 87.660 1.021 Private 22.759 82.128 0.992 Domestic, Private 21.816 78.723 0.995 51-100, Private 21.580 77.872 1.025 51-100, L6, Private 21.226 76.596 1.016 L6, Private 21.226 76.596 1.016 51-100, Domestic, Private 20.637 74.468 1.027 51-100, Domestic, L6, Private 20.283 73.191 1.017 Domestic, L6, Private 20.283 73.191 1.017
  • 35. Link analysis 35 Results • Table 12 shows association rules with the item Suspect and one more item in the Body • The strongest association is achieved when Suspect and Private are in the Body with the item Domestic • The strongest association is achieved when Suspect and L6 are in the Body with the item 51-100 Table 12. Association rules with the item Suspect and one more item in the Body Body Head Support (%) Confidence (%) Lift (%) Private, Suspect Domestic 21.816 95.855 1.008 Private, Suspect 51-100 21.580 94.819 1.042 Private, Suspect 51-100, L6 21.226 93.264 1.034 Private, Suspect L6 21.226 93.264 1.034 Private, Suspect 51-100, Domestic 20.637 90.674 1.046 Private, Suspect 51-100, Domestic, L6 20.283 89.119 1.038 Private, Suspect Domestic, L6 20.283 89.119 1.038 L6, Suspect 51-100 26.061 100.000 1.098 L6, Suspect 51-100, Domestic 24.292 93.213 1.075 L6, Suspect Domestic 24.292 93.213 0.981 L6, Suspect 51-100, Private 21.226 81.448 1.072 L6, Suspect Private 21.226 81.448 0.984 L6, Suspect 51-100, Domestic, Private 20.283 77.828 1.073 L6, Suspect Domestic, Private 21.816 95.855 0.984
  • 36. Link analysis 36 Results • The rest of Table 12: Table 12. Association rules with the item Suspect and one more item in the Body Body Head Support (%) Confidence (%) Lift (%) Domestic, Suspect 51-100 24.764 95.455 1.049 Domestic, Suspect 51-100, L6 24.292 93.636 1.038 Domestic, Suspect L6 24.292 93.636 1.038 Domestic, Suspect Private 21.816 84.091 1.016 Domestic, Suspect 51-100, Private 20.637 79.545 1.047 Domestic, Suspect 51-100, L6, Private 20.283 78.182 1.038 Domestic, Suspect L6, Private 20.283 78.182 1.038 51-100, Suspect L6 26.061 98.222 1.089 51-100, Suspect Domestic 24.764 93.333 0.982 51-100, Suspect Domestic, L6 24.292 91.556 1.066 51-100, Suspect Private 21.580 81.333 0.982 51-100, Suspect L6, Private 21.226 80.000 1.062 51-100, Suspect Domestic, Private 20.637 77.778 0.983 51-100, Suspect Domestic, L6, Private 20.283 76.444 1.063
  • 37. Link analysis 37 Results • Table 13 presents association rules with item Suspect and two or more items in the Body • Suspected working-hours claims with customers from private enterprises and expert level L6 have cost per hour between 51 and 100 EUR • Similar conclusion reached when items Domestic, L6, Private, and Suspect are associated with item 51-100 Table 13. Association rules with the item Suspect and two or more items in the Body Body Head Support (%) Confidence (%) Lift (%) L6, Private, Suspect 51-100 21.226 100.000 1.098 L6, Private, Suspect 51-100, Domestic 20.283 95.556 1.102 L6, Private, Suspect Domestic 20.283 95.556 1.005 Domestic, Private, Suspect 51-100 20.637 94.595 1.039 Domestic, Private, Suspect 51-100, L6 20.283 92.973 1.031 Domestic, Private, Suspect L6 20.283 92.973 1.031 Domestic, L6, Suspect 51-100 24.292 100.000 1.098 Domestic, L6, Suspect 51-100, Private 20.283 83.495 1.099 Domestic, L6, Suspect Private 20.283 83.495 1.009 Domestic, L6, Private, Suspect 51-100 20.283 100.000 1.098 51-100, Private, Suspect L6 21.226 98.361 1.090
  • 38. Link analysis 38 Results • The rest of Table 13: Table 13. Association rules with the item Suspect and two or more items in the Body Body Head Support (%) Confidence (%) Lift (%) 51-100, Private, Suspect Domestic 20.637 95.628 1.006 51-100, Private, Suspect Domestic, L6 20.283 93.989 1.095 51-100, L6, Suspect Domestic 24.292 93.213 0.981 51-100, L6, Suspect Private 21.226 81.448 0.984 51-100, L6, Suspect Domestic, Private 20.283 77.828 0.984 51-100, L6, Private, Suspect Domestic 20.283 95.556 1.005 51-100, Domestic, Suspect L6 24.292 98.095 1.087 51-100, Domestic, Suspect Private 20.637 83.333 1.007 51-100, Domestic, Suspect L6, Private 20.283 81.905 1.087 51-100, Domestic, Private, Suspect L6 20.283 98.286 1.089 51-100, Domestic, L6, Suspect Private 20.283 83.495 1.009
  • 40. 01 02 03 Conclusions A case study analysis was conducted on suspected working- hour claims in a project-based company Two data mining models were developed to identify characteristics of fraudulent working-hour claims Results show that suspected claims are related to private enterprise customers, domestic consultants, low hourly rates, and lowest expertise level. 04 05 06 40 Conclusions The paper demonstrates the use of data mining methodology to detect internal fraud in project- based organizations, focusing on working-hour claims. The research has significant practical implications for organizations to expand usage and potentials of different data mining techniques, which could help them to be more effective and efficient in investigating and preventing internal fraud. However, future research is needed to test the general applicability of the results. The decision tree and link analysis are recommended for use as a supportive instrument for the detection of suspect working-hour claims, in combination with other human-based and machine-based methods.
  • 41. Acknowledgements This paper extends the research on the internal fraud using the CHAID decision tree that was presented on CENTERIS - Conference on ENTERprise Information Systems (Pejić Bach et al., 2018) [11]. This research has been fully supported by the Croatian Science Foundation under the PROSPER (Process and Business Intelligence for Business Performance) project (IP-2014-09-3729).
  • 42. References 1 ACFE (2018). Report to the Nations: 2018 Global Study on Occupational Fraud and Abuse [Online]. Available: https://www.acfe.com/uploadedFiles/ACFE_Website/Content/rttn/2018/RTTN-Government-Edition.pdf 2 C. Smith, Making Sense of Project Realities: Theory, Practice and the Pursuit of Performance. Abingdon, UK: Routledge, 2017. 3 S. Bathallath, Å. Smedberg and H. Kjellin, “Managing project interdependencies in IT/IS project portfolios: A review of managerial issues,” International Journal of Information Systems and Project Management, vol. 4, no. 1, pp. 67-82, 2016. 4 D. R. Chandra and J. van Hillegersberg, “Governance of inter-organizational systems: A longitudinal case study of Rotterdam’s Port Community System,” International Journal of Information Systems and Project Management, vol. 6, no. 2, pp. 47-68, 2018. 5 P. Gottschalk, “Categories of financial crime,” Journal of Financial Crime, vol. 17, no. 4, pp. 441-458, 2010. 6 M. DeLiema, G. Mottola and M. Deevy. (2017, February). Findings from a pilot study to measure financial fraud in the United States [Online]. Available: https://ssrn.com/abstract=2914560 7 J. Leskovec, A. Rajaraman and J. D. Ullman, Mining of Massive Data Sets. Cambridge, UK: Cambridge University Press, 2019. 8 C. K. S. Leung, “Big data analysis and mining,” in Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics, D.B.A. Mehdi Khosrow-Pour, Ed. Pennsylvania, USA: IGI Global, 2019, pp. 15-27. 9 E. W. T. Ngai, Y. Hu, Y. H. Wong, Y. Chen and X. Sun, “The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature,“ Decision Support Systems, vol. 50, no. 3, pp. 559-569, 2011. 10 M. J. Kranacher and R. Riley, Forensic Accounting and Fraud Examination. Hoboken, NJ: John Wiley & Sons, 2019. 11 D. Bertsimas and J. Dunn, “Optimal classification trees,” Machine Learning, vol. 106, no. 7, pp. 1039-1082, 2017. 12 P. Lenca, P. Meyer, B. Vaillant and S. Lallich, “On selecting interestingness measures for association rules: User oriented description and multiple criteria decision aid,” European Journal of Operational Research, vol. 184, no. 2, pp. 610-626, 2008. 13 T. Brijs, G. Swinnen, K. Vanhoof and G. Wets, “Building an association rules framework to improve product assortment decisions,” Data Mining and Knowledge Discovery, vol. 8, no. 1, pp. 7-23, 2004. 14 Y. Boztuğ and T. Reutterer, “A combined approach for segment-specific market basket analysis,” European Journal of Operational Research, vol. 187, no. 1, pp. 294-312, 2008.
  • 43. CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon and infographics & images by Freepik Thank you for your attention! Do you have any questions? Mirjana Pejić Bach University of Zagreb, Faculty of Economics and Business Trg J. F. Kennedyja 6, 10000 Zagreb Croatia http://www.shortbio.net/mpejic@efzg.hr