DM Intro
Integrated Knowledge Solutions
iksinc@yahoo.com
iksinc.wordpress.com
What is Data?
©IKSINC
Data is a set of facts, observations, or measurements about objects, events, or processes of interest.
What is Information?
Information is processed data that is useful in some way, for example for decision making or communication.
While the data is fixed, the information derived from it can differ based on needs.
What is Knowledge?
Patterns of relationships in data and information that exhibit a high degree of certainty.
What Is Data Mining?
Data mining is essentially a process of data-driven extraction of not-so-obvious but useful information from large databases. The entire process is interactive and iterative.
Data mining also goes under various other names, such as knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data analytics, and business intelligence.
Slide callout: latest buzzword.
Why Data Mining?
•  The Data Glut
   •  Data-rich but information-poor businesses
   •  Estimates of data doubling every 20 months
   •  The average Fortune 500 company manages over a terabyte of data every day
•  Convergence of Technologies
•  Competitive Edge
   •  Mass marketing versus targeted marketing
Changing Business Direction
Typical Business Applications of Data Mining
•  Market Segmentation
•  Customer Targeting and Retention
•  Product Design and Placement
•  Credit Card Fraud Detection
•  Web Advertising
•  Recommendation Systems
Other Applications of Data Mining
•  Stock Market Trends
•  Text and Multimedia Data Mining
•  Sports Scouting
•  Medical Outcomes Analysis
•  Scientific Data Mining
Data Classification
•  Structured Data
   •  Data consisting of well-defined fields of numeric or alphanumeric values
•  Unstructured Data
   •  No well-defined fields of information
   •  Requires extensive processing to extract content information
   •  Examples include blogs, news reports, images, videos, tweets, etc.
   •  Fastest growing data segment
•  Semi-Structured Data
   •  Data with partial structure (medical reports, executive summaries, interview scripts, web documents, etc.)
Core Technologies for Data Mining
Diagram: Data Mining at the center, surrounded by
•  Machine Learning
•  Database Technology
•  Statistics
•  Visualization
•  Information Retrieval
Data Mining and Business Intelligence
Layers of the diagram, listed from top to bottom (the potential to support business decisions increases toward the top):
•  Making Decisions
•  Data Presentation: Visualization Techniques
•  Data Mining: Information Discovery
•  Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
•  Data Warehouses / Data Marts
•  Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles shown alongside the layers, from top to bottom: End User, Business Analyst, Data Analyst, DBA
Architecture of a Typical Data Mining System
Components shown in the diagram:
•  Databases and Data Warehouse
•  Data cleaning, data integration, and filtering
•  Database or data warehouse server
•  Data mining engine
•  Pattern evaluation
•  Graphical user interface
•  Knowledge-base
Data Mining and Data Warehousing
A data warehouse is a data repository set up to support strategic decision making.
Diagram labels: Customers, Vendors, Orders, etc.; Transactions; Enterprise "Database"; Copied, organized, summarized; Data Warehouse; Data Mining; Data Miners.
Data Mart
•  A Data Mart is a smaller, more focused Data Warehouse: a mini-warehouse.
•  A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
Accessing Information in a Data Warehouse
•  Structured Query Language (SQL)-based Reporting and Query Tools
   •  Good for extracting shallow, non-dimensional information. For example, “Find all credit card customers holding a balance payment of $500 or more.”
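The example question above maps onto a single constrained query. Here is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, and rows are made up for illustration, since the slide gives only the English question.

```python
import sqlite3

# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", 620.0), ("Bob", 120.5), ("Cho", 500.0), ("Dee", 499.99)],
)

# "Find all credit card customers holding a balance of $500 or more."
rows = conn.execute(
    "SELECT name, balance FROM customers WHERE balance >= ? ORDER BY name",
    (500,),
).fetchall()
high_balance = [name for name, _ in rows]
print(high_balance)
conn.close()
```

The query only filters on a stated constraint; it does not look for anything the analyst did not already ask about.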
Accessing Information in a Data Warehouse
•  On-Line Analytical Processing (OLAP)-based Reporting and Query Tools
   •  Good for extracting shallow, dimensional information. For example, “Find all credit card customers who live in the Midwest, drive a luxury car, and hold a balance payment of $500 or more.”
On-Line Analytical Processing (OLAP)
•  OLAP tools organize data in a dimensional representation, called a cube. This permits data to be viewed from any user-specified angle to allow slicing and dicing of the data.
•  SQL can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-purpose query tools.
Figure: example OLAP cube with dimensions Time (08, 09, 10, 11), Regions (Northeast, Midwest, South, West), and Products (A, B, C, D).
OLAP Operations
•  Single Cell
•  Multiple Cells
•  Slice
•  Dice
•  Roll Up
•  Drill Down
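These operations can be sketched on a tiny cube held in a Python dictionary. The sales figures and the function names below are invented for illustration; real OLAP tools expose the same ideas through their own interfaces.

```python
from collections import defaultdict

# Made-up sales facts keyed by (region, product, year).
facts = {
    ("Northeast", "A", "08"): 10,
    ("Northeast", "B", "08"): 4,
    ("Midwest", "A", "08"): 7,
    ("Midwest", "A", "09"): 9,
    ("South", "C", "09"): 5,
    ("West", "D", "10"): 3,
}

def slice_cube(cube, year):
    """Slice: fix one dimension (here the year) to a single value."""
    return {k: v for k, v in cube.items() if k[2] == year}

def dice_cube(cube, regions, products, years):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return {
        k: v for k, v in cube.items()
        if k[0] in regions and k[1] in products and k[2] in years
    }

def roll_up(cube, axes):
    """Roll up: sum the values, keeping only the dimensions listed in axes."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[a] for a in axes)] += value
    return dict(totals)

single_cell = facts[("Midwest", "A", "09")]           # one cell
sales_08 = slice_cube(facts, "08")                    # slice on year
sub_cube = dice_cube(facts, {"Northeast", "Midwest"}, {"A"}, {"08", "09"})
by_region = roll_up(facts, (0,))                      # coarse view
by_region_product = roll_up(facts, (0, 1))            # drill down to finer detail
```

Drill down is shown here as the reverse of roll up: the same aggregation with more dimensions kept.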
Different OLAP Tools
•  Desktop OLAP (DOLAP)
   •  Limited capability PC-based tools
•  Relational OLAP (ROLAP)
   •  Server-based tools that let a user assemble into a cube a subset of data from a relational database
•  Multidimensional OLAP (MOLAP)
   •  Server-based tools that use pre-computed cubes of data for faster response
PowerPivot
Accessing Information in a Data Warehouse
•  Data Mining Tools
   •  Good for extracting hidden or not-so-obvious information. For example, “Find all credit card customers who are likely to declare bankruptcy.”
DBMS, OLAP, and Data Mining
|  | DBMS | OLAP | Data Mining |
| Task | Extraction of detailed and summary data | Summaries, trends and forecasts | Knowledge discovery of hidden patterns and insights |
| Type of result | Information | Analysis | Insight and Prediction |
| Method | Deduction (Ask the question, verify with data) | Multidimensional data modeling, Aggregation, Statistics | Induction (Build the model, apply it to new data, get the result) |
| Example question | Who purchased mutual funds in the last 3 years? | What is the average income of mutual fund buyers by region by year? | Who will buy a mutual fund in the next 6 months and why? |
Data Mining and SQL
•  SQL is good for queries that impose a constraint on data to extract an answer
•  SQL extracts shallow knowledge from a database
•  Use SQL when we know exactly what we are looking for
•  Data mining is good for exploratory queries
•  Data mining extracts hidden information from a database
•  Use data mining when we are in a fishing mode
Data Mining and Data Warehousing: A Mutually Reinforcing Relationship
•  Data mining provides a good ROI for data warehousing
•  A data warehouse or data mart provides clean, well-formatted historical data for mining
Data Mining Process
Domain Understanding → Data Selection → Cleaning and Preprocessing → Discovering Patterns → Interpretation → Reporting
Domain Understanding Stage
•  Learning the business goals
•  Gathering relevant prior knowledge
•  Best executed by a team of business and IT persons
•  A good understanding of the domain avoids discovering irrelevant patterns and minimizes the chances of garbage in, garbage out
Example Data Sources
•  Point-of-sale data
•  Credit card charge records
•  Warranty claims
•  Medical insurance claims
•  Direct mail response data
•  Telephone call records
•  Web activity data
•  Economic data
•  Utility charges
•  Census returns
•  Magazine subscription records
Data Cleaning and Preprocessing Stage
•  Data comes from many sources, internal and external
•  Data comes in many forms and formats
   •  Hierarchical databases, flat files, COBOL data sets
•  Data is never clean
•  Most important stage. Typically consumes about 60-80% of the total data mining effort
Business Data Corruption Examples
•  Duplication: a common problem with direct mailers and credit card companies
•  Missing and Confusing Data Fields
•  Outliers: generally present due to incorrect entry/coding of a data field
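The three kinds of corruption listed above can each be checked with a few lines of Python. This is a minimal sketch on made-up customer records; the field names, the duplicate-matching key, and the plausible age range are all assumptions chosen for the example.

```python
# Made-up customer records; None marks a missing field.
records = [
    {"id": 1, "name": "Ann Lee", "zip": "48201", "age": 34},
    {"id": 2, "name": "ann lee ", "zip": "48201", "age": 34},   # duplicate of id 1
    {"id": 3, "name": "Bob Roy", "zip": None, "age": 41},       # missing field
    {"id": 4, "name": "Cho Kim", "zip": "60614", "age": 430},   # likely a keying error
    {"id": 5, "name": "Dee Fox", "zip": "10001", "age": 29},
]

def dedupe(rows):
    """Keep the first record for each normalized (name, zip) pair."""
    seen, unique = set(), []
    for row in rows:
        key = (row["name"].strip().lower(), row["zip"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def missing_fields(rows):
    """Map record id to the names of fields whose value is None."""
    return {
        r["id"]: [f for f, v in r.items() if v is None]
        for r in rows
        if any(v is None for v in r.values())
    }

def age_outliers(rows, low=0, high=120):
    """Flag ids whose age falls outside an assumed plausible range."""
    return [r["id"] for r in rows if not low <= r["age"] <= high]

clean = dedupe(records)
print([r["id"] for r in clean])
print(missing_fields(clean))
print(age_outliers(clean))
```

Real duplicate detection usually needs fuzzier matching than a normalized name and ZIP code; the exact-match key here only illustrates the idea.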
Data Preprocessing
This step is also known as data transformation. The aim here is to map data fields into representations suitable for the data discovery stage.
Examples:
Month/Date/Year ===> Age Groups
Customer Address ===> Geographic Zone Code
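Both example mappings can be sketched in a few lines of Python. The age-group cut points and the zone scheme (the first three digits of a trailing ZIP code) are assumptions made for illustration; the slide does not define either.

```python
from datetime import date, datetime

def age_group(birth_date, today):
    """Map a birth date to a coarse age group (cut points are assumptions)."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    if age < 25:
        return "under 25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"

def zone_code(address):
    """Hypothetical zone scheme: first three digits of the trailing ZIP code."""
    zip_code = address.strip().split()[-1]
    return zip_code[:3]

# Month/Date/Year field, parsed and then mapped to an age group.
birth = datetime.strptime("07/15/1990", "%m/%d/%Y").date()
group = age_group(birth, date(2024, 6, 1))
zone = zone_code("12 Elm St, Detroit, MI 48201")
print(group, zone)
```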
Pattern Discovery Stage
•  Discovery Model?
•  Discovery Methodology?
Discovery Models
•  Association Model
•  Classification Model
•  Clustering Model
•  Regression Model
•  Sequential Model
•  Visual Model
Association Model
90% of customers who subscribe to at least three premium channels also subscribe to pay-per-view events.
Also known as Market Basket Analysis.
Association Model: Application 1
•  Marketing and Sales Promotion:
   •  Let the rule discovered be {Bagels, …} --> {Potato Chips}
   •  Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
   •  Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
   •  Bagels in antecedent and Potato Chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato Chips!
Association Model: Application 2
•  Supermarket shelf management.
   •  Goal: To identify items that are bought together by sufficiently many customers.
   •  Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
   •  A classic rule:
      •  If a customer buys diapers and milk, then he is very likely to buy beer.
      •  So, don’t be surprised if you find six-packs stacked next to diapers!
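How strongly a rule such as {diapers, milk} --> {beer} holds is commonly summarized by its support and confidence. The slides do not name these measures, so treat this as a standard-textbook sketch rather than the author's method; the baskets are made up.

```python
# Made-up point-of-sale baskets, one set of items per transaction.
baskets = [
    {"diapers", "milk", "beer"},
    {"diapers", "milk", "beer", "bread"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"diapers", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

rule_support = support({"diapers", "milk", "beer"}, baskets)
rule_confidence = confidence({"diapers", "milk"}, {"beer"}, baskets)
print(rule_support, rule_confidence)
```

"Bought together by sufficiently many customers" corresponds to a minimum support threshold, and "very likely to buy" to a high confidence.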
  
Classification Model
•  If Annual_Income > 40,000 AND Home-Owner, Then Credit-Risk → Medium
•  If [expression involving Annual Income, Avg Monthly CreditCardBalance, and Mortgage] > 25, Then Loan-Approval → Yes
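A classification rule of this kind translates directly into code. The sketch below covers only the Annual_Income and Home-Owner rule; the function name is hypothetical, and it returns None when the rule does not fire because the slide gives no outcome for other cases.

```python
def credit_risk(annual_income, home_owner):
    """Apply the slide's first rule; None means the rule does not apply."""
    if annual_income > 40_000 and home_owner:
        return "Medium"
    return None

print(credit_risk(55_000, True))
print(credit_risk(55_000, False))
```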
Classification: Application 1
•  Direct Marketing
   •  Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
   •  Approach:
      •  Use the data for a similar product introduced before.
      •  We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
      •  Collect various demographic, lifestyle, and company-interaction related information about all such customers.
         •  Type of business, where they stay, how much they earn, etc.
      •  Use this information as input attributes to learn a classifier model.
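Learning a classifier from labeled past customers can be illustrated with about the simplest possible model, a one-feature threshold rule (a decision stump). The slides do not say which classifier is used, so this is a stand-in; the income figures (in $1000s) and labels are made up.

```python
def train_stump(rows, labels, feature):
    """Pick the threshold on one numeric feature with the fewest training errors.

    The learned rule predicts "buy" when the feature value exceeds the threshold.
    """
    values = sorted({r[feature] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(rows) + 1
    for t in candidates:
        errors = sum((r[feature] > t) != (y == "buy") for r, y in zip(rows, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

# Input attribute (income) and class attribute (buy / don't buy) from a past campaign.
customers = [{"income": 28}, {"income": 35}, {"income": 52}, {"income": 61}, {"income": 75}]
decisions = ["don't buy", "don't buy", "buy", "buy", "buy"]

threshold = train_stump(customers, decisions, "income")

def predict(row):
    """Classify a new customer with the learned threshold."""
    return "buy" if row["income"] > threshold else "don't buy"

print(threshold, predict({"income": 48}))
```

A practical model would use many attributes at once, but the workflow is the same: label past cases, learn from them, then apply the model to new customers.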
  
Classification: Application 2
•  Fraud Detection
   •  Goal: Predict fraudulent cases in credit card transactions.
   •  Approach:
      •  Use credit card transactions and the information on its account-holder as attributes.
         •  When does a customer buy, what does he buy, how often he pays on time, etc.
      •  Label past transactions as fraud or fair transactions. This forms the class attribute.
      •  Learn a model for the class of the transactions.
      •  Use this model to detect fraud by observing credit card transactions on an account.
Classification: Application 3
•  Customer Attrition/Churn:
   •  Goal: To predict whether a customer is likely to be lost to a competitor.
   •  Approach:
      •  Use detailed record of transactions with each of the past and present customers, to find attributes.
         •  How often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
      •  Label the customers as loyal or disloyal.
      •  Find a model for loyalty.
Clustering Model
Clustering models are similar to classification models except that no a priori information is available for classes.
Clustering: Application 1
•  Market Segmentation:
   •  Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
   •  Approach:
      •  Collect different attributes of customers based on their geographical and lifestyle related information.
      •  Find clusters of similar customers.
      •  Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters.
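Finding clusters of similar customers can be sketched with k-means, one common clustering algorithm; the slides do not name a specific method, so this choice is an assumption. The customer attributes (age, yearly spend in $1000s) are made up, and the starting centroids are fixed so the run is repeatable.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Made-up customers: (age, yearly spend in $1000s).
customers = [(22, 1.0), (25, 1.5), (27, 1.2), (58, 6.0), (61, 6.5), (64, 5.8)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
print(clusters)
```

No class labels are supplied; the two segments emerge from the attribute values alone, which is the difference from classification noted earlier.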
  	
  
Clustering: Application 2
•  Document Clustering:
   •  Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
   •  Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
   •  Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
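The similarity-measure step can be sketched with raw term counts and cosine similarity. Cosine similarity is one common choice, not necessarily the one the author had in mind, and the three short documents are made up. The sketch stops at computing similarities and does not run a clustering algorithm.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = {
    "d1": "data mining finds patterns in data",
    "d2": "mining patterns from large data sets",
    "d3": "the team won the football match",
}
vectors = {name: term_frequencies(text) for name, text in docs.items()}

sim_12 = cosine_similarity(vectors["d1"], vectors["d2"])
sim_13 = cosine_similarity(vectors["d1"], vectors["d3"])
print(round(sim_12, 3), sim_13)
```

Documents d1 and d2 share several terms and score well above d3, which shares none with d1; a clustering step would group documents using these pairwise scores.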
  
Regression Model
Log(Peak_Load) = 300 + 1.6(|Temp - 65|)**2 + 2.4(Rel_Humidity - 80)
Unlike classification models that produce only discrete outcomes, a regression model generates a numerical score as its output.
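To make the contrast with classification concrete, the equation can be evaluated for a given temperature and humidity. The sketch computes only the right-hand side exactly as written on the slide; the function name and the sample inputs are made up, and the coefficients are the slide's illustrative values.

```python
def log_peak_load(temp, rel_humidity):
    """Right-hand side of the slide's illustrative regression equation."""
    return 300 + 1.6 * abs(temp - 65) ** 2 + 2.4 * (rel_humidity - 80)

# The output is a number on a continuous scale, not a class label.
score = log_peak_load(temp=75, rel_humidity=85)
print(score)
```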
Sequential Model
Similar to association models except that sequences of events are considered. For example:
“80% of customers who buy a product X are likely to buy product Y in next six months”
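A figure like the 80% in this example can be estimated from dated purchase histories. The sketch below uses made-up data, a hypothetical function name, and an assumed window of 183 days for "six months"; it counts only purchases of Y made after the customer's first purchase of X.

```python
from datetime import date, timedelta

# Made-up purchase history: customer -> list of (product, purchase date).
purchases = {
    "c1": [("X", date(2023, 1, 10)), ("Y", date(2023, 4, 2))],
    "c2": [("X", date(2023, 2, 1)), ("Y", date(2023, 12, 20))],  # Y bought too late
    "c3": [("Y", date(2023, 1, 5)), ("X", date(2023, 3, 1))],    # Y bought before X
    "c4": [("X", date(2023, 5, 5)), ("Y", date(2023, 6, 1))],
    "c5": [("Z", date(2023, 5, 5))],                              # never bought X
}

def sequence_confidence(history, first, then, window_days=183):
    """Among buyers of `first`, the fraction who bought `then` within the window afterwards."""
    buyers = followers = 0
    for events in history.values():
        first_dates = [d for p, d in events if p == first]
        if not first_dates:
            continue
        buyers += 1
        start = min(first_dates)
        deadline = start + timedelta(days=window_days)
        if any(p == then and start < d <= deadline for p, d in events):
            followers += 1
    return followers / buyers if buyers else 0.0

conf_xy = sequence_confidence(purchases, "X", "Y")
print(conf_xy)
```

Unlike the association measures, the order and timing of the two purchases matter here, so customer c3 does not count toward the rule.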
Visual Model
Figure: Geographical Patterns and Map Visualization
Interpretation Stage
•  Evaluate the quality of the discovered patterns
•  Determine the value of the discovery to the business
Value of Mined Information
•  Perceptive gap
Diagram labels: The Organization; Organization’s View of the Business; Data Mined View of the Business; Gap.
Value of Mined Information
•  Dollar gap
Diagram labels: The Organization; Organization’s View of the Business; Data Mined View of the Business; Action 1; Action 2; Gap.
Reporting Stage
•  Reporting the discovery to higher management
•  Transforming the discovery to new actions or products
Challenges of Data Mining
•  Scalability
•  Dimensionality
•  Complex and Heterogeneous Data
•  Data Quality
•  Data Ownership and Distribution
•  Privacy Preservation
Thank you for viewing my presentation.
For questions, you can contact me at iksinc@yahoo.com
Also visit my blog “From Data to Decisions” at iksinc.wordpress.com

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining Sushil Kulkarni
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an IntroductionAli Abbasi
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : ConceptsPragya Pandey
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science ProcessVishal Patel
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data MiningAmritanshu Mehra
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project LifecycleJason Geng
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycleManoj Mishra
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...Edureka!
 

Was ist angesagt? (20)

Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Introduction to Data Mining
Introduction to Data Mining Introduction to Data Mining
Introduction to Data Mining
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Mining: Association Rules Basics
Data Mining: Association Rules BasicsData Mining: Association Rules Basics
Data Mining: Association Rules Basics
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an Introduction
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : Concepts
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Data Mining
Data MiningData Mining
Data Mining
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Textmining Introduction
Textmining IntroductionTextmining Introduction
Textmining Introduction
 
Data Mining
Data MiningData Mining
Data Mining
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Data science life cycle
Data science life cycleData science life cycle
Data science life cycle
 
Lecture 01 Data Mining
Lecture 01 Data MiningLecture 01 Data Mining
Lecture 01 Data Mining
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
What Is Data Science? Data Science Course - Data Science Tutorial For Beginne...
 

Ähnlich wie Introduction to Data Mining

The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataUmair Shafique
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?Findwise
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptxinfinix8
 
How to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsHow to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsExtraHop Networks
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
How to build a successful data lake Presentation.pptx
How to build a successful data lake Presentation.pptxHow to build a successful data lake Presentation.pptx
How to build a successful data lake Presentation.pptxTarekHassan840678
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesDATAVERSITY
 
bigdata (1)
bigdata (1)bigdata (1)
bigdata (1)DIVYA G
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationDenodo
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making ThingsJC Davis
 
ERP technology Areas.pptx
ERP technology Areas.pptxERP technology Areas.pptx
ERP technology Areas.pptxssuserdd904d
 
Information and data relevance to business
Information and data relevance to businessInformation and data relevance to business
Information and data relevance to businessiasaglobal
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and InnovationCaserta
 

Ähnlich wie Introduction to Data Mining (20)

The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Big data
Big dataBig data
Big data
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
TOPIC.pptx
TOPIC.pptxTOPIC.pptx
TOPIC.pptx
 
How to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT OperationsHow to Use Big Data to Transform IT Operations
How to Use Big Data to Transform IT Operations
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
How to build a successful data lake Presentation.pptx
How to build a successful data lake Presentation.pptxHow to build a successful data lake Presentation.pptx
How to build a successful data lake Presentation.pptx
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
bigdata (1)
bigdata (1)bigdata (1)
bigdata (1)
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data Virtualization
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making Things
 
ERP technology Areas.pptx
ERP technology Areas.pptxERP technology Areas.pptx
ERP technology Areas.pptx
 
Information and data relevance to business
Information and data relevance to businessInformation and data relevance to business
Information and data relevance to business
 
Balancing Data Governance and Innovation
Balancing Data Governance and InnovationBalancing Data Governance and Innovation
Balancing Data Governance and Innovation
 
Data mining
Data miningData mining
Data mining
 

Mehr von Si Krishan

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroSi Krishan
 
Bringing AI to Business Intelligence
Bringing AI to Business IntelligenceBringing AI to Business Intelligence
Bringing AI to Business IntelligenceSi Krishan
 
Intro to Excel Basics: Part II
Intro to Excel Basics: Part IIIntro to Excel Basics: Part II
Intro to Excel Basics: Part IISi Krishan
 
Intro to Excel Basics: Part I
Intro to Excel Basics: Part IIntro to Excel Basics: Part I
Intro to Excel Basics: Part ISi Krishan
 
Image Search: Then and Now
Image Search: Then and NowImage Search: Then and Now
Image Search: Then and NowSi Krishan
 
How to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchHow to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchSi Krishan
 
Machine learning and multimedia information retrieval
Machine learning and multimedia information retrievalMachine learning and multimedia information retrieval
Machine learning and multimedia information retrievalSi Krishan
 
Social media in banking
Social media in bankingSocial media in banking
Social media in bankingSi Krishan
 

Mehr von Si Krishan (9)

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
Bringing AI to Business Intelligence
Bringing AI to Business IntelligenceBringing AI to Business Intelligence
Bringing AI to Business Intelligence
 
Ml intro
Ml introMl intro
Ml intro
 
Intro to Excel Basics: Part II
Intro to Excel Basics: Part IIIntro to Excel Basics: Part II
Intro to Excel Basics: Part II
 
Intro to Excel Basics: Part I
Intro to Excel Basics: Part IIntro to Excel Basics: Part I
Intro to Excel Basics: Part I
 
Image Search: Then and Now
Image Search: Then and NowImage Search: Then and Now
Image Search: Then and Now
 
How to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchHow to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful Research
 
Machine learning and multimedia information retrieval
Machine learning and multimedia information retrievalMachine learning and multimedia information retrieval
Machine learning and multimedia information retrieval
 
Social media in banking
Social media in bankingSocial media in banking
Social media in banking
 

Introduction to Data Mining

  • 1. DM Intro. Integrated Knowledge Solutions, iksinc@yahoo.com, iksinc.wordpress.com
  • 2. What is Data? Data is a set of facts/observations/measurements about objects/events/processes of interest. ©IKSINC
  • 3. What is Information? Information is processed data that is useful in one way or the other, for example for decision making, communication, etc. While the data is fixed, information from it can differ based on needs.
  • 4. What is Knowledge? Patterns of relationships in data and information that exhibit a high degree of certainty.
  • 5. What Is Data Mining? Data mining is essentially a process of data-driven extraction of not so obvious but useful information from large databases. The entire process is interactive and iterative. Data mining also goes under various other names such as: knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, data analytics, business intelligence, etc.
  • 7. Why Data Mining?
      • The Data Glut
      • Data rich but information poor businesses
      • Estimates of data doubling every 20 months
      • The average Fortune 500 company manages over a terabyte of data every day
      • Convergence of Technologies
      • Competitive Edge
      • Mass marketing versus targeted marketing
  • 8.
  • 11. Typical Business Applications of Data Mining
      • Market Segmentation
      • Customer Targeting and Retention
      • Product Design and Placement
      • Credit Card Fraud Detection
      • Web Advertising
      • Recommendation Systems
  • 12. Other Applications of Data Mining
      • Stock Market Trends
      • Text and Multimedia Data Mining
      • Sports Scouting
      • Medical Outcomes Analysis
      • Scientific Data Mining
  • 13. Data Classification
      • Structured Data
          • Data consisting of well-defined fields of numeric or alphanumeric values
  • 14. Data Classification
      • Unstructured Data
          • No well-defined fields of information
          • Requires extensive processing to extract content information
          • Examples include blogs, news reports, images, videos, tweets, etc.
          • Fastest growing data segment
  • 15. Data Classification
      • Semi-Structured Data
          • Data with partial structure (medical reports, executive summaries, interview scripts, web documents, etc.)
  • 16. Core Technologies for Data Mining. Diagram labels: Data Mining, Machine Learning, Database Technology, Statistics, Visualization, Information Retrieval.
  • 17. Data Mining and Business Intelligence. Layered diagram with increasing potential to support business decisions. Layer labels: Making Decisions; Data Presentation (Visualization Techniques); Data Mining (Information Discovery); Data Exploration; OLAP, MDA; Statistical Analysis, Querying and Reporting; Data Warehouses / Data Marts; Data Sources (Paper, Files, Information Providers, Database Systems, OLTP). Role labels: End User, Business Analyst, Data Analyst, DBA.
  • 18. Architecture of a Typical Data Mining System. Diagram components: Data Warehouse; Data cleaning & data integration; Filtering; Databases; Database or data warehouse server; Data mining engine; Pattern evaluation; Graphical user interface; Knowledge-base.
  • 19. Data Mining and Data Warehousing. A data warehouse is a data repository set up to support strategic decision making. Diagram labels: Customers, Vendors, Orders, etc.; Enterprise “Database”; Transactions; Copied, organized, summarized; Data Warehouse; Data Mining; Data Miners.
  • 20. Data Mart
      • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.
      • A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
  • 21. Accessing Information in a Data Warehouse
      • Structured Query Language (SQL)-based Reporting and Query Tools
      • Good for extracting shallow, non-dimensional information. For example, “Find all credit card customers holding a balance payment of $500 or more.”
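The example query on slide 21 can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `customers` table, its `name` and `balance` columns, and the sample rows are assumptions made for the sketch and do not come from the slides.

```python
import sqlite3


def customers_with_balance_at_least(conn, threshold):
    """Return names of customers whose balance is at least `threshold`."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE balance >= ? ORDER BY name",
        (threshold,),
    )
    return [name for (name,) in rows]


# Hypothetical in-memory table standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", 720.0), ("Ben", 150.0), ("Cy", 500.0)],
)

result = customers_with_balance_at_least(conn, 500)
print(result)
```

The query imposes a single constraint on one field, which is the kind of shallow, non-dimensional question the slide has in mind.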
  • 22. Accessing Information in a Data Warehouse
      • On-Line Analytical Processing (OLAP)-based Reporting and Query Tools
      • Good for extracting shallow, dimensional information. For example, “Find all credit card customers who live in the Midwest, drive a luxury car, and hold a balance payment of $500 or more.”
  • 23. On-Line Analytical Processing (OLAP)
      • OLAP tools organize data in a dimensional representation, called a cube. This permits data to be viewed from any user-specified angle to allow slicing and dicing of the data.
      • SQL can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘what if?’ and ‘why?’ type questions distinguishes OLAP from general-purpose query tools.
  • 25. OLAP Operations. Diagram labels: Single Cell, Multiple Cells, Slice, Dice, Roll Up, Drill Down.
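The slice, dice, and roll-up operations named on slide 25 can be sketched on a toy cube. The cube below is a plain dictionary keyed by (region, product, year); the dimension names, the cell values, and the function names are all hypothetical, and real OLAP servers store and index cubes very differently.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "year")

# Hypothetical cube: (region, product, year) -> sales amount.
cube = {
    ("Midwest", "Gold Card", 2022): 120,
    ("Midwest", "Gold Card", 2023): 150,
    ("Midwest", "Basic Card", 2023): 80,
    ("South", "Gold Card", 2023): 200,
}


def slice_cube(cube, dimension, value):
    """Slice: keep the cells where one dimension equals a single value."""
    axis = DIMENSIONS.index(dimension)
    return {key: v for key, v in cube.items() if key[axis] == value}


def dice_cube(cube, allowed):
    """Dice: keep the cells whose coordinates fall in the allowed value sets."""
    return {
        key: v
        for key, v in cube.items()
        if all(key[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(cube, keep):
    """Roll up: sum away every dimension that is not listed in `keep`."""
    axes = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, v in cube.items():
        totals[tuple(key[a] for a in axes)] += v
    return dict(totals)


print(roll_up(cube, ["region"]))
```

Drill down is the reverse of roll up: calling `roll_up` with more dimensions in `keep` returns the finer-grained cells again.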
  • 26. Different OLAP Tools
      • Desktop OLAP (DOLAP)
          • Limited capability PC-based tools
      • Relational OLAP (ROLAP)
          • Server-based tools that let a user assemble into a cube a subset of data from a relational database
      • Multidimensional OLAP (MOLAP)
          • Server-based tools that use pre-computed cubes of data for faster response
  • 28. Accessing Information in a Data Warehouse
      • Data Mining Tools
      • Good for extracting hidden or not so obvious information. For example, “Find all credit card customers who are likely to declare bankruptcy.”
  • 29. DBMS, OLAP, and Data Mining
      • Task
          • DBMS: Extraction of detailed and summary data
          • OLAP: Summaries, trends and forecasts
          • Data Mining: Knowledge discovery of hidden patterns and insights
      • Type of result
          • DBMS: Information
          • OLAP: Analysis
          • Data Mining: Insight and Prediction
      • Method
          • DBMS: Deduction (Ask the question, verify with data)
          • OLAP: Multidimensional data modeling, Aggregation, Statistics
          • Data Mining: Induction (Build the model, apply it to new data, get the result)
      • Example question
          • DBMS: Who purchased mutual funds in the last 3 years?
          • OLAP: What is the average income of mutual fund buyers by region by year?
          • Data Mining: Who will buy a mutual fund in the next 6 months and why?
  • 30. Data Mining and SQL
      • SQL is good for queries that impose a constraint on data to extract an answer
      • SQL extracts shallow knowledge from a database
      • Use SQL when we know exactly what we are looking for
      • Data mining is good for exploratory queries
      • Data mining extracts hidden information from a database
      • Use data mining when we are in a fishing mode
  • 31. Data Mining and Data Warehousing: A Mutually Reinforcing Relationship
      • Data mining provides a good ROI for data warehousing
      • A data warehouse or data mart provides clean, well-formatted historical data for mining
  • 32. Data Mining Process. Diagram stages: Domain Understanding, Data Selection, Cleaning and Preprocessing, Discovering Patterns, Interpretation, Reporting.
  • 33.
  • 34. Domain Understanding Stage
      • Learning the business goals
      • Gathering relevant prior knowledge
      • Best executed by a team of business and IT persons
      • Good understanding of the domain avoids discovering irrelevant patterns or minimizes the chances for garbage in, garbage out
  • 35. Example Data Sources
      • Point-of-sale data
      • Credit card charge records
      • Warranty claims
      • Medical insurance claims
      • Direct mail response data
      • Telephone call records
      • Web activity data
      • Economic data
      • Utility charges
      • Census returns
      • Magazine subscription records
  • 36. Data Cleaning and Preprocessing Stage
      • Data comes from many sources - internal and external
      • Data comes in many forms and formats
          • Hierarchical databases, flat files, COBOL data sets
      • Data is never clean
      • Most important stage. Typically consumes about 60-80% of the total data mining effort
  • 37. Business Data Corruption Examples
      • Duplication - A common problem with direct mailers and credit card companies
      • Missing and Confusing Data Fields
      • Outliers - Generally present due to incorrect entry/coding of a data field.
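Duplication, the first corruption example on slide 37, can be illustrated with a small sketch that drops repeated customer records. The field names `name` and `address` and the matching rule (ignore case and extra whitespace) are assumptions for this sketch; real mailing lists usually need fuzzier matching.

```python
def normalize(record):
    """Build a comparison key that ignores case and extra whitespace."""
    return tuple(
        " ".join(str(record[field]).lower().split())
        for field in ("name", "address")
    )


def deduplicate(records):
    """Keep the first record seen for each normalized (name, address) key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical mailing list with one duplicated customer.
mailing_list = [
    {"name": "Pat Lee", "address": "12 Oak St"},
    {"name": "pat  lee", "address": "12 OAK ST"},
    {"name": "Sam Roy", "address": "9 Elm Ave"},
]
print(deduplicate(mailing_list))
```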
  • 38. Data Preprocessing. This step is also known as data transformation. The aim here is to map data fields into representations suitable for the data discovery stage. Examples:
      • Month/Date/Year ===> Age Groups
      • Customer Address ===> Geographic Zone Code
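The first transformation on slide 38, from a date to an age group, might look like the sketch below. The group boundaries and labels are hypothetical; the slides do not say which groups to use.

```python
from datetime import date

# Hypothetical age groups as (lowest age, highest age, label).
AGE_GROUPS = [
    (0, 17, "under 18"),
    (18, 34, "18-34"),
    (35, 54, "35-54"),
    (55, 200, "55+"),
]


def age_on(birth_date, today):
    """Age in whole years on the given day."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not happened yet this year
    return years


def age_group(birth_date, today):
    """Map a birth date to the label of its age group."""
    age = age_on(birth_date, today)
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"no age group covers age {age}")


print(age_group(date(1990, 6, 15), date(2024, 6, 14)))
```

Passing the reference day explicitly, rather than calling `date.today()` inside the function, keeps the mapping reproducible when the same data set is processed again later.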
  • 39. Pattern Discovery Stage
      • Discovery Model?
      • Discovery Methodology?
  • 40. Discovery Models
      • Association Model
      • Classification Model
      • Clustering Model
      • Regression Model
      • Sequential Model
      • Visual Model
  • 41. Association Model. Also known as Market Basket Analysis. Example: 90% of customers who subscribe to at least three premium channels also subscribe to pay-per-view events.
  • 42. Association Model: Application 1
      • Marketing and Sales Promotion:
          • Let the rule discovered be {Bagels, … } --> {Potato Chips}
          • Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
          • Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
          • Bagels in antecedent and Potato Chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato Chips!
  • 43. Association Model: Application 2
      • Supermarket shelf management.
          • Goal: To identify items that are bought together by sufficiently many customers.
          • Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
          • A classic rule:
              • If a customer buys diapers and milk, then he is very likely to buy beer.
              • So, don’t be surprised if you find six-packs stacked next to diapers!
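A rule such as {diapers, milk} --> {beer} is usually scored with two standard measures, support and confidence. The slides do not name them, so treat the sketch below as one common way to quantify "bought together by sufficiently many customers" and "very likely". The transactions are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / antecedent_support


# Hypothetical point-of-sale baskets.
transactions = [
    {"diapers", "milk", "beer"},
    {"diapers", "milk", "beer", "bread"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"beer", "chips"},
]

print(support(transactions, {"diapers", "milk"}))
print(confidence(transactions, {"diapers", "milk"}, {"beer"}))
```

In this toy data three of five baskets contain diapers and milk, and two of those three also contain beer.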
  • 44. Classification Model
      • If Annual_Income > 40,000 AND Home-Owner, Then Credit-Risk → Medium
      • If [expression in Annual_Income, Avg_Monthly_CreditCardBalance and Mortgage; formula not recoverable from the transcript] > 25, Then Loan-Approval → Yes
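The first rule on slide 44 translates directly into code. The function and argument names below are illustrative, and because the slide gives no rule for customers who fail the condition, the sketch returns `None` for them rather than guessing a label.

```python
def credit_risk(annual_income, home_owner):
    """Apply the slide's rule: income above 40,000 and home owner -> Medium."""
    if annual_income > 40_000 and home_owner:
        return "Medium"
    return None  # the slide states no rule for the remaining cases


print(credit_risk(55_000, True))
```

In practice a classifier learns many such rules from labeled data; a single hand-written rule is shown here only to make the if-then form concrete.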
  • 45. Classification: Application 1
      • Direct Marketing
          • Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
          • Approach:
              • Use the data for a similar product introduced before.
              • We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
              • Collect various demographic, lifestyle, and company-interaction related information about all such customers.
                  • Type of business, where they stay, how much they earn, etc.
              • Use this information as input attributes to learn a classifier model.
  • 46. Classification: Application 2
      • Fraud Detection
          • Goal: Predict fraudulent cases in credit card transactions.
          • Approach:
              • Use credit card transactions and the information on its account-holder as attributes.
                  • When does a customer buy, what does he buy, how often he pays on time, etc.
              • Label past transactions as fraud or fair transactions. This forms the class attribute.
              • Learn a model for the class of the transactions.
              • Use this model to detect fraud by observing credit card transactions on an account.
  • 47. Classification: Application 3
      • Customer Attrition/Churn:
          • Goal: To predict whether a customer is likely to be lost to a competitor.
          • Approach:
              • Use detailed record of transactions with each of the past and present customers, to find attributes.
                  • How often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
              • Label the customers as loyal or disloyal.
              • Find a model for loyalty.
  • 48. Clustering Model. Clustering models are similar to classification models except that no a priori information is available for classes.
  • 49. Clustering: Application 1
      • Market Segmentation:
          • Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
          • Approach:
              • Collect different attributes of customers based on their geographical and lifestyle related information.
              • Find clusters of similar customers.
              • Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters.
  • 50. Clustering: Application 2
      • Document Clustering:
          • Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.
          • Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
          • Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
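The "similarity measure based on the frequencies of different terms" on slide 50 could be, for example, cosine similarity between term-frequency vectors. Cosine similarity is an assumption here, since the slides do not name a measure, and the whitespace tokenizer is deliberately naive. The sketch covers only the similarity step, not the clustering itself.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = term_frequencies("data mining data")
doc2 = term_frequencies("data warehouse")
print(cosine_similarity(doc1, doc2))
```

A clustering algorithm would then group documents whose pairwise similarity is high.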
  • 51. Regression Model. Log(Peak_Load) = 300 + 1.6(|Temp - 65|)**2 + 2.4(Rel_Humidity - 80). Unlike classification models that produce only discrete outcomes, a regression model generates a numerical score as its output.
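The regression formula on slide 51 can be evaluated directly. The sketch below computes only the right-hand side, that is, the value of Log(Peak_Load). The slide does not state the base of the logarithm or the units, so the sketch does not convert back to a load. The function and argument names are illustrative.

```python
def log_peak_load(temp, rel_humidity):
    """Right-hand side of the slide's formula:
    300 + 1.6 * |temp - 65|**2 + 2.4 * (rel_humidity - 80)."""
    return 300 + 1.6 * abs(temp - 65) ** 2 + 2.4 * (rel_humidity - 80)


print(log_peak_load(75, 90))
```

The output is a continuous number rather than a class label, which is the contrast with classification that the slide draws.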
  • 52. Sequential Model. Similar to association models except that sequences of events are considered. For example: “80% of customers who buy a product X are likely to buy product Y in next six months”
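A statement like the one on slide 52 can be checked against purchase histories by counting how many buyers of X go on to buy Y within a time window. In the sketch below the data layout, the function name, and the 183-day approximation of six months are all assumptions, and the window is measured from each customer's first purchase of X.

```python
from datetime import date, timedelta


def sequential_confidence(purchases, first, then, window_days=183):
    """Among customers who bought `first`, the fraction who bought `then`
    within `window_days` after their earliest purchase of `first`."""
    bought_first = 0
    followed = 0
    for history in purchases.values():
        first_dates = [day for day, product in history if product == first]
        if not first_dates:
            continue
        bought_first += 1
        start = min(first_dates)
        end = start + timedelta(days=window_days)
        if any(product == then and start < day <= end for day, product in history):
            followed += 1
    return followed / bought_first if bought_first else 0.0


# Hypothetical purchase histories: customer -> list of (date, product).
purchases = {
    "A": [(date(2024, 1, 10), "X"), (date(2024, 3, 1), "Y")],
    "B": [(date(2024, 1, 10), "X"), (date(2024, 12, 1), "Y")],
    "C": [(date(2024, 2, 5), "Y")],
    "D": [(date(2024, 4, 2), "X")],
}
print(sequential_confidence(purchases, "X", "Y"))
```

Only customer A buys Y inside the window, so one of the three buyers of X counts as a follow-on purchase.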
  • 55. Interpretation Stage
      • Evaluate the quality of the discovered patterns
      • Determine the value of the discovery to the business
  • 56. Value of Mined Information
      • Perceptive gap
      • Diagram labels: The Organization; Organization’s View of the Business; Data Mined View of the Business; Gap
  • 57. Value of Mined Information
      • Dollar gap
      • Diagram labels: The Organization; Organization’s View of the Business; Data Mined View of the Business; Action 1; Action 2; Gap
  • 58. Reporting Stage
      • Reporting the discovery to higher management
      • Transforming the discovery to new actions or products
  • 59. Challenges of Data Mining
      • Scalability
      • Dimensionality
      • Complex and Heterogeneous Data
      • Data Quality
      • Data Ownership and Distribution
      • Privacy Preservation
  • 60. Thank you for viewing my presentation. For questions, you can contact me at iksinc@yahoo.com. Also visit my blog “From Data to Decisions” at iksinc.wordpress.com