2. OUTLINE
History of Belkin's Theory of Anomalous State of Knowledge
Introduction
What Is an Anomaly
Background
Comparison of Traditional and Belkin’s Models
Applications
Implication
Conclusions
References
3. ANOMALOUS STATE OF KNOWLEDGE
INTRODUCTION
We are drowning in the overflow of data that are
being collected world-wide, while starving for
knowledge at the same time.
Anomalous events occur relatively infrequently.
However, when they do occur, their consequences
can be quite dramatic, and quite often negative.
4. WHAT ARE ANOMALIES?
An anomaly is a pattern in the data that does
not conform to the expected behaviour.
Anomalies are also referred to as outliers, exceptions,
peculiarities, surprises, etc.
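The definition above can be made concrete with a minimal sketch. It assumes that "expected behaviour" means lying close to the sample mean, and it flags values whose z-score exceeds a threshold. The function name, the threshold of 2.0, and the sample readings are hypothetical and chosen only for illustration; real applications usually need a domain-specific notion of what is expected.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold.

    Assumption: "expected behaviour" is approximated by the sample mean,
    and deviation is measured in units of the sample standard deviation.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the pattern.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(readings))
```

A fixed z-score threshold is only one simple choice; it is sensitive to the very outliers it tries to find, because they inflate the mean and the standard deviation.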
5. HISTORY: NICHOLAS BELKIN
• Nicholas J. Belkin is a Professor at the
School of Communication and Information at
Rutgers University.
• Belkin is best known for his work on human-centered
information retrieval and the
hypothesis of the Anomalous State of
Knowledge (ASK).
• Belkin realized that in many cases, users of
search systems are unable to formulate precisely
what they need. They lack some
vital knowledge needed to formulate their queries.
6. BELKIN’S THEORY HISTORY
• In such cases it is more suitable to attempt to
describe a user's anomalous state of
knowledge than to ask the user to specify her/his
need as a request to the system.
• Among the main themes of his research are
digital libraries, information-seeking behaviors,
and information retrieval systems.
• Dr. Belkin was the chair of SIGIR in 1995-99
and the president of the American Society for
Information Science and Technology (ASIS&T)
in 2005.
7. BACKGROUND
Information retrieval (IR) systems as presently
designed are judged in terms of complete recall and precision or
complete user satisfaction.
Traditional view of IR
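Recall and precision, the measures named above, can be sketched as follows. This is a minimal illustration using the standard set-based definitions; the function name and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

"Complete" recall and precision would mean both values equal 1.0: every relevant document is retrieved and nothing else is.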
8. THE INFORMATION RETRIEVAL CYCLE
[Figure: the information retrieval cycle. Stages: Source Selection, Query Formulation, Search, Selection, Examination, Delivery. Products passed between stages: Resource, Query, Ranked List, Documents. Feedback loops: source reselection; system discovery, vocabulary discovery, concept discovery, document discovery.]
10. ANOMALOUS STATE OF KNOWLEDGE
Basic paradox:
Information needs arise because the user doesn’t know
something: “an anomaly in his state of knowledge with
respect to the problem faced”
Search systems are designed to satisfy these needs,
but the user needs to know what he is looking for.
However, if the user knows what he’s looking for, there
may not be a need to search in the first place.
Implication: computing “similarity” between queries
and documents is fundamentally wrong
How do we resolve this paradox?
11. APPLICATIONS OF ANOMALY DETECTION
Network intrusion detection
Insurance / Credit card fraud detection
Healthcare Informatics / Medical diagnostics
Industrial Damage Detection
Image Processing / Video surveillance
Novel Topic Detection in Text Mining
12. This new approach recognizes that a fundamental
element in the IR situation is the development of an
information need out of an inadequate state of
knowledge.
An appropriate representation is to consider the
information need as an 'anomalous state of knowledge'
(ASK) [6, 9].
ANOMALOUS STATE OF KNOWLEDGE
“The ASK hypothesis is that an information need
arises from a recognized anomaly in the user's state
of knowledge concerning some topic or situation and
that, in general, the user is unable to specify precisely
what is needed to resolve that anomaly.”
13. IMPLICATIONS
The typical IR system now available, either
operational or experimental, depends on what we
call the 'best-match' principle.
ASK = Non-Specifiability of need (Cognitive or
Linguistic)
Cognitive Non-Specifiability
Linguistic Non-Specifiability
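The 'best-match' principle mentioned on this slide can be illustrated with a minimal sketch: the system ranks documents by how closely their text matches the query text. The sketch assumes bag-of-words term counts and cosine similarity, which is one common way to realize best-match and not necessarily the one used by any particular system. All names and sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, documents):
    """Rank documents by similarity to the query, best first."""
    q = Counter(query.lower().split())
    scored = [
        (cosine(q, Counter(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical three-document collection.
docs = {
    "d1": "anomaly detection in network traffic",
    "d2": "information retrieval and user information needs",
    "d3": "cooking with seasonal vegetables",
}
ranking = best_match("information needs of users", docs)
for score, doc_id in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Note that the query term "users" does not match the document term "user": the ranking depends entirely on the words the user happens to choose, which is where non-specifiability of the need becomes a practical problem.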
14. OTHER RELATED THEORIES
Unconscious Need by Robert S. Taylor (1968)
Problematic Situation by Wersig (1971)
Gaps by Dervin (1983)
15. RESEARCH METHODOLOGY
Tape recording a number of interviews with
users of actual information systems.
Adaptation and implementation of the text
analysis program developed by Belkin so as to
produce structural representations of this data.
Obtaining the authors'/users' evaluations of
these representations, through the use of
questionnaires or interviews where appropriate.
16. CONCLUSIONS
Anomaly detection can detect critical information in
data
Highly applicable in various application domains
Nature of anomaly detection problem is dependent
on the application domain
Different approaches are needed for each particular
problem formulation
17. REFERENCES
Ling, C., Li, C. Data mining for direct marketing:
Problems and solutions, KDD, 1998.
Kubat M., Matwin, S., Addressing the Curse of
Imbalanced Training Sets: One-Sided Selection,
ICML 1997.
N. Chawla et al., SMOTE: Synthetic Minority Over-
Sampling Technique, JAIR, 2002.
W. Fan et al., Using Artificial Anomalies to Detect
Unknown and Known Network Intrusions, ICDM
2001.
18. REFERENCES (CONT.)
N. Abe et al., Outlier Detection by Active Learning,
KDD 2006.
C. Cardie, N. Howe, Improving Minority Class
Prediction Using Case specific feature weighting,
ICML 1997.
J. Grzymala et al, An Approach to Imbalanced Data
Sets Based on Changing Rule Strength, AAAI
Workshop on Learning from Imbalanced Data Sets,
2000.
George H. John. Robust linear discriminant trees.
AI&Statistics, 1995
19. REFERENCES (CONT.)
Barbara, D., Couto, J., Jajodia, S., and Wu, N.
ADAM: a testbed for exploring the use of data
mining in intrusion detection. SIGMOD Rec., 2001.
Otey, M., Parthasarathy, S., Ghoting, A., Li, G.,
Narravula, S., and Panda, D. Towards NIC-based
intrusion detection. KDD 2003.
He, Z., Xu, X., Huang, J. Z., and Deng, S. A
frequent pattern discovery method for outlier
detection. Web-Age Information Management, 726–
732, 2004
20. REFERENCES (CONT.)
Lee, W., Stolfo, S. J., and Mok, K. W. Adaptive
intrusion detection: A data mining approach.
Artificial Intelligence Review, 2000
Qin, M. and Hwang, K. Frequent episode rules for
internet anomaly detection. In Proceedings of the
3rd IEEE International Symposium on Network
Computing and Applications, 2004
Ide, T. and Kashima, H. Eigenspace-based
anomaly detection in computer systems. KDD,
2004
Sun, J. et al., Less is more: Compact matrix
representation of large sparse graphs. ICDM 2007