The document summarizes a data mining program held at Renmin University in Beijing from May 21-27, 2012. It discusses the various lecturers and topics covered during the program. Professors Yang, Han, and Pei each gave lectures on their areas of expertise, including classification and transfer learning, information network models, and mining uncertain data. The curriculum focused mainly on data mining and included both basic and advanced concepts. Participants were encouraged to actively engage and ask questions throughout the program.
Dragon Star Program: Summarization Leads to Innovation
1. Summary of the Dragon Star Program
(Renmin University, Beijing, May 21-27, 2012)
Yueshen Xu
xuyueshen@163.com
Zhejiang University
05/28/12 ZJU
2. Overview
Narration: what they addressed
• Program Profile
• Knowledge and Expertise
Argumentation: what I think
• Research and Research Mode
• Potpourri
Discussion
No dazzle
3. Organizer and Lecturer
Organizer
• CuiPing Li (an amiable lady)
• Jun He (guest appearance)
Lecturer
• Prof. Qiang Yang, HKUST: Classification, Transfer Learning
• Prof. Jiawei Han, UIUC: Network Model, Relationship Mining over DBLP
• Prof. Liu Huan, ASU: Online Group Behavior over Social Network
• Prof. Jian Pei, SFU: Mining on Uncertain Data
4. Curriculum
Contents
Mainly about Data Mining
A little about machine learning and database
Basic + Advanced
Basic: everyone should know
Advanced: only a few know
6:30
Syllabus
Tight and tiring
Participation Prof. Liu
On time, in time and full time
5. Attention
• No qualification
No comment, no guess, just what it is
No topics, no transformation and no speculation
• No detail, just summarization
Further study resource repository:
http://www.cse.ust.hk/~qyang/2012DStar/
http://www.cs.uiuc.edu/~hanj/dragon12/info12.htm
• Ask me
Asking me about anything is OK
• What you research depends on what you encounter.
• What they told me is summarization
• Digest it, not too much of it
• Learn what you need
6. Prof. Yang
Classification & Transfer Learning
"Prof. Yang, can you speak a little faster?"
Classification
Decision Trees
Neural Networks (replaced by SVM)
Bayesian Classifiers (just summarization, little detail)
Conditional Independence
Naïve Bayesian Network
Support Vector Machines (little about why, mainly about what)
Ensemble Classifiers
Bagging and Boosting (AdaBoost)
Random Forest
Collaborative Filtering (a little)
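To make the conditional-independence idea behind the naive Bayes classifier concrete, here is a minimal sketch in plain Python. It is not taken from the lecture: the weather-style toy data, the function names and the use of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen for each feature, needed for add-one smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Return the label maximizing log P(label) + sum_i log P(x_i | label).

    Summing per-feature log-probabilities is exactly the conditional
    independence assumption: features are treated as independent given the class.
    """
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # add-one smoothing so unseen values do not give probability zero
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```

Working in log space avoids numeric underflow when many features are multiplied together; with only two features it makes no practical difference, but it is the usual habit.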
7. Prof. Yang
Classification & Transfer Learning
Transfer Learning
What he and his students are good at, and maybe the only thing they are good at
8. Prof. Yang
Classification & Transfer Learning
I don’t know, but I can bamboozle you
Transfer Learning
The ability of a system to recognize and apply knowledge and
skills learned in previous tasks to novel tasks or new domains
Easy to say, hard to do
9. Prof. Yang
Classification & Transfer Learning
What they focus on
Heterogeneous Transfer Learning
Source-free selection transfer learning
Multi-task transfer learning
Transfer Learning for Link Prediction
EigenTransfer: A Unified Framework for Transfer Learning
10. Prof. Han
Information Network Model & Relationship Mining over DBLP
An amiable and rigorous senior scholar
He is involved in the whole process of every paper, so he knows the details well
He is happy to answer every question
Never acting superior
Information Network Model:
Great powers of conception
Fundamental theory of network analysis
Not just about social network. Take a glance at Prof. Han’s contents:
─ Network Science
─ Measures and Metrics of Networks
─ Models of Network Formation
11. Prof. Han
Information Network Model & Relationship Mining over DBLP
Network Science
Social network
Social network example
Friendship networks vs. blogosphere
Other Network
Communication Network
Biological Network
Network model and their representation
Too many, just list some:
• PageRank, Bipartite Networks
Plentiful Models of Network Formation
Explain how social networks should be organized
Model the graph generation process of social networks
Probabilistic Distribution
Power Law, Long tail law
The Erdős-Rényi (ER) Model
The Watts and Strogatz Model
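Two of the items named on this slide, the Erdős-Rényi model and PageRank, are small enough to sketch in plain Python. This is an illustrative sketch only, not material from the lecture: the function names, the damping factor of 0.85, the fixed iteration count and the choice to treat undirected edges as two directed links are all assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p): keep each of the n*(n-1)/2 possible undirected edges
    independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def degrees(n, edges):
    """Degree of every node in an undirected edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def pagerank(n, edges, damping=0.85, iterations=100):
    """Power iteration; each undirected edge counts as two directed links."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for u in range(n):
            if neighbors[u]:
                share = damping * rank[u] / len(neighbors[u])
                for v in neighbors[u]:
                    new_rank[v] += share
            else:
                # isolated node: spread its rank uniformly over all nodes
                for v in range(n):
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


# A random graph, and a hypothetical 4-node star whose center is node 0.
random_edges = erdos_renyi(50, 0.1, seed=0)
print(len(random_edges), max(degrees(50, random_edges)))

star = [(0, 1), (0, 2), (0, 3)]
star_rank = pagerank(4, star)
print(star_rank)
```

In the ER model every node has the same expected degree, so its degree distribution is concentrated around the mean rather than heavy-tailed; that contrast with the power law is one reason other generation models are studied.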
12. Prof. Han
Information Network Model & Relationship Mining over DBLP
All based on DBLP
Why? Because it is a heterogeneous network
Clustering, Ranking in information networks
Problems: what they mine
13. Prof. Han
Information Network Model & Relationship Mining over DBLP
Classification of information networks
Is VLDB a conference belonging to DB or DM?
Similarity Search in information networks
DBLP
Who are the most similar to “Christos Faloutsos”?
IMDB
Which movies are the most similar to “Little Miss Sunshine”?
E-Commerce
Which products are the most similar to “Kindle”?
Y. Sun, J. Han, X. Yan, P. S. Yu, and Tianyi Wu, “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks”, VLDB'11
14. Prof. Han
Information Network Model & Relationship Mining over DBLP
What do they take advantage of?
Meta-paths, defined over the network schema [figure: meta-path example]
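A minimal sketch of meta-path-based similarity, in the spirit of the PathSim paper cited on the previous slide. It assumes a symmetric Author-Venue-Author meta-path with commuting matrix M, for which PathSim scores a pair as 2*M[x][y] / (M[x][x] + M[y][y]). The author names, venue columns and publication counts below are hypothetical toy values, and the function names are illustrative only.

```python
def commuting_matrix(counts):
    """M = W * W^T for an author-by-venue count matrix W.

    M[i][j] is the number of Author-Venue-Author path instances
    connecting author i and author j.
    """
    n = len(counts)
    return [[sum(a * b for a, b in zip(counts[i], counts[j]))
             for j in range(n)] for i in range(n)]


def pathsim(m, x, y):
    """PathSim score under a symmetric meta-path with commuting matrix m."""
    denominator = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denominator if denominator else 0.0


def top_k(m, x, k):
    """The k objects most similar to x, as (score, index) pairs."""
    scores = [(pathsim(m, x, y), y) for y in range(len(m)) if y != x]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return scores[:k]


# Hypothetical data: rows are authors, columns are two venues,
# entries are the number of papers the author published at that venue.
authors = ["author_a", "author_b", "author_c", "author_d"]
counts = [
    [2, 1],    # author_a: few papers, mostly venue 0
    [50, 20],  # author_b: same venues, far more papers
    [2, 0],    # author_c: similar volume and venue to author_a
    [0, 3],    # author_d: only venue 1
]
m = commuting_matrix(counts)
for score, index in top_k(m, 0, 3):
    print(authors[index], round(score, 4))
```

The normalization by the self-path counts is what keeps a highly prolific author from dominating every ranking: author_b shares many path instances with author_a, yet author_c, whose publication profile is comparable in size, scores higher.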
15. Prof. Han
Information Network Model & Relationship Mining over DBLP
Relationship Prediction in Information Networks
Whom should I collaborate with?
Which paper should I cite for this topic?
Whom else should I follow on Twitter?
Y. Sun, R. Barber, M. Gupta, C. Aggarwal and J. Han, “Co-author Relationship Prediction in Heterogeneous Bibliographic Networks”, ASONAM’11, July 2011
Role Discovery: Extracting Semantic Information from Links
Ref. C. Wang, J. Han, et al., “Mining Advisor-Advisee Relationships from
Research Publication Networks”, SIGKDD 2010
Data Cleaning and Trust Analysis by InfoNet Analysis
Xiaoxin Yin, Jiawei Han, Philip S. Yu, “Truth Discovery with Multiple Conflicting
Information Providers on the Web”, TKDE’08
16. Prof. Han
Information Network Model & Relationship Mining over DBLP
Automatic discovery of Entity Pages
(T. Weninger, Jiawei Han et al., WWW’11)
Given a reference page, can we find entity pages of the same type?
14 pages of references
17. Prof. Pei
Uncertain Data Mining
Mining uncertain data
Probability is vital
Models and Representation of uncertain data
Mining Frequent Patterns
Classification
Clustering
Outlier Detection
Topic-Oriented
Nothing to do with databases, that is, nothing to do with queries
Learn it yourself
Outlier Detection on uncertain data is a challenge
This is what I care about most from the point of view of knowledge
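As one illustration of why probability is vital here, frequent pattern mining over uncertain data is often formulated with an expected support instead of a plain count. The sketch below assumes a common model from the uncertain-data literature, in which each item appears in a transaction with its own independent existence probability; it is not necessarily the formulation presented in the lecture, and the transactions, probabilities and threshold are hypothetical.

```python
from math import prod


def expected_support(transactions, itemset):
    """Expected number of transactions containing every item of itemset.

    Each transaction maps item -> existence probability. Under the
    independence assumption, the probability that a transaction contains
    the whole itemset is the product of the item probabilities.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset)
               for t in transactions)


def is_frequent(transactions, itemset, min_expected_support):
    """An itemset is kept when its expected support reaches the threshold."""
    return expected_support(transactions, itemset) >= min_expected_support


# Hypothetical uncertain transactions.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"a": 1.0, "b": 0.8, "c": 0.2},
]
print(expected_support(transactions, ("a",)))
print(expected_support(transactions, ("a", "b")))
print(is_frequent(transactions, ("a", "b"), 1.0))
```

With certain data every probability is 0 or 1 and the expected support reduces to the ordinary support count.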
18. Our Thoughts
As for pure research, there is no speculation
What’s the proper mode for research?
Method-Oriented: Prof. Yang
All about transfer learning
All I have to do is solve practical problems with transfer learning, e.g. link prediction.
Application-Oriented: Prof. Han
Find fun in DBLP, all about relationship mining
No single part of Prof. Han’s method is new, but driven by the problem, the whole framework is innovative
Topic-Oriented: Prof. Pei
Clustering and outlier detection on uncertain data
He and his team depend on solid accumulated knowledge
19. Our Thoughts
How do they do research?
Accumulation: built up through experience, imitation and hard work
Real world problem
Valuable research problem: Is the problem valuable? Can it be solved by us?
Discuss and test to find a suitable method: test again and again, relying on accumulation, experience, judgment
Experiment
Paper: revise many times
Do not just skim the slides; repeat experiments others have done
Solve problems others have solved
Different field, different mode
Application-Oriented: flexible
Method-Oriented: mathematics
Topic-Oriented: accumulation
Work as a Team
20. Our Thoughts
Prof. Pei: Small data
Can you learn a model just with a little data?
Data collection is very costly
If you can learn what you want from 1 GB, why use 1 TB with so many machines?
Prof. Pei: do we really need experiments? No, provided that what you have done is really convincing. / Yes, because our work is not convincing enough.
Read every helpful paper
Research should be identified with researchers, their teams and their labs. Everyone has his own pan; it is not that everyone shares just one.
21. Our Thoughts
20/80 Law
I have fallen behind others
I have been lost in the clouds of research for one year. I hope I can find my way.