How2research
1. How to conduct high-quality
research and write good papers
Haixun Wang
Microsoft Research Asia
2. What is research?
1. Solve a problem using existing methods.
Write a README.txt. (low innovation, little impact)
2. Improve existing solutions to an existing problem.
Write a tech report. (low innovation, little impact)
3. Create a new solution to an existing problem.
Write a paper. (high innovation, low impact)
4. Identify a new problem. Generalize the solution.
Write a paper. (high innovation, high impact)
5. • It is a cruel irony that the children who died during the earthquake
in Dujiangyan (都江堰), China, knew all too well that their
country once led the world in the knowledge of the planet’s
seismicity.
• Why, if the Chinese had come to know so much about
earthquakes so early on in their immensely long history,
were they never able to minimize the effects of the world’s
contortions — to at least the degree that America has?
• Why did they leave the West to become leaders in the field,
and leave themselves to become mired, time and again, in
the kind of tragic events that we are witnessing this week?
6. • In almost every area of technology the Chinese were once
supreme, without competition. And yet, in the 16th century
China’s innovative energies inexplicably withered away, and
modern science became the virtual monopoly of the West.
• There had been any number of Chinese Euclids and
Archimedes but there was never to be a Chinese Newton or
Galileo.
• Until this week Dujiangyan was a place of which China could
be proud; today its wreckage stands as a tragic monument
to a culture that turned its back on its remarkable and
glittering history (of innovation).
11. 10,000 hours of success
Excellence requires a minimum level of practice.
10,000 hours is the magic number
(3 hours per day for 10 years)
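As a quick sanity check on the arithmetic above, here is a minimal sketch. The function names are illustrative, and a 365-day year with no days off is an assumption.

```python
def total_hours(hours_per_day: float, years: float, days_per_year: int = 365) -> float:
    """Total practice hours accumulated at a steady daily rate."""
    return hours_per_day * days_per_year * years


def years_needed(target_hours: float, hours_per_day: float, days_per_year: int = 365) -> float:
    """Years of steady daily practice needed to reach a target number of hours."""
    return target_hours / (hours_per_day * days_per_year)


if __name__ == "__main__":
    print(total_hours(3, 10))        # 3 hours a day for 10 years
    print(years_needed(10_000, 3))   # years to reach 10,000 hours at 3 hours a day
```

At 3 hours a day the total comes to 10,950 hours, so the rule of thumb is slightly generous.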
12. By the time Bill Gates dropped out of Harvard, he had been
programming nonstop for seven years, which was way past
10,000 hours.
In the last 10 years, I spent more than 3 hours watching TV
every day. How come I didn’t achieve anything?
14. Independent thinking
• the downfall of deep reading/thinking
• The Internet is rewiring our brains, forcibly adapting
us to tolerate only bite-sized summations and
simplified blips at the expense of deeper thought
• we risk turning into ‘pancake people’—spread
wide and thin as we connect with that vast
network of information accessed by the mere
touch of a button.
15. How to train your creativity?
Write, Write, Write!
16. Research = Writing + Rewriting
• Turn your idea into writing before implementing it.
• Hard to write it down? Because you don’t
understand the problem (or your idea).
– Writing forces us to be clear, focused
– Writing crystallises what we don’t understand
• Writing opens the way to dialogue with others:
reality check, critique, and collaboration.
17. Research = Writing + Rewriting
• The process of writing and rewriting is the process of
– developing your idea
– generalizing your problem/solution
• After many rounds of rewriting, your problem (idea)
may be totally different from the problem (idea) you
started with
– more interesting and challenging
• It’s not a waste of time. It’s how you should spend
your time when you do research.
18. How to find a topic?
The Theory of Flying Pigs
20. [ABSTRACT] In this paper, we identify the
importance for pigs to fly. We show that
many challenging tasks can be modeled by
flying pigs. Thus, solving the flying pig
problem benefits a large variety of
applications.
21. [ABSTRACT] In this paper, we extend the
pioneering work of flying pigs [1]. Our
improvement enables pigs to fly higher.
22. [ABSTRACT] Recently, the flying pig problem
has attracted significant attention [1, 2].
However, pigs in previous work all fly
very slowly. In this paper, we introduce a
technique so that pigs can fly an order-of-
magnitude faster.
24. What topic to work on?
• The choices you make will define your career
• No real problems at hand
– Get a conference proceedings volume. Read from the 1st page.
– Ask senior people what they are working on.
– Make it go faster/higher
• Find real problems, use real data
25. Is this topic meaningful?
• Convince yourself
– an issue of research ethics
• Talk to your colleagues
– Hey! I have a crazy idea
– Convince them
• Talk to/Read from people not in your field
– mathematicians, physicists, biologists, …
26. Database research as an example
• Databases have been one of the most successful
fields in CS in terms of applications and
industrial value!
• However, is there anything left for substantial
database research?
– Relational database theory, a closing world?
– Too many index structures already?
27. Example: Data Model
• From: RDBMS
– Normalization is one of the cornerstones of
RDBMS
– Theoretical results and practical applications
• To: XML
– Storage model: still an open problem
– hybrid database, Native XML support
28. Example: Logic Databases
• Logic databases were a hot topic in the ’80s and
early ’90s
– models, semantics, magic sets, …
– many results have since been incorporated into
RDBMS
– are logic databases dead?
• Rejuvenated by semantic query processing
– ontology, description logics
29. Broadening the Scope
• Concern (VLDB Endowment meeting, ’98):
– The area of database research may lose the pivotal role it
now plays among information system technologies
• Keep DB research current and relevant
– We should maintain a watch on trends and future
directions in the general area of information management
• Can a traditionally non-DB/KDD research problem be
treated using DB/KDD methods?
32. Some General Tips
• Choose the right word/phrase
• Use the active voice
• A picture is worth 10,000 words
• Use a fair amount of formalization
• The divide-and-conquer approach
• Keep it simple and stupid
33. Choose the right word/phrase
• Chicken without sexual life
• Husband and wife’s lung slice
• Bean curd made by a pockmarked
woman
34. Use the active voice
• Ten Yuan will be paid for every
one-time towel you use.
35. Use the active voice
The passive voice is “respectable” but it DEADENS
your paper. Avoid it at all costs.
NO: It can be seen that...
YES: We can see that... (“We” = you and the reader)
NO: 34 tests were run
YES: We ran 34 tests
NO: These properties were thought desirable
YES: We wanted to retain these properties (“We” = the authors)
NO: It might be thought that this would be a type error
YES: You might think this would be a type error (“You” = the reader)
Slide borrowed from Simon Peyton Jones
36. Some General Tips
• Choose the right word/phrase
• Use the active voice
• A picture is worth 10,000 words
• Use a fair amount of formalization
• The divide-and-conquer approach
• Keep it simple and stupid
37. Be Specific
NO: We describe the WizWoz system. It is really cool.
YES: We give the syntax and semantics of a language that supports concurrent
processes (Section 3). Its innovative features are...
NO: We study its properties
YES: We prove that the type system is sound, and that type checking is decidable
(Section 4)
NO: We have used WizWoz in practice
YES: We have built a GUI toolkit in WizWoz, and used it to implement a text editor
(Section 5). The result is half the length of the Java version.
From Simon Peyton Jones
38. Structure (conference paper)
• Title (1000 readers)
• Abstract (4 sentences, 100 readers)
• Introduction (1 page, 100 readers)
• The problem (1 page, 10 readers)
• My idea (2 pages, 10 readers)
• The details (5 pages, 3 readers)
• Related work (1-2 pages, 10 readers)
• Conclusions and further work (0.5 pages)
Slide borrowed from Simon Peyton Jones
39. An Attractive Abstract Counts
• The abstract is for people to skim in one minute
– No technical details
– Plain English, easy to understand
– No assumption of DB/KDD background
– As short as possible
• What to write
– The problem, and why it is important and challenging
– Your technical thrust, progress and contributions
– Broader impact
• Write it last!
40. What Is a Good Introduction
• Starting from good stories
– Motivation – what is the problem and why is the
problem important?
– 1-2 typical real-life applications
• Intuition and general ideas
– Intuition is most important!
– No technical details
– Understandable for a CS undergraduate
– Use clear, small examples
41. What Is a Good Introduction (2)
• Highlight major contributions
– Typical examples: identifying a new problem,
novel solutions, a systematic performance
study, …
– Only list the major ones; don’t overclaim
– Again, no technical details
– A road map of the rest of the paper
42. What’s the difference?
Hardcover: 1312 pages
Publisher: Wiley; 7th edition (June 20, 2001)
Language: English
ISBN-10: 0471381578
ISBN-13: 978-0471381570
Product Dimensions: 10.1 x 9.1 x 1.9 inches
Shipping Weight: 6.1 pounds

Pages: 378
Publication date: January 2004
ISBN: 7040137860
Barcode: 9787040137866
43. Writing a paper is like telling a story
• The goal of the title is to get the reader to read
the abstract …
• The goal of the abstract is to get the reader to
read the introduction …
• …
• You need a good setup … suspense … then
you unfold your story slowly …
44. Goal: creating suspense
• Reader thinks “gosh, if they can really deliver
this, that’d be exciting. I’d better read on”
45. Create Suspense
Many years later, as he faced the firing
squad, Colonel Aureliano Buendía was to
remember that distant afternoon when
his father took him to discover ice.
One Hundred Years of Solitude
by Gabriel García Márquez
46. Keep it Simple and Stupid
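“All night long the north wind blew hard” (一夜北风紧)
Dream of the Red Chamber / Cao Xueqin
This line may be plain, and it does not reveal what follows,
but this is exactly how someone who knows poetry opens a poem.
It is not only good; it also leaves endless room for those who come after.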
50. Intuition Is the Most Important
• Example
– ensemble classifier for streams
• Why ensemble?
– Rigorous mathematical proof which shows ensemble
reduces classification variance
• Many benefits
– High accuracy, ease of use, best approach in many
aspects
• Result:
– paper rejected
52. How to Present Technical Details?
• The top-down approach
– First give an overview of the algorithm
– Present details of the major steps
• The bottom-up approach
– Start from the critical details
– Summarize the discussion and present the algorithm
• The hybrid approach
– Top-down to partition the global problem
– Bottom-up to present solutions to sub-problems
53. How to Present Examples?
• Occam’s razor (the principle of parsimony)
– “One should not increase, beyond what is
necessary, the number of entities required to
explain anything”
• Find the simplest example that can show
all the points you want to show
– Some data in running examples can be highly
skewed
– Only select data that can show critical ideas
54. Worksheet of Running Example
• Work out the complete running example
• Select the interesting and critical
segments
• Present multiple small examples in the
paper
– Only one running example if possible
– Preferably several paragraphs in one example
– Don’t give a long, exhaustive example
– Each example should focus on one point
55. How to Present Algorithms?
• Choose the appropriate level of abstraction
– Obvious operations: omit them
(readers have a general CS background)
– Complicated operations: describe them as functions
• The WWH sequence
– Why do we need such an operation?
– What is the operation?
– How can the operation be done efficiently?
56. Keep Your Algorithm Short
• Long algorithms are hard to understand
• Multi-level expansion of algorithms
– Use functions or procedures
• Ideally, each algorithm is fewer than 20
lines
• Control the complexity
– Don’t use too many variables
– Use meaningful variable names
– Use plain text to explain
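A minimal sketch of multi-level expansion, using merge sort as a stand-in example. The choice of algorithm and the names `merge_sort` and `merge` are illustrative assumptions, not from the slides. The top-level algorithm stays a few lines long because the detailed step is pushed into a named procedure:

```python
from typing import List


def merge(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list."""
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two tails is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items: List[int]) -> List[int]:
    """Top-level algorithm: split, sort each half, merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

In a paper, `merge_sort` would be the algorithm shown in the figure, with `merge` described separately or in a sentence of plain text.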
57. Performance Study Goals
• “Wisconsin wallpaper”
• State clearly why you designed and conducted
the experiments
– Effectiveness measures
– Efficiency measures
– Other considerations
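One way to keep the two kinds of measures apart when reporting an experiment is sketched below. Everything here is hypothetical: the `evaluate` helper, the toy threshold classifier, and the tiny labeled dataset are made up for illustration. Accuracy stands in for an effectiveness measure and wall-clock time for an efficiency measure.

```python
import time
from typing import Callable, Dict, List, Tuple


def evaluate(classify: Callable[[float], int],
             data: List[Tuple[float, int]]) -> Dict[str, float]:
    """Run a classifier over labeled data and report both kinds of measures."""
    start = time.perf_counter()
    predictions = [classify(x) for x, _ in data]
    elapsed = time.perf_counter() - start

    correct = sum(1 for pred, (_, label) in zip(predictions, data) if pred == label)
    return {
        "accuracy": correct / len(data),   # effectiveness measure
        "seconds": elapsed,                # efficiency measure
    }


def threshold_classifier(x: float) -> int:
    """Toy classifier (hypothetical): label 1 when x is at least 0.5."""
    return 1 if x >= 0.5 else 0


# Made-up data: (feature, true label)
toy_data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.45, 1)]
report = evaluate(threshold_classifier, toy_data)
```

Reporting the two numbers under separate names makes it easier to say, for each experiment, which goal it is meant to support.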
58. How to Present Experimental Results?
• Experiment settings
• Performance study goals
• Selected experimental results
– Explanation
• Summary of performance study
59. How to Handle Related Work?
• If possible, talk about related work at the end of the
paper.
– Do not interrupt the flow of your story
• Extensive collection of related work
– Don’t forget to look at the latest results
– Go beyond your field, if possible
• Give sufficient credit to others
– We are standing on the shoulders of giants
– Avoid emotional words
– Be precise in comparison
• Point out critical points
– Use examples if necessary
60. What Should Be in Discussion?
• Related issues
– Constraints in your method
– Drawbacks
• Possible extensions
– Point out the other problems that can be solved
straightforwardly using the proposed method
– Broader impact
• Future work if you have a detailed plan
61. Writing Strong Conclusions
• Summarize the paper briefly.
– What problem is solved
– Major technical contributions
– Major findings and results
• Future work if possible
62. Aiming high!
Major DB/KDD Conferences
• DB (in my opinion)
– 1st tier: SIGMOD, VLDB, ICDE
– 2nd tier: EDBT, ICDT, CIKM, ER, SSDBM
– Regional: DASFAA, WAIM, British DB Conf,
Australian DB Conf, Brazilian DB Conf, DEXA, …
• KDD (in my opinion)
– Top: KDD
– 2nd tier: SIAM DM, ICDM
– Regional: PAKDD, PKDD, …
– KDD papers can be sent to DB & ML conferences
64. Reviewers’ Comments
• The conference review process is necessarily
imperfect
• The reviewers operate under strict time
constraints, and the committee must make
quick decisions.
• Some good papers will be rejected and some
embarrassing papers will be accepted.
66. My Paper Got Accepted!
• Congratulations!
• Address reviewers’ comments in the final
version
– Adopt good points
– Clarify and remove confusion
• Prepare a nice talk and/or poster
– Convey the general idea
– Use examples wherever possible
– Use as little symbolic notation as possible
67. Recycle a Paper
• Before publication, a paper is likely to go
through several rejections
– SIGMOD, VLDB, ICDE acceptance rates are around
10%-15%
– A conference with 25+% acceptance ratio
may not be good
• Aim at the next chance
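A rough back-of-the-envelope sketch of why several rejections are to be expected. It assumes each submission is an independent trial with the same acceptance probability, which is a simplification introduced here and not a claim from the slides (a revised paper usually has better odds than its first version). The function names are illustrative.

```python
def expected_submissions(p_accept: float) -> float:
    """Mean number of submissions until acceptance (geometric distribution)."""
    if not 0 < p_accept <= 1:
        raise ValueError("p_accept must be in (0, 1]")
    return 1 / p_accept


def prob_accepted_within(p_accept: float, attempts: int) -> float:
    """Probability of at least one acceptance in the given number of attempts."""
    return 1 - (1 - p_accept) ** attempts


if __name__ == "__main__":
    for p in (0.10, 0.15):
        print(p, expected_submissions(p), prob_accepted_within(p, 3))
```

Under these assumptions, a 10% acceptance rate gives a mean of 10 submissions and roughly a 27% chance of acceptance within three tries.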
68. Learn from the Reviews
• Do we aim at the right target?
– If 2/3 of the reviewers are laymen in your subject,
seriously reconsider the venue
• Address technical issues
– Respond to reviewers’ comments by
revising/enhancing the technical description and
experiments
• Improve writing
– Confused reviewers? Clarify the issues
– Correct any linguistic problems pointed out
69. Why Journal Papers?
• Records archived
• Important for degree, promotion,
election, …
70. Conference vs. Journal Papers
• Length
– Journal papers are often longer
• Objectives
– Conference papers mainly convey the ideas and
results
– Journal papers systematically report and
justify the research, more formal
71. From Conference Papers to Journal Papers
• A critical requirement: “major value added”
– 30% in some journals, e.g., TODS, TKDE
– But, how to count?
• Some “major values”
– More detailed/complete examples
– Complete formal results and proofs
– Further variations and extensions of the
method
– Triviality should be avoided
72. Steps Towards Good Research
• Motivations and problems
– More important than the solutions
• Re-search
– Systematic development of solutions
• Writing a good paper
– Careful design
• Submissions
– Good luck!