How to conduct high-quality
research and write good papers




           Haixun Wang
      Microsoft Research Asia
What is research?
1. Solve a problem using existing methods.
   Write a README.txt. (low innovation, little impact)

2. Improve existing solutions to an existing problem.
   Write a tech report. (low innovation, little impact)

3. Create a new solution to an existing problem.
   Write a paper. (high innovation, low impact)

4. Identify a new problem. Generalize the solution.
   Write a paper. (high innovation, high impact)



Research and Engineering
• New Solutions  Useful Solutions




How innovative are you?



• It is a cruel irony that the children who died during the earthquake
  in Dujiangyan (都江堰), China, knew all too well that their
  country once led the world in the knowledge of the planet’s
  seismicity.

• Why, if the Chinese had come to know so much about
  earthquakes so early on in their immensely long history,
  were they never able to minimize the effects of the world’s
  contortions — to at least the degree that America has?

• Why did they leave the West to become leaders in the field,
  and leave themselves to become mired, time and again, in
  the kind of tragic events that we are witnessing this week?

• In almost every area of technology the Chinese were once
  supreme, without competition. And yet, in the 16th century
  China’s innovative energies inexplicably withered away, and
  modern science became the virtual monopoly of the West.

• There had been any number of Chinese Euclids and
  Archimedes but there was never to be a Chinese Newton or
  Galileo.


• Until this week Dujiangyan was a place of which China could
  be proud; today its wreckage stands as a tragic monument
  to a culture that turned its back on its remarkable and
  glittering history (of innovation).
How to train your innovation?



Read, Read, Read



Malcolm Gladwell
 Staff writer, The New Yorker
10,000 hours of success

  Excellence requires a minimum level of practice.
         10,000 hours is the magic number
           (3 hours per day for 10 years)
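
A quick back-of-the-envelope check of the figure above, as a minimal sketch. It assumes 365 practice days per year with no breaks, which is a simplification and not something the slide states. The second calculation applies the same assumption to the seven-year span the deck mentions for Bill Gates.

```python
# Rough check of the "10,000 hours" arithmetic (assumes 365 days/year, no days off).
HOURS_PER_DAY = 3
DAYS_PER_YEAR = 365
YEARS = 10

total_hours = HOURS_PER_DAY * DAYS_PER_YEAR * YEARS
print(total_hours)  # 10950

# Daily practice needed to reach 10,000 hours in seven years instead of ten.
hours_per_day_7y = 10_000 / (7 * DAYS_PER_YEAR)
print(round(hours_per_day_7y, 1))  # 3.9
```

Three hours a day for ten years slightly overshoots 10,000 hours; reaching it in seven years takes closer to four hours a day.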




By the time Bill Gates dropped out of Harvard, he had been
programming nonstop for seven years, which was way past
10,000 hours.

In the last 10 years, I spent more than 3 hours watching TV
every day. How come I didn’t achieve anything?




Nicholas Carr, Atlantic Monthly
          July 2008

Independent thinking

• The downfall of deep reading and thinking
• The Internet is rewiring our brains, forcibly adapting
  us to tolerate only bite-sized summations and
  simplified blips at the expense of deeper thought
• We risk turning into ‘pancake people’: spread
  wide and thin as we connect with that vast
  network of information accessed by the mere
  touch of a button.

How to train your creativity?



   Write, Write, Write!



Research = Writing + Rewriting
• Turn your idea into writing before implementing it.

• Hard to write it down? Because you don’t
  understand the problem (or your idea).
   – Writing forces us to be clear, focused
   – Writing crystallises what we don’t understand


• Writing opens the way to dialogue with others:
  reality check, critique, and collaboration.


Research = Writing + Rewriting
• The process of writing and rewriting is the process of
   – developing your idea
   – generalizing your problem/solution

• After many rounds of rewriting, your problem (idea)
  may be totally different from the problem (idea) you
  started with
   – more interesting and challenging


• It’s not a waste of time. It’s how you should spend
  your time when you do research.
How to find a topic?




The Theory of Flying Pigs


In Reality

  – Pigs do not have to fly.
[ABSTRACT] In this paper, we identify the
importance for pigs to fly. We show that
many challenging tasks can be modeled by
flying pigs. Thus, solving the flying pig
problem benefits a large variety of
applications.



[ABSTRACT] In this paper, we extend the
pioneering work of flying pigs [1]. Our
improvement enables pigs to fly higher.
[ABSTRACT] Recently, the flying pig problem
has attracted significant attention [1, 2].
However, the pigs in previous work all fly
very slowly. In this paper, we introduce a
technique that lets pigs fly an order of
magnitude faster.



and soon we have many papers …
What topic to work on?

• The choices you make will define your career

• No real problems at hand
  – Get a proceedings volume. Read it from the first page.
  – Ask senior people what they are working on.
  – Make it go faster/higher


• Find real problems, use real data
Is this topic meaningful?
• Convince yourself
  – an issue of research ethics

• Talk to your colleagues
  – Hey! I have a crazy idea
  – Convince them

• Talk to, and read work by, people outside your field
  – mathematicians, physicists, biologists, …
Database research as an example
• Databases have been one of the most successful
  fields in CS in terms of applications and
  industrial value!

• However, is there anything left for substantial
  database research?
  – Relational database theory, a closing world?
  – Too many index structures already?


Example: Data Model
• From: RDBMS
  – Normalization is one of the cornerstones of
    RDBMS
  – Theoretical results and practical applications

• To: XML
  – Storage model: still an open problem
  – Hybrid databases, native XML support


Example: Logic Databases
• Logic databases were a hot topic in the 1980s and
  early 1990s
  – models, semantics, magic sets, …
  – many results have since been incorporated into
    RDBMS
  – are logic databases dead?

• Rejuvenated by semantic query processing
  – ontology, description logics


Broadening the Scope
• Concern (VLDB Endowment meeting, ’98):
   – The area of database research may lose the pivotal role it
     now plays among information system technologies


• Keep DB research current and relevant
   – We should maintain a watch on trends and future
     directions in the general area of information management


• Can a traditionally non-DB/KDD research problem be
  treated using DB/KDD methods?

Writing techniques
• Overcome the language barrier

• Paper structure and content




The Language Barrier
• One must first know the
  rules to break them




Some General Tips
•   Choose the right word/phrase
•   Use the active voice
•   A picture is worth 10,000 words
•   Use a fair amount of formalization
•   The divide-and-conquer approach
•   Keep it simple and stupid



Choose the right word/phrase


  • Chicken without sexual life

  • Husband and wife’s lung slice

  • Bean curd made by a pockmarked
    woman




Use the active voice


• Ten Yuan will be paid for every
  one-time towel you use.




Use the active voice
   The passive voice is “respectable” but it DEADENS
            your paper. Avoid it at all costs.

   NO                                   YES
   It can be seen that...               We can see that...
   34 tests were run                    We ran 34 tests
   These properties were                We wanted to retain these
     thought desirable                    properties
   It might be thought that this        You might think this would be
     would be a type error                a type error

   “We” = you and the reader (as in “We can see that...”)
   “We” = the authors (as in “We ran 34 tests”)
   “You” = the reader

   Slide borrowed from Simon Peyton Jones
Some General Tips
•   Choose the right word/phrase
•   Use the active voice
•   A picture is worth 10,000 words
•   Use a fair amount of formalization
•   The divide-and-conquer approach
•   Keep it simple and stupid



Be Specific

           NO!                                  YES!
We describe the WizWoz       We give the syntax and semantics of a
system. It is really cool.   language that supports concurrent
                             processes (Section 3). Its innovative
                             features are...
We study its properties      We prove that the type system is sound,
                             and that type checking is decidable
                             (Section 4)
We have used WizWoz in       We have built a GUI toolkit in WizWoz,
practice                     and used it to implement a text editor
                             (Section 5). The result is half the length of
                             the Java version.
                                                From Simon Peyton Jones
Structure (conference paper)
•   Title (1000 readers)
•   Abstract (4 sentences, 100 readers)
•   Introduction (1 page, 100 readers)
•   The problem (1 page, 10 readers)
•   My idea (2 pages, 10 readers)
•   The details (5 pages, 3 readers)
•   Related work (1-2 pages, 10 readers)
•   Conclusions and further work (0.5 pages)
                      Slide borrowed from Simon Peyton Jones
An Attractive Abstract Counts
• The abstract is for people to skim in one minute
   –   No technical details
   –   Plain English, easy to understand
   –   No assumption of DB/KDD background
   –   As short as possible
• What to write
   – The problem, and why it is important and challenging
   – Your technical thrust, progress and contributions
   – Broader impact
• Write it last!
What Is a Good Introduction
• Start from a good story
  – Motivation – what is the problem and why is the
    problem important?
  – 1-2 typical real-life applications
• Intuition and general ideas
  –   Intuition is most important!
  –   No technical details
  –   Understandable for a CS undergraduate
  –   Use clear, small examples

What Is a Good Introduction (2)
• Highlight major contributions
  – Typical examples: identifying a new problem,
    novel solutions, a systematic performance
    study, …
  – Only list the major ones; don’t overclaim
  – Again, no technical details
  – A road map of the rest of the paper



What’s the difference?




  Left (Chinese-language listing):
    Pages: 378
    Publication date: January 2004
    ISBN: 7040137860
    Barcode: 9787040137866

  Right (English-language listing):
    Hardcover: 1312 pages
    Publisher: Wiley; 7th edition (June 20, 2001)
    Language: English
    ISBN-10: 0471381578
    ISBN-13: 978-0471381570
    Product Dimensions: 10.1 x 9.1 x 1.9 inches
    Shipping Weight: 6.1 pounds
Writing a paper is like telling a story
• The goal of the title is to get the reader to read
  the abstract …

• The goal of the abstract is to get the reader to
  read the introduction …

• …

• You need a good setup … suspense … then
  you unfold your story slowly …
Goal: creating suspense
• Reader thinks “gosh, if they can really deliver
  this, that’d be exciting. I’d better read on”




Create Suspense


Many years later, as he faced the firing
squad, Colonel Aureliano Buendía was to
remember that distant afternoon when
his father took him to discover ice.


               One Hundred Years of Solitude
                  by Gabriel García Márquez
Keep it Simple and Stupid


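    “The north wind blew hard all night” (一夜北风紧)
               Dream of the Red Chamber (红楼梦), Cao Xueqin (曹雪芹)

      “The line is plain and gives no hint of what follows, but that is
      exactly how someone who knows poetry opens a poem. It is not only
      good, it also leaves endless room for those who write after.”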
An Example (SIGMOD’02)




Motivation Found!




Shifting Pattern   Scaling Pattern
  {b,c,h,j,e}        {f,d,a,g,i}


Is It Meaningful?
                CH1I   CH1B   CH1D   CH2I   CH2B   …
        VPS8    401    281    120    275    298    …
        SSA1    401    292    109    580    238    …
        SP07    228    290     48    285    224    …
        EFB1    318    280     37    277    215    …
        MDM10   538    272    266    277    236    …
        CYS3    322    288     41    278    219    …
        DEP1    317    272     40    273    232    …
        NTG1    329    296     33    274    228    …
         …       …      …      …      …      …
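
One way to read the table, as a minimal sketch rather than the method of the SIGMOD’02 paper: it assumes that a "shifting pattern" means two rows differ by the same additive offset on a chosen set of columns. The slides do not spell out that definition, and the choice of rows (VPS8, EFB1, CYS3) and columns (CH1I, CH1D, CH2B) is ours for illustration. The helper names `offsets` and `is_shifting_pattern` are hypothetical.

```python
# Values copied from the table above; row and column selection is illustrative.
expression = {
    "VPS8": {"CH1I": 401, "CH1D": 120, "CH2B": 298},
    "EFB1": {"CH1I": 318, "CH1D": 37, "CH2B": 215},
    "CYS3": {"CH1I": 322, "CH1D": 41, "CH2B": 219},
    "SSA1": {"CH1I": 401, "CH1D": 109, "CH2B": 238},
}
columns = ["CH1I", "CH1D", "CH2B"]


def offsets(row_a, row_b, cols):
    """Per-column differences between two rows."""
    return [row_a[c] - row_b[c] for c in cols]


def is_shifting_pattern(row_a, row_b, cols, tolerance=0):
    """Assumed definition: the rows differ by (nearly) one constant offset."""
    diffs = offsets(row_a, row_b, cols)
    return max(diffs) - min(diffs) <= tolerance


print(offsets(expression["VPS8"], expression["EFB1"], columns))  # [83, 83, 83]
print(offsets(expression["VPS8"], expression["CYS3"], columns))  # [79, 79, 79]
print(is_shifting_pattern(expression["VPS8"], expression["SSA1"], columns))  # False
```

Under that assumption, VPS8, EFB1 and CYS3 move together on these three columns even though their absolute values differ, while SSA1 does not.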




Intuition Is the Most Important
• Example
  – ensemble classifier for streams
• Why ensemble?
  – Rigorous mathematical proof which shows ensemble
    reduces classification variance
• Many benefits
  – High accuracy, ease of use, best approach in many
    aspects
• Result:
  – paper rejected


Optimal decision boundary




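[Figure: decision boundaries over time chunks t0, t1 and t2; the annotations contrast a combination of chunks that produces errors with one that produces no errors]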
How to Present Technical Details?
• The top-down approach
  – First give an overview of the algorithm
  – Present details of the major steps
• The bottom-up approach
  – Start from the critical details
  – Summarize the discussion and present the algorithm
• The hybrid approach
  – Top-down to partition the global problem
  – Bottom-up to present solutions to sub-problems


How to Present Examples?
• Occam’s razor (the principle of parsimony)
  – “One should not increase, beyond what is
    necessary, the number of entities required to
    explain anything”
• Find the simplest example that can show
  all the points you want to show
  – Some data in running examples can be highly
    skewed
  – Only select data that can show critical ideas

Worksheet of Running Example
• Work out the complete running example
• Select the interesting and critical
  segments
• Present multiple small examples in the
  paper
  – Only one running example if possible
  – Preferably several paragraphs in one example
  – Don’t give a long, exhaustive example
  – Each example should focus on one point

How to Present Algorithms?
• Choose the appropriate level of abstraction
  – Operations obvious – omit them
     • Readers have general CS background
  – Complicated operations – function description
• The WWH sequence
  – Why do we need such an operation?
  – What is the operation?
  – How can the operation be done efficiently?

Keep Your Algorithm Short
• Long algorithms are hard to understand
• Multi-level expansion of algorithms
  – Use functions or procedures
• Ideally, each algorithm is fewer than 20
  lines
• Control the complexity
  – Don’t use too many variables
  – Use meaningful variable names
  – Use plain text to explain
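
The advice above can be illustrated with a small sketch. The task (finding the k most frequent items) and the function names `count_items`, `select_top` and `top_k_frequent` are hypothetical, chosen only to show multi-level expansion: the top-level algorithm reads as two named steps, and each step is expanded in its own short function.

```python
from collections import Counter


def count_items(items):
    """Step 1: count how often each item occurs."""
    return Counter(items)


def select_top(counts, k):
    """Step 2: keep the k items with the highest counts."""
    return [item for item, _ in counts.most_common(k)]


def top_k_frequent(items, k):
    """Top-level algorithm: two named steps, details live in the helpers."""
    counts = count_items(items)
    return select_top(counts, k)


print(top_k_frequent(["a", "b", "a", "c", "b", "a"], 2))  # ['a', 'b']
```

A reader can follow the top-level function without reading the helpers, which is the point of keeping each listing short and using meaningful names.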

Performance Study Goals
• “Wisconsin wallpaper”
• Clearly say why you design and conduct
  the experiments
  – Effectiveness measures
  – Efficiency measures
  – Other considerations




How to Present Experimental
             Results?
• Experiment settings
• Performance study goals
• Selected experimental results
  – Explanation
• Summary of performance study




How to Handle Related Work?
• If possible, talk about related work at the end of the
  paper.
   – Do not interrupt the flow of your story
• Extensive collection of related work
   – Don’t forget to look at the latest results
   – Go beyond your field, if possible
• Give sufficient credit to others
   – We are standing on the shoulders of giants
   – Avoid emotional words
   – Be precise in comparison
• Point out critical points
   – Use examples if necessary
What Should Be in Discussion?
• Related issues
  – Constraints in your method
  – Drawbacks
• Possible extensions
  – Point out the other problems that can be solved
    straightforwardly using the proposed method
  – Broader impact
• Future work if you have a detailed plan

Writing Strong Conclusions
• Summarize the paper briefly.
  – What is the problem solved
  – Major technical contributions
  – Major findings and results


• Future work if possible



Aiming high!
               Major DB/KDD Conferences

• DB (in my opinion)
  – 1st tier: SIGMOD, VLDB, ICDE
  – 2nd tier: EDBT, ICDT, CIKM, ER, SSDBM
  – Regional: DASFAA, WAIM, British DB Conf,
    Australian DB Conf, Brazilian DB Conf, DEXA, …
• KDD (in my opinion)
  –   Top: KDD
  –   2nd tier: SIAM DM (SDM), ICDM
  –   Regional: PAKDD, PKDD, …
  –   KDD papers can be sent to DB & ML conferences

Reviewers’ Comments




Reviewers’ Comments
• The conference review process is necessarily
  imperfect

• The reviewers operate under strict time
  constraints, and the committee must make
  quick decisions.

• Some good papers will be rejected and some
  embarrassing papers will be accepted.

Thank you!




My Paper Got Accepted!
• Congratulations!
• Address reviewers’ comments in the final
  version
  – Adopt good points
  – Clarify and remove confusion
• Prepare a nice talk and/or poster
  – Convey the general idea
  – Use examples wherever possible
  – Use as little symbolic notation as possible

Recycle a Paper
• Before publication, a paper is likely to go
  through several rejections
  – SIGMOD, VLDB, ICDE acceptance is around
    10%-15%
  – A conference with an acceptance ratio of 25% or more
    may not be good
• Aim at the next chance



Learn from the Reviews
• Do we aim at the right target?
  – If 2/3 of the reviewers are laymen in your subject,
    seriously reconsider the forum
• Address technical issues
  – Respond to reviewers’ comments by
    revising/enhancing technical description and
    experiments
• Improve writing
  – Confused reviewers? Clarify the issues
  – Correct any linguistic problems pointed out

Why Journal Papers?
• Records archived
• Important for degree, promotion,
  election, …




Conference vs. Journal Papers
• Length
  – Journal papers are often longer
• Objectives
  – Conference papers mainly convey the ideas and
    results
  – Journal papers systematically report and
    justify the research, more formal



From Conference Papers to
          Journal Papers
• A critical requirement: “major value added”
  – 30% in some journals, e.g., TODS, TKDE
  – But, how to count?
• Some “major values”
  – More detailed/complete examples
  – Complete formal results and proofs
  – Further variations and extensions of the
    method
  – Triviality should be avoided

Steps Towards Good Research
• Motivations and problems
  – More important than the solutions
• Re-search
  – Systematic development of solutions
• Writing a good paper
  – Careful design
• Submissions
  – Good luck!


Weitere ähnliche Inhalte

Ähnlich wie How2research

ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
zukun
 
GCurtis_IASummit_poster
GCurtis_IASummit_posterGCurtis_IASummit_poster
GCurtis_IASummit_poster
Gayle Curtis
 

Ähnlich wie How2research (20)

Creativity to Innovation
Creativity to Innovation Creativity to Innovation
Creativity to Innovation
 
Writing Workshop 2
Writing Workshop 2Writing Workshop 2
Writing Workshop 2
 
27 creativity and innovation tools - in one-pagers!
27 creativity and innovation tools - in one-pagers!27 creativity and innovation tools - in one-pagers!
27 creativity and innovation tools - in one-pagers!
 
ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
ICDE2010: DBMS: Lessons from the First 50 Years, Speculations for the Next 50
 
Design Thinking Asia Society Texas Center
Design Thinking Asia Society Texas CenterDesign Thinking Asia Society Texas Center
Design Thinking Asia Society Texas Center
 
I Love Patterns
I Love PatternsI Love Patterns
I Love Patterns
 
Creative Thinking
Creative ThinkingCreative Thinking
Creative Thinking
 
Creative Thinking
Creative ThinkingCreative Thinking
Creative Thinking
 
Innovation Boot Camp: Fostering a More Innovative Workplace (PPT)
Innovation Boot Camp: Fostering a More Innovative Workplace (PPT)Innovation Boot Camp: Fostering a More Innovative Workplace (PPT)
Innovation Boot Camp: Fostering a More Innovative Workplace (PPT)
 
Design Patterns Story
Design Patterns StoryDesign Patterns Story
Design Patterns Story
 
Creativity unbundled
Creativity unbundledCreativity unbundled
Creativity unbundled
 
Write clearly: take your web writing to the next level
Write clearly: take your web writing to the next levelWrite clearly: take your web writing to the next level
Write clearly: take your web writing to the next level
 
Immerse, Imagine, Invent, Articulate: A framework for disruptive innovation
Immerse, Imagine, Invent, Articulate: A framework for disruptive innovationImmerse, Imagine, Invent, Articulate: A framework for disruptive innovation
Immerse, Imagine, Invent, Articulate: A framework for disruptive innovation
 
Science communications: Writing for impact
Science communications: Writing for impact Science communications: Writing for impact
Science communications: Writing for impact
 
GCurtis_IASummit_poster
GCurtis_IASummit_posterGCurtis_IASummit_poster
GCurtis_IASummit_poster
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
 
Unlock your own design thinking potential
Unlock your own design thinking potentialUnlock your own design thinking potential
Unlock your own design thinking potential
 
How to write research papers? Version 5.0
How to write research papers? Version 5.0How to write research papers? Version 5.0
How to write research papers? Version 5.0
 
ReScience
ReScienceReScience
ReScience
 
Learning in the Age of Knowledge on Demand
Learning in the Age of Knowledge on DemandLearning in the Age of Knowledge on Demand
Learning in the Age of Knowledge on Demand
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

How2research

  • 1. How to conduct high quality research and write good papers Haixun Wang Microsoft Research Asia
  • 2. What is research? 1. Solve a problem using existing methods. Write a README.txt. (low innovation, little impact) 2. Improve existing solutions to an existing problem. Write a tech report. (low innovation, little impact) 3. Create a new solution to an existing problem. Write a paper. (high innovation, low impact) 4. Identify a new problem. Generalize the solution. Write a paper. (high innovation, high impact) 2
  • 3. Research and Engineering • New Solutions  Useful Solutions 3
  • 5. • It is a cruel that the children who died during the earthquake in Dujiangyan (都江堰), China, knew all too well that their country once led the world in the knowledge of the planet’s seismicity. • Why, if the Chinese had come to know so much about earthquakes so early on in their immensely long history, were they never able to minimize the effects of the world’s contortions — to at least the degree that America has? • Why did they leave the West to become leaders in the field, and leave themselves to become mired, time and again, in the kind of tragic events that we are witnessing this week? 5
  • 6. • In almost every area of technology the Chinese were once supreme, without competition. And yet, in the 16th century China’s innovative energies inexplicably withered away, and modern science became the virtual monopoly of the West. • There had been any number of Chinese Euclids and Archimedes but there was never to be a Chinese Newton or Galileo. • Until this week Dujiangyan was a place of which China could be proud; today its wreckage stands as a tragic monument to a culture that turned its back on its remarkable and glittering history (of innovation). 6
  • 7. How to train your innovation? 7
  • 10. 10
  • 11. 10,000 hours of success Excellence requires a minimum level of practice. 10,000 hours is the magic number (3 hours per day for 10 years) 11
  • 12. By the time Bill Gates dropped out of Harvard, he had been programming nonstop for seven years, which was way past 10,000 hours.years, I spent more than 3 hours watching TV In the last 10 everyday, how come I didn’t achieve anything? 12
  • 13. Nicholas Carr, Atlantic Monthly July 2008 13
  • 14. Independent thinking • the downfall of deep reading/thinking • Internet is rewiring our brains, forcibly adapting us to tolerate only bite-sized summations and simplified blips at the expense of deeper thought • we risk turning into ‘pancake people’—spread wide and thin as we connect with that vast network of information accessed by the mere touch of a button. 14
  • 15. How to train your creativity? Write, Write, Write! 15
  • 16. Research = Writing + Rewriting • Turn your idea into writing before implementing it. • Hard to write it down? Because you don’t understand the problem (or your idea). – Writing forces us to be clear, focused – Writing crystallises what we don’t understand • Writing opens the way to dialogue with others: reality check, critique, and collaboration. 16
  • 17. Research = Writing + Rewriting • The process of writing and rewriting is the process of – developing your idea – generalizing your problem/solution • After many rounds of rewriting, your problem (idea) may be totally different from the one you started with – more interesting and challenging • It’s not a waste of time. It’s how you should spend your time when you do research. 17
  • 18. How to find a topic? The Theory of Flying Pigs 18
  • 19. In Reality – Pigs do not have to fly.
  • 20. [ABSTRACT] In this paper, we identify the importance for pigs to fly. We show that many challenging tasks can be modeled by flying pigs. Thus, solving the flying pig problem benefits a large variety of applications. 20
  • 21. [ABSTRACT] In this paper, we extend the pioneering work of flying pigs [1]. Our improvement enables pigs to fly higher.
  • 22. [ABSTRACT] Recently, the flying pig problem has attracted significant attention [1, 2]. However, the pigs in previous work all fly very slowly. In this paper, we introduce a technique so that pigs can fly an order of magnitude faster. 22
  • 23. and soon we have many papers …
  • 24. What topic to work on? • The choices you make will define your career • No real problems at hand – Get a proceeding. Read from the 1st page. – Ask senior people what they are working on. – Make it go faster/higher • Find real problems, use real data 24
  • 25. Is this topic meaningful? • Convince yourself – an issue of research ethics • Talk to your colleagues – Hey! I have a crazy idea – Convince them • Talk to/Read from people not in your field – mathematicians, physicists, biologists, … 25
  • 26. Database research as an example • Databases have been one of the most successful fields in CS in terms of applications and industrial value! • However, is there anything left for substantial database research? – Relational database theory, a closing world? – Too many index structures already? 26
  • 27. Example: Data Model • From: RDBMS – Normalization is one of the cornerstones of RDBMS – Theoretical results and practical applications • To: XML – Storage model: still an open problem – hybrid database, Native XML support 27
  • 28. Example: Logic Databases • Logic database was a hot topic in the 80’s and early 90’s – models, semantics, magic sets, … – many results have since been incorporated into RDBMS – is Logic Database dead? • Rejuvenated by semantic query processing – ontology, description logics 28
  • 29. Broadening the Scope • Concern (VLDB Endowment meeting, ’98): – The area of database research may lose the pivotal role it now plays among information system technologies • Keep DB research current and relevant – We should maintain a watch on trends and future directions in the general area of information management • Can a traditionally non-DB/KDD research problem be treated using DB/KDD methods? 29
  • 30. Writing techniques • Overcome language barrier • Paper structure and content 31
  • 31. The Language Barrier • One must first know the rules to break them 32
  • 32. Some General Tips • Choose the right word/phrase • Use the active voice • A picture is worth 10,000 words • Use a fair amount of formalization • The divide-and-conquer approach • Keep it simple and stupid 33
  • 33. Choose the right word/phrase • Chicken without sexual life • Husband and wife’s lung slice • Bean curd made by a pockmarked woman 34
  • 34. Use the active voice • Ten Yuan will be paid for every one-time towel you use. 35
  • 35. Use the active voice The passive voice is “respectable” but it DEADENS your paper. Avoid it at all costs. • NO: It can be seen that... YES: We can see that... (“We” = you and the reader) • NO: 34 tests were run. YES: We ran 34 tests. (“We” = the authors) • NO: These properties were thought desirable. YES: We wanted to retain these properties. • NO: It might be thought that this would be a type error. YES: You might think this would be a type error. (“You” = the reader) Slide borrowed from Simon Peyton Jones 36
  • 36. Some General Tips • Choose the right word/phrase • Use the active voice • A picture is worth 10,000 words • Use a fair amount of formalization • The divide-and-conquer approach • Keep it simple and stupid 37
  • 37. Be Specific • NO: We describe the WizWoz system. It is really cool. YES: We give the syntax and semantics of a language that supports concurrent processes (Section 3). Its innovative features are... • NO: We study its properties. YES: We prove that the type system is sound, and that type checking is decidable (Section 4). • NO: We have used WizWoz in practice. YES: We have built a GUI toolkit in WizWoz, and used it to implement a text editor (Section 5). The result is half the length of the Java version. From Simon Peyton Jones 38
  • 38. Structure (conference paper) • Title (1000 readers) • Abstract (4 sentences, 100 readers) • Introduction (1 page, 100 readers) • The problem (1 page, 10 readers) • My idea (2 pages, 10 readers) • The details (5 pages, 3 readers) • Related work (1-2 pages, 10 readers) • Conclusions and further work (0.5 pages) Slide borrowed from Simon Peyton Jones 39
  • 39. An Attractive Abstract Counts • Abstract is for people to skim through in one minute – No technical details – Plain English, easy to understand – No assumption of DB/KDD background – As short as possible • What to write – The problem, and why it is important and challenging – Your technical thrust, progress and contributions – Broader impact • Write it last! 40
  • 40. What Is a Good Introduction • Starting from good stories – Motivation – what is the problem and why is the problem important? – 1-2 typical real-life applications • Intuition and general ideas – Intuition is most important! – No technical details – Understandable for a CS undergraduate – Use clear, small examples 41
  • 41. What Is a Good Introduction (2) • Highlight major contributions – Typical examples: identifying a new problem, novel solutions, a systematic performance study, … – Only list the major ones; don’t overclaim – Again, no technical details • Give a road map of the rest of the paper 42
  • 42. What’s the difference? Book 1: Hardcover: 1312 pages; Publisher: Wiley, 7th edition (June 20, 2001); Language: English; ISBN-10: 0471381578; ISBN-13: 978-0471381570; Product Dimensions: 10.1 x 9.1 x 1.9 inches; Shipping Weight: 6.1 pounds. Book 2: Pages: 378; Publication date: January 2004; ISBN: 7040137860; Barcode: 9787040137866. 43
  • 43. Writing a paper is like telling a story • The goal of the title is to get the reader to read the abstract … • The goal of the abstract is to get the reader to read the introduction … • … • You need a good setup … some suspense … then you unfold your story slowly … 44
  • 44. Goal: creating suspense • Reader thinks “gosh, if they can really deliver this, that’d be exciting. I’d better read on” 45
  • 45. Create Suspense Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice. One Hundred Years of Solitude by Gabriel García Márquez 46
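  • 46. Keep it Simple and Stupid “All night long the north wind blew hard” (一夜北风紧). Dream of the Red Chamber, Cao Xueqin. “The line may be plain, and it does not reveal what comes next, but that is exactly how someone who knows poetry begins. It is not only good; it also leaves those who follow endless room to write.” 47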
  • 48. Motivation Found! [Figure: a shifting pattern over {b,c,h,j,e} and a scaling pattern over {f,d,a,g,i}] 49
  • 49. Is It Meaningful?
             CH1I  CH1B  CH1D  CH2I  CH2B  …
      VPS8    401   281   120   275   298
      SSA1    401   292   109   580   238
      SP07    228   290    48   285   224
      EFB1    318   280    37   277   215
      …
      MDM10   538   272   266   277   236
      CYS3    322   288    41   278   219
      DEP1    317   272    40   273   232
      …
      NTG1    329   296    33   274   228
      …
    50
  • 50. Intuition Is the Most Important • Example – ensemble classifier for streams • Why ensemble? – Rigorous mathematical proof which shows ensemble reduces classification variance • Many benefits – High accuracy, ease of use, best approach in many aspects • Result: – paper rejected 51
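The variance claim on this slide can be given a quick numeric intuition. The sketch below is not the stream ensemble classifier the slide refers to; it assumes a toy setting in which each ensemble member is an independent, equally noisy estimate of the same quantity, and the names `noisy_estimate` and `ensemble_estimate` are hypothetical. Under that assumption, averaging k members should cut the variance by roughly a factor of k.

```python
import random
import statistics

def noisy_estimate(rng, truth=0.5, noise=0.2):
    """One hypothetical ensemble member: the true value plus Gaussian noise."""
    return truth + rng.gauss(0.0, noise)

def ensemble_estimate(rng, k):
    """Average of k independent members (toy assumption: no correlation)."""
    return statistics.fmean(noisy_estimate(rng) for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000

# Empirical variance of a single member versus a 10-member average.
single_var = statistics.pvariance([noisy_estimate(rng) for _ in range(trials)])
ensemble_var = statistics.pvariance([ensemble_estimate(rng, 10) for _ in range(trials)])

print(f"single member variance:      {single_var:.4f}")
print(f"10-member ensemble variance: {ensemble_var:.4f}")
```

A small picture or experiment like this is the kind of intuition a reader can absorb before (or instead of) the formal proof.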
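  • 51. Optimal decision boundary [Figure: optimal decision boundaries for the data at times t0, t1, and t2. Combining t0 & t1 & t2 gives errors; a smaller combination that includes t2 gives no errors.] 52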
  • 52. How to Present Technical Details? • The top-down approach – First give an overview of the algorithm – Present details of the major steps • The bottom-up approach – Start from the critical details – Summarize the discussion and present the algorithm • The hybrid approach – Top-down to partition the global problem – Bottom-up to present solutions to sub-problems 53
  • 53. How to Present Examples? • Occam’s razor (the principle of parsimony) – “One should not increase, beyond what is necessary, the number of entities required to explain anything” • Find the simplest example that can show all the points you want to show – Some data in running examples can be highly skewed – Only select data that can show critical ideas 54
  • 54. Worksheet of Running Example • Work out the complete running example • Select the interesting and critical segments • Present multiple small examples in the paper – Only one running example if possible – Preferably several paragraphs in one example – Don’t give a long, exhaustive example – Each example should focus on one point 55
  • 55. How to Present Algorithms? • Choose the appropriate level of abstraction – Operations obvious – omit them • Readers have general CS background – Complicated operations – function description • The WWH sequence – Why do we need such an operation? – What is the operation? – How can the operation be done efficiently? 56
  • 56. Keep Your Algorithm Short • Long algorithms are hard to understand • Multi-level expansion of algorithms – Use functions or procedures • Ideally, each algorithm is less than 20 lines • Control the complexity – Don’t use too many variables – Use meaningful variable names – Use plain text to explain 57
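As an illustration of multi-level expansion, the sketch below keeps the top-level routine to a few lines and pushes the one complicated operation into a named helper. Merge sort is chosen here purely as a familiar example; it does not come from the slides, and the function names are illustrative.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two tails is non-empty at this point.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    """Top-level algorithm: split, sort each half, merge."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    return merge(merge_sort(items[:middle]), merge_sort(items[middle:]))

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

A reader can follow `merge_sort` without reading `merge` at all, and each routine stays well under the 20-line guideline.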
  • 57. Performance Study Goals • “Wisconsin wallpaper” • Clearly say why you design and conduct the experiments – Effectiveness measures – Efficiency measures – Other considerations 58
  • 58. How to Present Experimental Results? • Experiment settings • Performance study goals • Selected experimental results – Explanation • Summary of performance study 59
  • 59. How to Handle Related Work? • If possible, talk about related work at the end of the paper. – Do not interrupt the flow of your story • Extensive collection of related work – Don’t forget to look at the latest results – Go beyond your field, if possible • Give sufficient credit to others – We are standing on the shoulders of giants – Avoid emotional words – Be precise in comparison • Point out critical points – Use examples if necessary 60
  • 60. What Should Be in Discussion? • Related issues – Constraints in your method – Drawbacks • Possible extensions – Point out the other problems that can be solved straightforwardly using the proposed method – Broader impact • Future work if you have a detailed plan 61
  • 61. Writing Strong Conclusions • Summarize the paper briefly. – What is the problem solved – Major technical contributions – Major findings and results • Future work if possible 62
  • 62. Aiming high! Major DB/KDD Conferences • DB (in my opinion) – 1st tier: SIGMOD, VLDB, ICDE – 2nd tier: EDBT, ICDT, CIKM, ER, SSDBM – Regional: DASFAA, WAIM, British DB Conf, Australian DB Conf, Brazilian DB Conf, DEXA, … • KDD (in my opinion) – Top: KDD – 2nd tier: SIAM DM, ICDM – Regional: PAKDD, PKDD, … – KDD papers can be sent to DB & ML conferences 63
  • 64. Reviewers’ Comments • The conference review process is necessarily imperfect • The reviewers operate under strict time constraints, and the committee must make quick decisions. • Some good papers will be rejected and some embarrassing papers will be accepted. 65
  • 66. My Paper Got Accepted! • Congratulations! • Address reviewers’ comments in the final version – Adopt good points – Clarify and remove sources of confusion • Prepare a nice talk and/or poster – Convey the general idea – Use examples wherever possible – Use as little symbolic text as possible 67
  • 67. Recycle a Paper • Before publication, a paper is likely to go through several rejections – SIGMOD, VLDB, ICDE acceptance is around 10%-15% – A conference with an acceptance rate above 25% may not be good • Aim at the next chance 68
  • 68. Learn from the Reviews • Do we aim at the right target? – If 2/3 of the reviewers are laymen in your subject, think seriously about whether this is the right forum • Address technical issues – Respond to reviewers’ comments by revising/enhancing the technical description and experiments • Improve writing – Confused reviewers? Clarify the issues – Correct any linguistic problems pointed out 69
  • 69. Why Journal Papers? • Records archived • Important for degree, promotion, election, … 70
  • 70. Conference vs. Journal Papers • Length – Journal papers are often longer • Objectives – Conference papers mainly convey the ideas and results – Journal papers systematically report and justify the research, and are more formal 71
  • 71. From Conference Papers to Journal Papers • A critical requirement: “major value added” – 30% in some journals, e.g., TODS, TKDE – But, how to count? • Some “major values” – More detailed/complete examples – Complete formal results and proofs – Further variations and extensions of the method – Triviality should be avoided 72
  • 72. Steps Towards Good Research • Motivations and problems – More important than the solutions • Re-search – Systematic development of solutions • Writing a good paper – Careful design • Submissions – Good luck! 73