This talk discusses how astronomy is changing in the 21st century due to new technologies and large datasets. In the next decade, astronomers will observe the first sources of light in the Universe with the James Webb Space Telescope and build extremely large ground-based telescopes. New surveys will collect terabytes of data per night, and the Square Kilometer Array is expected to produce more data in its first year than is on the entire internet today. While data storage is cheap, analyzing the data and discovering knowledge in it presents new challenges. Citizen science initiatives are helping astronomers analyze data and make new discoveries.
5. In the next decade, we will.....
... get a close-up view of our nearest massive black hole
... test general relativity in the strong gravity regime
VLTI-GRAVITY (ESO)
6. In the next decade, we will.....
... observe the first sources of light in the Universe
... watch planets form around other stars
James Webb Space Telescope/MIRI (NASA, STFC)
7. In the next decade, we will.....
... build an observatory the size of a stadium
European Extremely Large Telescope
8. 21st Century Surveys: Big Data
LSST, SKA, DES, LOFAR, PanSTARRS
9. 21st Century Surveys: Big Data
LSST: 30 TB/night
SKA: “More data in 1st year than in entire internet today”
DES: TBs/night
LOFAR: 20 PB over 5 yrs
PanSTARRS: TBs/night
27. I.
July
9-11!
Rob Simpson (Oxford), Sarah Kendrew (MPIA), Alasdair Allan (Exeter), Chris Lintott (Oxford), Stuart Lowe (Las Cumbres), Carolina Ödman (Cape Town)
28. Web-based
Visualisation
Teaching
“Networked astronomy and the New
Media”
Outreach Citizen science
Artistic
30. Some tips....
1. Invite non-astronomers
2. Combine junior & senior crowd
3. Unconference - let participants decide
4. Provide financial support for youngsters
5. A successful hack day requires preparation
6. Lots of bandwidth, and pizza
31. Past & Upcoming Speakers
Alyssa Goodman (CfA)
Michael Nielsen (author)
Cameron Neylon (STFC/PLoS)
Jill Tarter (SETI)
Andy Lawrence (Edinburgh)
David Hogg (NYU)
Bruce Berriman (VAO)
Google, Microsoft, O’Reilly Media
32. .Astronomy is:
Creating enthusiasm for technology and data
Highlighting potential in people and infrastructure
Building a skilled community
51. 1. Contribute 39%
2. Astronomy ~12%
3. Discovery ~10%
Other labels shown: Learning, Community, Teaching, Beauty, Fun, Vastness, Helping, Zoo, Science
54. The Milky Way Project: Studying massive star formation on Galactic scales (PI: Rob Simpson, Oxford)
Spitzer GLIMPSE/MIPSGAL images
Bubbles: Circles/Ellipses
Indicate rim thickness, presence of gaps
Unusual objects: “red fuzzies”, clusters, background galaxies, small bubbles, dark nebulae
55. 35,000+ users logged in
25% classified > 5 images
520,120 user-drawn bubbles (now ~800,000!)
Pie chart of users by country: US, UK, Canada, Poland, Germany, Other
Shares as extracted (order of labels not preserved): 26.1%, 42.0%, 3.8%, 3.8%, 4.3%, 20.0%
57. Ethical rules for citizen science
1. Don’t waste people’s time
2. Treat participants as collaborators
3. If you can do it with a computer, don’t use people
66. Lessons
Citizen scientists want to contribute to research
People engage with people
In the Big Data era, citizen scientists can play a key role in characterising data
68. Summary
Big Data: a changing paradigm in astronomy
Data storage: easy <-> Data discovery: hard
Nurture the right skills in young astronomers
Welcome involvement of citizen scientists
Editor’s notes
For most of the 20th century astronomy was a fairly solitary science, one in which an individual could shine and pursue original research from the gathering of data to the final publication. Over the course of the century, we’ve pooled our resources into larger, more complex observatories.
The number of active astronomers increased dramatically, as did the volume of literature they produced.
These two developments mean that science also became more collaborative: in the last quarter of the century the mean number of authors per paper more than doubled (White, 2007).
These large and complex observatories have enabled massive progress in science, and I’m going to take a brief detour to some of the instruments I’ve worked on in recent years.....
In the next decade we will see amazing progress in astronomy enabled by new technology and new facilities, and this conference is the perfect place to hear about these. Here are some examples from projects I work on.
In the next decade we will be able to get an unprecedented close-up view of our nearest supermassive black hole, Sgr A*, at the centre of the Milky Way Galaxy. A new instrument for the VLT Interferometer called GRAVITY will be able to track the motion of objects with a relative astrometric precision of order 10 micro-arcseconds. This will allow us to identify the source of the observed flaring of the Galactic Centre black hole, and even directly observe general relativistic effects in the black hole’s strong gravitational field.
We will launch the largest ever space telescope, whose supreme sensitivity will allow us to see all the way to the edge of the Universe’s Dark Ages, and observe the first sources of photons. The first of JWST’s instruments, MIRI, was delivered to NASA earlier this year, and this has been a wonderful project to be involved in. I think there is an entire session dedicated to JWST and its instruments on Friday, and I would definitely encourage you to attend that.
We will build one or more optical/IR telescopes the size of a stadium, whose collecting power vastly surpasses that of any telescope available today. These are still observatories in the “traditional” sense: facility-class, multi-purpose, multi-instrument, with calls for proposals and so forth.
In addition, a whole new generation of survey instruments is coming online that will scan the skies to incredible depths and levels of detail. With large detector arrays in the optical, or massively distributed in the case of radio, these new telescopes will take us into an entirely new data regime.
Here are just a few notable examples of such new telescopes, and perhaps you can recognise your favourite “Big Data” survey instrument there.
Among the most staggering examples is the LSST, which will image the entire sky every 3 days, taking astronomy in the time domain to a whole new level. The Square Kilometer Array apparently will produce more data in its 1st year than is on the internet today. I loved that statistic. If we get a new discovery for each picture of a kitten that’s currently on the internet, SKA will surely be money well spent.
Data to overlay:
LSST: entire sky every 3 days, 30 TB/night
SKA: more data in its 1st year than is on the entire internet today (Alberto Conti in http://www.theatlantic.com/technology/archive/2012/04/how-big-data-is-changing-astronomy-again/255917/)
Comparison: SDSS - 10 TB imaging data
Gaia - “PB regime” (Mignard et al, 07)
DES - ~1 TB produced each night (Sevilla et al, 11)
PanStarrs - several TB per night (website)
LOFAR: archive up to 20 PB over 5 yrs (2012-2016) (Begeman et al, 2011)
[add some more images of: GAIA, Euclid, CTA?, ....]
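To put the quoted rates on a common footing, here is a minimal back-of-envelope sketch in Python. It uses only the figures from the notes above; the assumption of 365 observing nights per year is mine and is an upper bound, since real telescopes lose nights to weather and maintenance. The function names are hypothetical.

```python
# Back-of-envelope conversion of the survey data rates quoted in the notes.
TB_PER_PB = 1000  # decimal units assumed


def yearly_volume_pb(tb_per_night: float, nights_per_year: int = 365) -> float:
    """Total volume in PB for a survey producing tb_per_night every night.

    nights_per_year=365 is an assumed upper bound, not a project figure.
    """
    return tb_per_night * nights_per_year / TB_PER_PB


def mean_daily_rate_tb(total_pb: float, years: float) -> float:
    """Average TB per day needed to accumulate total_pb over the given years."""
    return total_pb * TB_PER_PB / (years * 365)


lsst_pb_per_year = yearly_volume_pb(30)        # LSST: 30 TB/night
lofar_tb_per_day = mean_daily_rate_tb(20, 5)   # LOFAR: 20 PB over 5 years
sdss_equivalents_per_night = 30 / 10           # SDSS imaging: 10 TB in total

print(f"LSST: ~{lsst_pb_per_year:.1f} PB per year")
print(f"LOFAR: ~{lofar_tb_per_day:.1f} TB per day on average")
print(f"One LSST night = {sdss_equivalents_per_night:.0f}x the SDSS imaging data")
```

Under these assumptions LSST would accumulate roughly 11 PB per year, and a single LSST night would hold three times the entire SDSS imaging dataset.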
While these data volumes seem like scary big numbers, we’re helped a lot by Moore’s Law. Storage is basically cheap. And by the time these facilities come online, it will be affordable.
But there are other challenges - mainly related to how we get the data in the right format to the users.
The scientists and engineers developing these new facilities have to think very carefully about how to transport such large data volumes, store them and distribute them. Sending terabytes of raw pixels round the world will no longer be feasible.
Automated data characterization and classification will play an increasingly important role.
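A quick calculation shows why shipping raw pixels around does not scale. The sketch below estimates how long one night of data at 30 TB (the LSST figure quoted in the notes) takes to move over a network link. The link speeds of 1 and 10 Gbit/s are illustrative assumptions, not figures from any of these projects, and the function name is hypothetical.

```python
def transfer_time_hours(volume_tb: float, link_gbit_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Hours needed to move volume_tb terabytes over a link.

    efficiency is the assumed fraction of the nominal bandwidth that is
    actually achieved (1.0 = ideal, which real links never reach).
    """
    bits = volume_tb * 1e12 * 8                      # decimal terabytes -> bits
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


for speed in (1, 10):  # assumed link speeds in Gbit/s
    hours = transfer_time_hours(30, speed)
    print(f"30 TB over {speed} Gbit/s: {hours:.1f} hours")
```

Even on an ideal 1 Gbit/s link, one night of data would take nearly three days to transfer, so the backlog would grow faster than it could be cleared.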
Sadly, government funding levels have not allowed there to be a kind of Moore’s Law for Astronomers that allows our community to grow along with the size of our detectors. Combine that with the data volumes we’ll soon be producing, and it’s clear that most data will not be seen by humans.
So the user community of astronomers will face the difficult task of converting this data into knowledge. For this, they’ll need different skill sets from those astronomers today typically have at their disposal. Science will necessarily become more statistical in nature. And statistical inference and machine learning will probably be more important skills to have than, say, optimal spectral extraction.
Finally, an important word on this slide is “serendipity”. With the computing power we have at our disposal we can automate data processing and discovery to a large extent. But many discoveries in astronomy were made while looking for something else. The best known examples are the cosmic microwave background, pulsars, and dark energy.
Every time a new parameter space is opened up, we should expect to find something that was entirely unexpected.
What is the best approach to automating data processing and characterization, while still making room for serendipitous discovery?
Data management is not a sexy topic. But getting it right pays off. Experience with our current facilities has made that abundantly clear.
When we look at the publications that use data from the Hubble Space Telescope, arguably one of the first examples of a large, public, easily accessible archive, we can see that since 2006 the number of papers using archival data (i.e. none of the investigators on the proposal were among the authors) has exceeded the number where the investigators were among the authors.
We see similar numbers for e.g. Chandra.
The Sloan Digital Sky Survey, one of the most productive surveys to date, is another great example of how a good data management strategy can vastly increase the productivity of an observatory or instrument.
A final note on data.
Data is not just images, spectra and catalogues.
How about the literature? We are told every day that publishing and being cited is the key to success. This has given us a culture of publishing as much as we can manage. As a consequence, the volume of literature that is produced by the community and posted daily to the arXiv exceeds what an astronomer can track, let alone read, let alone digest.
Papers too are data. How do we manage those? How do we find and extract the information we need, and how do we assign value?
These are also important challenges.
There are actually a few conferences here dedicated to these topics with interesting talks by the people who are working to tackle these problems right now - so if you’re interested in this problem you should definitely attend these.
Public data archives have undoubtedly had a big impact on scientific productivity - but that’s not the only place where they’ve made a splash.
Having our data publicly on display on the internet has made astronomy incredibly successful in public engagement. Imagery has found its way onto the front pages of newspapers and into art, inspired amateur astrophotographers, etc.
I always like showing these pictures - it’s one of the 2011 collections by a young Scottish designer called Christopher Kane who’s something of a darling in British fashion these days. The images used in this Galaxy collection are all from HST. So we have all these fashion fans out there who will spend hundreds of euros to have, you know, the Orion nebula on their chest.
I doubt anyone had this in mind when creating these huge archives we have of Hubble imagery. But it goes to show how one person’s science data are another’s artistic inspiration - in places where you might not have expected it.
A big unifying theme to all these developments, and really the technology that underpins them, is the internet.
The internet has allowed us to send data around the world in an amazingly efficient way - from observatories to data centres, from data centres to our desktops, from users to users.
The internet also makes it easy to make data available to those outside the user community, to bring astronomy to a wider public.
And let’s not ignore social networking tools! These are commonly equated with “time-wasting” in a professional context. But that’s really not the case. Yes, you can use them to discuss what your dog just did and what you had for breakfast, but they are also immensely useful channels for communication with people you might otherwise not get to talk to: astronomers who are far more junior or senior than you are, in far away places, scientists in other fields, or
The point I’m trying to make in this slightly long-winded preamble is that astrophysics is transitioning into a new kind of science that is driven by the wide and public availability of large amounts of data over networks, and the internet.
So in the first decades of this new century, with all these new and exciting large facilities coming our way, we’re having to change our colours in order to adapt to a new environment - going from a still fairly hands-on approach to data gathering, with telescope proposals, observing trips and so on, to one where the skill lies in clever approaches to data mining. This requires different skills from the ones we were taught, so in a sense this is a challenge we’re having to grow into.
This evolution forms the backdrop to some of the things I’ve been involved in over recent years, and for the rest of the talk I’ll give a couple of examples of how, against this backdrop of sociological change, I’ve used the web.
The first thing I want to talk about is a conference I organise with some friends/fellow scientists - called dotAstronomy. The original dotAstronomy took place in Cardiff in 2008, which I attended as a participant. There was such a good vibe to the meeting that we formed a small group to organise further conferences. We’re now up to number 4, which will take place in Heidelberg next week.
The idea behind dotAstronomy is that, in the face of this paradigm shift in science, many young scientists have side projects they work on - a cool outreach project perhaps, a blog (that was my entry point), a neat software application, videos..... But these things don’t form part of their work, and they do them in isolation.
So with dotAstronomy we want to bring together the people who have these skills - be it artistic or programming, or just bright ideas. We get them to show off their work, share ideas, and crucially, to work together.
The central part of the conference, and the reason that many of our participants want to come, is the hack day. This is a free-form day where people can get together in groups, pool their expertise and work on existing or new projects.
Some of these projects are artistic - we’ve had videos created and a project for data sonification - others develop literature mining tools, e.g. to extract tables and figures from papers; others still have worked on novel data interfaces for citizen science projects.
We’re not the only ones to organise such hack days by the way. The Guardian newspaper has sponsored a Science Hack Day in London, and recently there was also an NHS Hack Day, looking specifically at innovative web-based approaches to dealing with healthcare data.
dotAstronomy has become fairly well known now in the community - and I often get asked how we make the conference a success. We use quite a different format from most conferences, and this is a big key to its success.
senior -> big picture + presenting the problems; junior -> skills + inspire the seniors (who hire them!)
Here are some of our past speakers, and speakers for the upcoming conference. We’ve had involvement from several companies, such as Google, Microsoft and O’Reilly Media, and we’ve been pretty successful in attracting sponsorship from major organisations and funding bodies - shown here.
In summary, with .Astronomy we’re trying to create enthusiasm for technology and for data.
We want to highlight the potential that there is in people and in infrastructure (i.e. the internet).
We want to build a “grass-roots” community of people who have the skills to exploit this new data paradigm in astronomy, the skills that are currently perhaps untapped - creative, software etc.
One of the first web-based astronomy citizen science projects was Galaxy Zoo, which is quite well known now. It was started at Oxford by Chris Lintott and Kevin Schawinski. In GZ, users were asked to classify galaxies in SDSS images as spiral or elliptical. A very simple task for us humans, a very difficult one to do reliably with a computer algorithm.
Following on from the success of the original GZ, an official organisation was created, called the Zooniverse, for setting up further projects. There are now of order 10 projects spanning astronomy, marine science and archaeology - see the logos above. The big strength of Zooniverse is that they hired professional, experienced web developers, who created a very flexible API that now provides an easy to use interface for setting up new projects. New image sets can easily be fed into a site and shown to the users.
And just to reiterate, all these projects directly benefit from large datasets that are made publicly available on the web.
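As an illustration of how many spiral-or-elliptical answers for the same galaxy can be turned into something a computer can work with, here is a minimal Python sketch that computes vote fractions and a consensus label. It is not the Galaxy Zoo pipeline: the function names, the sample data and the 0.8 agreement threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def vote_fractions(classifications):
    """Turn (galaxy_id, label) pairs into per-galaxy vote fractions."""
    votes = defaultdict(Counter)
    for galaxy_id, label in classifications:
        votes[galaxy_id][label] += 1
    return {
        gid: {label: n / sum(counts.values()) for label, n in counts.items()}
        for gid, counts in votes.items()
    }


def consensus(fractions, threshold=0.8):
    """Return the leading label if it reaches the (assumed) threshold, else None."""
    label, frac = max(fractions.items(), key=lambda item: item[1])
    return label if frac >= threshold else None


# Hypothetical user input: five votes on galaxy G1, two on galaxy G2.
sample = [
    ("G1", "spiral"), ("G1", "spiral"), ("G1", "spiral"),
    ("G1", "spiral"), ("G1", "elliptical"),
    ("G2", "spiral"), ("G2", "elliptical"),
]

fractions = vote_fractions(sample)
for gid, frac in fractions.items():
    print(gid, frac, "->", consensus(frac))
```

Galaxies that do not reach the threshold, like G2 here, are the ones that would need more classifications or a closer look.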
Enthusiasm for citizen science projects and for GZ in particular was hugely stimulated by the discovery of an interesting object by a young Dutch schoolteacher, Hanny van Arkel - called Hanny’s Voorwerp - shown here (in a Hubble image rather than SDSS). Hanny posted a question about this funny looking green object (green from ionised oxygen emission) when classifying the galaxy - and this essentially led to a lot of follow-up work and Hanny’s name on several papers.
In fact, I suspect Hanny van Arkel has a higher h-index than I do....
This was a fantastic success story that really demonstrated the potential of such projects. It gave the science team the opportunity to go “Huh, we don’t know what that is. Let’s see how we can find out” - and go through this process with the whole GZ user community watching.
The success of the first Galaxy Zoo project was really staggering. In this plot you can see the number of users registered as a function of time since the project launch. In fact, if you were to zoom in to the initial days of the launch, you’d be able to see a sharp dip a few hours after launch, when the servers went down under the traffic.
To illustrate, here’s what the project I work on looks like:
[describe the interface and the tasks users perform]
[describe briefly that we then have to process all the user input data into a catalogue of bubbles]
-> when many users draw the same bubble, we get statistics on the size and shape, and that’s how we perform quality control.
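The quality-control idea in the last note can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the Milky Way Project's actual reduction: it assumes the drawings have already been grouped by candidate bubble, treats each drawing as a circle (x, y, radius), and uses a hypothetical minimum of 5 drawings before a bubble is accepted.

```python
from statistics import mean, stdev


def summarise_bubble(drawings, min_drawings=5):
    """Combine several users' drawings of one candidate bubble.

    drawings: list of (x, y, radius) tuples, one per user.
    Returns None if too few users drew the bubble (assumed cut-off),
    otherwise the mean position and size plus the scatter in radius.
    """
    if len(drawings) < min_drawings:
        return None
    xs, ys, radii = zip(*drawings)
    return {
        "n_drawings": len(drawings),
        "x": mean(xs),
        "y": mean(ys),
        "radius": mean(radii),
        "radius_scatter": stdev(radii),  # sample standard deviation
    }


# Hypothetical drawings of the same bubble by five different users.
drawings = [
    (10.0, 20.0, 4.0),
    (10.5, 19.5, 5.0),
    (9.5, 20.5, 6.0),
    (10.0, 20.0, 5.0),
    (10.0, 20.0, 5.0),
]

summary = summarise_bubble(drawings)
print(summary)
```

A small scatter relative to the mean radius suggests that users agree on the bubble; a candidate drawn by only one or two users is dropped by the cut-off.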
Here are just some statistics from the Milky Way Project. The numbers quoted here were correct as of the end of 2011, so we’ve surpassed these by some margin now - on the bubble drawings we’re up to nearly 800,000. Our users come from 178 countries and the pie chart shows the major contributors.
Incidentally, if 800,000 sounds like a large number, you should know that the original GZ project gathered 50 million classifications in its first year!
We’ve also had two papers accepted for publication - one presenting the initial catalogue of over 5000 bubbles, and one describing a large statistical study of bubbles and other markers of massive star formation, which I led myself.
I won’t concentrate too much on the science behind the project, but focus more on the sociological aspect of this and other similar projects.
Thinking about 35,000 people helping you classify your data makes me feel slightly megalomaniacal sometimes, and like I said there are some ethical implications of taking this approach to science. From its earliest days, the Zooniverse organisation has had some ethical rules that they use to vet new projects.
What these projects all have in common is that, to put it in the crudest possible way, they use human brains as an instrument. We ask users to perform one step in our data processing that is not difficult and not time consuming for any given image but, given the large data volumes, incredibly work-intensive.
To be clear, “outsourcing” your problem to citizen scientists does not answer any scientific questions in itself - you just transform one large dataset into another large dataset, only the 2nd large dataset is easier to process with computers.
So the challenge becomes how to benchmark and analyse the user data you receive. But if you get many classifications of the same object from many users, the big advantage is that you immediately have statistics on your data.
However, this somewhat blunt view raises an important ethical question.
1. Don’t waste people’s time:
don’t ask people to do more than you strictly need them to do
when you’ve classified your data, stop
2. Treat participants as collaborators
This comes back to what I mentioned before. From a purely scientific point of view you need a certain detachment to get science out of your citizen data. But as human scientists we need to have consideration for the fact that people are giving up their time to help us out.
3. This ties in with not wasting people’s time: if an algorithm can do it better, don’t ask people to do it for you just for the sake of it. Not just outreach.
New projects will only be considered by the Zooniverse if they meet these rules. So really, the bottom line is that you should respect the users as people.
The combination of a citizen science project with web 2.0 functionality makes it ideally suited for “outreach” - although I’d rather use the word “engagement”. As I said before, the whole point of web 2.0 is two-way communication, rather than just posting information up there.

Many of our users, especially those who spend lots of time on the site, are super keen to learn more about what it is they’re doing. And that presents a huge opportunity for working with these people and helping them understand more, and get more involved.

Our main space for engagement for MWP is the forum, which all Zooniverse projects have. In MWP, users can collect images they like, for aesthetic purposes or for some scientific reason. They can discuss what can be seen with others etc.
We also have a forum, where they can post images and simply ask questions.

... and a blog, where we keep people updated with the latest developments in the project, papers we’re writing, or background information on the science.

In some of these projects, the level of proficiency that some users have worked up to with the help of the science team is amazing. In the Galaxy Zoo forum, users had become sufficiently acquainted with the data to get the identifier of a galaxy they were classifying and look up its spectrum on the SDSS webpages. They knew to look for emission lines and they knew what that meant scientifically.
And this is something I really enjoy about working with the Zooniverse team on these projects. In terms of public engagement, these projects are entirely focused on the participants. Traditional outreach projects, or public webpages, tend to just present the reader with information - what is X, how does Y work?

But in the Zooniverse projects, we don’t just say “this is science” - we try to show how it works. “This is how we do science”.

It’s about bringing science down to Earth, showing people how we do things. And I always enjoy sharing how incredibly banal even astrophysics can be.

In a broader context, incidentally, I think many of the societal challenges facing us today - disease, climate change, energy... - are scientific in nature. But many people just don’t have a feel for what that means, when scientists are reported to say X or Y by the media. So if we can instill in people an appreciation for how science works, the work and the processes that go into scientific discovery, even in a “trivial” field like astrophysics, this can have a lot of benefit.
In the MWP we have a large “what is this?” potential. Anyone who’s worked with these beautiful Galactic plane images from Spitzer knows that they’re full of interesting things: filaments, bright spots, dark clouds, shells, clusters... Figuring out what everything is and how it spatially fits together is hugely complex.

So on the MWP forums the vast majority of the questions are of the type “what is this red thing in the top left corner?”. And I never know either!
In answering these questions, I found myself going through these same banal steps - getting the coordinates of the object, doing a SIMBAD search, skimming a paper or two, and writing a short paragraph on what I think the object of interest is.

So now we’ve started introducing new features in the interface that allow the more interested and dedicated users to do this themselves - better coordinate overlays, and we wrote something about how to use SIMBAD. I’m planning to spend a bit more time over the summer working on this.
Talk about Suelaine and the LBVs and supernova remnants.

In all this, it’s really important that scientists talk to the users. As in all relationships, communication is the key! You can’t expect users to learn all this stuff by themselves, or to just keep each other entertained.
So here are some lessons I’ve learnt from my involvement in this project.

1. There’s a large community of people who are keen to contribute to scientific research. That is a hugely valuable resource that we should tap into.

2. People don’t engage with webpages. The web is a medium, not a thing in itself. So to achieve this two-way communication with a large audience requires investment of time by scientists.

3. When citizen science is done well, it’s clear that citizens can play a key role in data discovery and characterisation.
So what does this mean about the fundamental nature of the astronomer in the 21st century? Maybe not so much.

But I think that part of our adaptation to the changing environment should include a change in mindset from “us v them” to a more inclusive approach to public engagement. We have amazing tools available via the web, huge amounts of data that the astronomy community *probably* doesn’t have the resources to examine, and lots of people who are keen to contribute.

Getting citizen scientists involved can not only help with data discovery in the era of big data, but has significant mutual benefits to the professional and amateur communities involved. We would do well to embrace that.