Presence in news? Revenue opportunity? Impact on the world? ... Let's talk numbers. This will be pretty boring and numbers-oriented. I want to show you that digging for numbers pays off in the end, and to present my own path as a case study. I will assume you know the basics: if you have ever seen WordPress or Blogger, or if you have a web site, you should be OK. Back when the economy was invented, doing business was easy: you made a pot and you exchanged it for wine. Then we invented factories and supermarkets, and suddenly you had to predict supply and demand to keep your business running smoothly. Then we invented computers, and suddenly everything became countable and measurable. And then someone invented the web and forgot to build into it everything we had learned throughout history.
So today, when someone asks "How big is the web?", the usual answer is: it's big! And the same goes for everything else online. The fact is, measuring stuff online is often pretty darn hard. If you have your own site, the amount of data can be overwhelming, and if you want to know something about other people's sites, there is no really legal way to do it. This is important to know if you want to project your business. I'll be talking about many numbers that have never been published before, because nobody wants to hear them.
Lots of people work with bloggers, and all of them should know the numbers. Every blogger blogs somewhere. Researchers exist. Google sees everything. But each of them has their own interests, and none of those interests is to make your projections accurate.
Of the 12 top blogging platforms, these are the only ones that have published their stats. You can see how diverse they are, but believable, right? We have 150M of something right here, so there must be 300M altogether, right? But 150M of what? Bloggers, publishers, people, journals, users.
Top-down
Bottom-up
The most famous catalog is probably Technorati, and then there are others, each with a different approach and purpose. BlogCatalog is a user-generated general catalog, Alltop is an editorial list of the best reading on a wide range of subjects, Federated Media is a list of the absolute top influencers, and so forth. I was interested in finding out the total reach of these catalogs. I wanted to see whether they can provide a comprehensive list of, say, 'wine bloggers'. So I crawled and indexed the publicly available information from 14 different catalogs. I don't know whether their internal indexes are different; my uber-list has only the blogs I could find if I were a normal marketer browsing their sites. For technical reasons I didn't include Yahoo and DMOZ.
This chart is rather complex, but essential. It demonstrates two findings:
- The bubble sizes show each catalog's unique contribution to our combined index, that is, the number of blogs that were found only in that catalog. As you can see, the higher the bubbles are, the larger they are, which makes sense: larger catalogs contribute more new blogs.
- The horizontal axis shows the percentage of unique blogs in each catalog. This tells us how special the catalog is, and how much it is worth looking at. So we can see that BlogCatalog contributes almost 90% of its volume and Technorati almost 80%. But the truly surprising finding is that even the rest of the catalogs list more than 60% new blogs, and that even the absolutely smallest catalogs, which never intended to be a reference for the blogosphere, like loud3r, still contribute 30% of their size. This basically means that if you create your own list without looking at other catalogs, chances are 30% of it will not be listed anywhere else. Let me repeat this: if you throw a rock at a blogger, there is a reasonable chance you'll be the first.
- Now, for the second finding, let's look at the vertical axis: the scale ends at 200k. This means that the largest catalog out there publishes merely 200k bloggers, and that all of them combined see only 200k bloggers.
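To make the two chart metrics concrete, here is a minimal sketch of how a catalog's unique contribution and unique share could be computed from the crawled lists. The function name, the catalog names, and the toy data are all made up for illustration; this is not the actual crawl or its results.

```python
from collections import Counter


def unique_contribution(catalogs):
    """Map each catalog name to (size, unique_count, unique_share).

    `catalogs` is a dict of catalog name -> set of blog URLs.
    A blog is "unique" to a catalog if no other catalog lists it.
    """
    # Count in how many catalogs each blog appears.
    seen_in = Counter(blog for blogs in catalogs.values() for blog in blogs)
    result = {}
    for name, blogs in catalogs.items():
        unique = sum(1 for blog in blogs if seen_in[blog] == 1)
        share = unique / len(blogs) if blogs else 0.0
        result[name] = (len(blogs), unique, share)
    return result


# Toy data (hypothetical): two catalogs that overlap in one blog.
catalogs = {
    "big": {"a", "b", "c", "d", "e"},
    "small": {"e", "f"},
}
stats = unique_contribution(catalogs)

# The combined index is simply the union of all catalogs.
combined = set().union(*catalogs.values())
```

In the chart's terms, `unique_count` would be the bubble size, `unique_share` the horizontal position, and `size` the vertical position.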
Professionals estimate anything between 200k and 200M, and the enablers are aligning on the upper margin. We are on our own.
Traffic is the most reliable and most measured metric online. Compete makes it possible to measure traffic on other people's domains by sampling from browser plugins. It turns out we can use it to estimate the number of bloggers, because we have one very special platform in the ecosystem.
Blogger publishes everything on blogspot.com and keeps all its dashboards on blogger.com. This means that unique visitors on the former are readers and on the latter are bloggers! If we assume this ratio stays fixed across platforms, we can estimate the size of every other platform in the ecosystem!
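The arithmetic behind this is a simple proportion. A minimal sketch, assuming the bloggers-to-readers ratio observed on blogger.com and blogspot.com carries over to other platforms; the function name and all the figures below are placeholders, not measured values:

```python
def estimate_bloggers(platform_readers, reference_readers, reference_bloggers):
    """Scale a platform's reader count by the reference bloggers/readers ratio.

    reference_readers  : unique visitors on the reading domain (blogspot.com)
    reference_bloggers : unique visitors on the dashboard domain (blogger.com)
    """
    if reference_readers <= 0:
        raise ValueError("reference_readers must be positive")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return platform_readers * reference_bloggers / reference_readers


# Placeholder numbers only: 50 bloggers per 1000 readers on the reference
# platform, applied to another platform with 200 readers.
estimate = estimate_bloggers(200, reference_readers=1000, reference_bloggers=50)
```

The estimate is only as good as the fixed-ratio assumption: a platform whose bloggers post more or attract more readers per blog would be over- or under-counted.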
So now we can estimate the sizes of several platforms. But not all of them, and we can't know which ones we are missing. This is useful for estimating the reach of hosted platforms, but there are two problems: self-hosted blogs remain a mystery, and custom domains are missing. Google sees everyone, so if there is a way to get data from them...
Now we should cross-check the previous numbers, and maybe tap into custom domains and self-hosted platforms. Take a random sample of blog posts in English, and use heuristics to guess which platform each one is on. This is pretty reliable sampling.
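As an illustration of what such heuristics might look like, here is a minimal sketch that guesses the platform from the post's hostname and, failing that, from the generator meta tag in the page. The suffix table, the labels, and the sample posts are assumptions made up for this example, not the rules actually used.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lookup table: hosted platforms recognizable by hostname suffix.
HOST_SUFFIXES = {
    ".blogspot.com": "blogger",
    ".wordpress.com": "wordpress.com",
    ".typepad.com": "typepad",
    ".livejournal.com": "livejournal",
}

# Matches any <meta ...> tag that mentions "generator", whatever the
# attribute order or quoting style.
GENERATOR_TAG = re.compile(r"<meta[^>]*generator[^>]*>", re.IGNORECASE)


def guess_platform(url, html=""):
    """Guess the blogging platform of a post; return "unknown" if unsure."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform in HOST_SUFFIXES.items():
        if host.endswith(suffix):
            return platform
    # Custom domains and self-hosted blogs: fall back to the generator tag.
    match = GENERATOR_TAG.search(html)
    if match:
        tag = match.group(0).lower()
        if "wordpress" in tag:
            return "wordpress (custom domain or self-hosted)"
        if "blogger" in tag:
            return "blogger (custom domain)"
    return "unknown"


# Made-up sample of (url, html) pairs standing in for a random sample of posts.
sample = [
    ("http://example.blogspot.com/2010/01/post.html", ""),
    ("http://example.com/post", '<meta name="generator" content="WordPress 3.0">'),
    ("http://example.org/post", "<meta content='blogger' name='generator'/>"),
    ("http://example.net/post", "<html></html>"),
]
counts = Counter(guess_platform(url, html) for url, html in sample)
```

The share of each label in `counts` is then the sample's estimate of that platform's share, and the size of the "unknown" bucket shows how much the heuristics miss.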