Bad modelling

A flawed analysis of trends in the total number of blogs and in the number of spam blogs has led people to think that the number of non-spam blogs is falling. In fact the growth trends remain strong.
So where has the analysis gone wrong? It assumes that spam's share of all blogs, which is itself a percentage, will keep increasing at a constant percentage rate.

Why this is wrong is easy to see if you extend the forecast shown in this graph by another month (to June 2006). It would then show that by June 2006 101% of all blogs will be spam – in other words, there would be more spam blogs than blogs of every kind, spam included. Something wrong there, I think!
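To make the point concrete, here is a minimal sketch of what happens when a share is compounded at a constant percentage rate. The starting share and the monthly rate are hypothetical, chosen only for illustration; they are not the figures behind the graph.

```python
# Hypothetical inputs, chosen only to illustrate the flaw. They are NOT
# the figures from the report or the graph discussed in the post.
start_share = 0.02        # spam is 2% of all blogs in month 0
monthly_increase = 0.30   # the share itself is assumed to grow 30% a month

share = start_share
month = 0
# Keep compounding until the "share" stops being a valid proportion.
while share <= 1.0:
    share *= 1 + monthly_increase
    month += 1

print(f"Month {month}: spam share = {share:.0%}")
```

Whatever inputs you pick, the loop always ends: a share that compounds at a fixed positive rate must eventually pass 100%, which a share of a total cannot do.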

A better way to analyse this would be to break it down into two growth trends: one for spam blogs, the other for non-spam. Subtracting the number of spam blogs from the total gives us 7.8m non-spam blogs in March 2005, 16m in October 2005 and 21m in January 2006.

Of course spam blogs are growing much faster, at around 43% a month compared with 10% a month for non-spam. This is still rather depressing, as the number of spam blogs will (if the trend continues) overtake the number of non-spam blogs in June 2006, when there will be 34m non-spam blogs and 43m spam blogs.
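The two-trend projection can be sketched in a few lines. The growth rates, the 21m non-spam figure for January 2006 and the 43m spam figure for June 2006 come from the text above. The January spam figure is my own assumption: the post does not give one, so the sketch works it out backwards from the June projection.

```python
# Figures in millions of blogs, taken from the post.
NON_SPAM_JAN_2006 = 21.0   # non-spam blogs in January 2006
NON_SPAM_GROWTH = 0.10     # roughly 10% a month
SPAM_GROWTH = 0.43         # roughly 43% a month
SPAM_JUN_2006 = 43.0       # projected spam blogs in June 2006

MONTHS = ["Jan 2006", "Feb 2006", "Mar 2006", "Apr 2006", "May 2006", "Jun 2006"]

# Assumption: no January spam figure is given, so derive one by undoing
# five months of compound growth from the June projection.
spam_jan_2006 = SPAM_JUN_2006 / (1 + SPAM_GROWTH) ** 5


def project(start, monthly_growth, n_months):
    """Return start compounded at monthly_growth for months 0..n_months-1."""
    return [start * (1 + monthly_growth) ** m for m in range(n_months)]


non_spam = project(NON_SPAM_JAN_2006, NON_SPAM_GROWTH, len(MONTHS))
spam = project(spam_jan_2006, SPAM_GROWTH, len(MONTHS))

# First month in which spam blogs outnumber non-spam blogs.
crossover = next(
    (month for month, n, s in zip(MONTHS, non_spam, spam) if s > n), None
)

for month, n, s in zip(MONTHS, non_spam, spam):
    share = s / (s + n)
    print(f"{month}: non-spam {n:5.1f}m, spam {s:5.1f}m, spam share {share:.0%}")
print("Spam overtakes non-spam in:", crossover)
```

Modelled this way, spam's share of all blogs is just spam divided by the total, so it can climb towards 100% but never pass it.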

However, the very high growth rate of spam blogs is likely to fall (very high growth rates always do!), so this is a pessimistic forecast.

There is absolutely no reason to think that the real (non-spam) blogosphere is going to shrink.

2 thoughts on “Bad modelling”

  1. Graeme,

    Of course you’re right, when you combine these two trends it’s obvious that something is wrong. That was my point, although folks seemed to just jump on the obviously wrong conclusion – that non-spam blogs are declining at an alarming rate.

    Dave Sifry posted a comment on my blog stating that his numbers already exclude spam blogs, yet the Umbria report presents their numbers as a percentage of total blogs and then cites Sifry’s numbers for the total blog number. There are lots of unanswered questions here – but I didn’t think anyone would jump to the conclusion that by mid-year there would be nothing but spam blogs.

    Oh, well. Perhaps sarcasm is truly a lost art.

    I do appreciate you pointing out to others what was obvious to me – these trends don’t make sense when you hold them up to the light together.

    -Matt

  2. Most of the comments on your blog and links to it (such as the one I followed from Web Log Tools Collection) take your analysis completely seriously. If you are being sarcastic, I would say you need to make it clear.
