After spending last week fighting with commercial software and then re-writing an entire computational lab around bugs in said software, I noticed something odd about my blog posting: My blinking statistics are starting to look like they have a power law distribution.
Imagine that someone posts every weekday morning to their blog. That person would have an interesting distribution of “waiting times” between blog posts. There’d be a cluster of waiting times at the 1 day mark, and then a much smaller cluster at the 3 day mark. Since no one posts at exactly the same time every day, there’d be some spread around those times, and a plot like the one to the right might describe the blogger’s waiting time distribution (or their “blinking statistics”) well. For those of you with some statistics background, I’ve assumed in this plot that the waiting times for your average blogger would follow a Gamma distribution.
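For anyone who wants to play with this picture, here is a minimal sketch of the weekday blogger in Python. It is only an illustration: the function names, the shape parameter of 100 and the 0.25 day bins are my own assumptions, not the values behind the plot. Each gap is drawn from a Gamma distribution whose mean is either 1 day (overnight) or 3 days (over a weekend).

```python
import random

def simulate_waiting_times(n_posts, shape=100.0, seed=0):
    """Draw waiting times (in days) for a hypothetical weekday-morning blogger."""
    rng = random.Random(seed)
    waits = []
    for i in range(n_posts):
        # Four of every five gaps are overnight (about 1 day);
        # the fifth spans a weekend (about 3 days).
        mean_gap = 3.0 if i % 5 == 4 else 1.0
        # A Gamma with this shape and scale = mean / shape has mean `mean_gap`.
        # The shape value is an assumption; larger values give tighter clusters.
        waits.append(rng.gammavariate(shape, mean_gap / shape))
    return waits

def histogram(waits, bin_width=0.25, max_days=4.0):
    """Count waiting times in equal-width bins from 0 to max_days."""
    n_bins = int(max_days / bin_width)
    counts = [0] * n_bins
    for w in waits:
        b = int(w / bin_width)
        if b < n_bins:
            counts[b] += 1
    return counts

waits = simulate_waiting_times(1000)
counts = histogram(waits)

# Crude text plot: one '#' per ten posts in each 0.25 day bin.
for b, c in enumerate(counts):
    print(f"{b * 0.25:4.2f} d  {'#' * (c // 10)}")
```

The text plot should show a tall cluster near 1 day and a much smaller, broader one near 3 days, which is the shape described above.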
We all know what happens when a blogger is on a roll or when there’s been a big news day; posts come out rapidly, and the waiting time distribution suddenly gets a lot of contributions at the short time end. But what happens when real life intervenes? That is, what happens when a blogger’s regular job makes long-term demands, or when someone has a child, or when someone has to spend a week rewriting a computer lab around bugs in some annoying commercial software… An extreme case of this would be when a blogger suddenly leaves the internet (or passes away). Suddenly the waiting time distribution gets a very long tail.
Clay Shirky has written about power law distributions of inbound blog links, and Jason Kottke has worked with data from Technorati and shown that, for the top 100 blogs, a power law distribution of links is a reasonable assumption. I’ve mentioned it as well when discussing asymmetric networks.
I don’t know if anyone has ever looked at the waiting time distribution for blog posts before. I’d be curious if the blogosphere has Lévy statistics or a Pareto distribution. We may not have the dynamic range to tell yet. Blogs have only been around for a few years, and seeing a difference between the long-tail distributions can take many decades of decay.
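If someone wanted to check this for a real blog, a rough starting point might look like the sketch below. It turns a list of post timestamps into waiting times, builds the empirical survival function (the fraction of waits at least as long as t), and reports how many factors of ten the data span, which is one way to read "dynamic range" here. The timestamps are made up for illustration, and the function names are my own; this is not a fit to either distribution, just the bookkeeping that would come before one.

```python
import math
from datetime import datetime

def waiting_times_days(timestamps):
    """Gaps between consecutive posts, in days."""
    ts = sorted(timestamps)
    return [(b - a).total_seconds() / 86400.0 for a, b in zip(ts, ts[1:])]

def survival_function(waits):
    """Return (t, fraction of waits >= t) pairs, sorted by t.

    Assumes no exactly tied waiting times, which is reasonable for real timestamps.
    """
    ordered = sorted(waits)
    n = len(ordered)
    return [(t, (n - i) / n) for i, t in enumerate(ordered)]

def dynamic_range_decades(waits):
    """Number of factors of ten between the shortest and longest positive wait."""
    positive = [w for w in waits if w > 0]
    return math.log10(max(positive) / min(positive))

# Hypothetical post times: two normal days, a burst, then a long gap.
posts = [
    datetime(2006, 1, 2, 8, 0),
    datetime(2006, 1, 3, 9, 0),
    datetime(2006, 1, 3, 15, 0),
    datetime(2006, 1, 13, 15, 0),
    datetime(2006, 1, 14, 15, 0),
]

waits = waiting_times_days(posts)
survival = survival_function(waits)
decades = dynamic_range_decades(waits)

for t, frac in survival:
    print(f"{t:7.3f} d  {frac:.2f}")
print(f"dynamic range: {decades:.2f} decades")
```

Plotted on log-log axes, a power law tail in the survival function would look like a straight line, but with under two decades of range, as in this toy data, many different long-tail shapes would look about the same.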
[tags]statistics, waiting times[/tags]