As you probably know by now, I've been spending a lot of my evenings in recent weeks working on analysing an actual dataset of last mile usage. I'm not quite there yet, but I'm hopeful that the results will be in by the end of the month and highlight some really interesting trends.
In the process of doing this analysis, though, I've gotten interested in, and have talked to, people who are running a variety of speed test measurements. I've also been drawn into a kind of Twitter argument around the recent FTTH speed tests undertaken by DegroupNews in France. Let me tell you this story in one paragraph before I get to the core point of my post.
What happened was that someone on Twitter posted the results, asking why the broadband delivered was so bad compared to nominal (off the top of my head, Free and SFR delivered a little above half of nominal, while Orange and Numéricable were around a third of nominal). An executive from one of these service providers answered that the measurement methodology was crap. I then asked them to publish real numbers if they contested these ones; I can't really blame anyone for believing the numbers that are out there. I'm not optimistic that any of the big guys in France would share real numbers publicly, but you never know…
However, one of the side effects of this conversation, and some others I've had, was that I looked into available speedtest services and whatever information was available on their methodologies. A friend suggested I look into the methodology of Ookla, one of the speed measurement services used by http://www.broadband.gov. You can find the details here (http://www.broadband.gov/about_ookla.html).
There's a fascinating nugget in there. It's the last bullet on the methodology:
"Samples are sorted by speed, and the fastest half is averaged to eliminate anomalies and determine the result."
In case this isn't immediately evident: it's a statistical aberration. A survey house that said "we rank respondents, drop the bottom half and average the rest" would be out of business in a minute. There are proven ways to eliminate outliers in a data set, and believe me, this is not one of them. So I'm left to wonder. First of all, what's Ookla's benefit in artificially (but dramatically) skewing broadband speed measurement results? What's in it for them to get people to believe that their speed is much higher than it actually is?
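To make the skew concrete, here's a minimal sketch using made-up speed samples (the numbers are purely illustrative, not real measurements). It compares the "average the fastest half" rule against two standard robust estimators, the median and a symmetric trimmed mean, which drop extreme values from both ends rather than only the slow end:

```python
import statistics

# Hypothetical speed-test samples in Mbit/s (illustrative, not real data).
# Two slow outliers (10, 12) simulate congested or anomalous runs.
samples = [48, 50, 52, 55, 60, 10, 12, 45, 47, 58]

# The quoted rule: sort by speed, keep the fastest half, average it.
top_half = sorted(samples, reverse=True)[: len(samples) // 2]
ookla_style = statistics.mean(top_half)

# Standard alternatives: plain mean, median, and a symmetric trimmed
# mean that discards the same number of points from BOTH extremes.
plain_mean = statistics.mean(samples)
median = statistics.median(samples)
trim = 2  # drop the 2 lowest AND the 2 highest samples
trimmed = statistics.mean(sorted(samples)[trim:-trim])

print(plain_mean, median, ookla_style, trimmed)
# The one-sided rule lands well above both robust estimates.
```

On this toy data the median and trimmed mean agree at around 49 Mbit/s, while the fastest-half average reports 55 Mbit/s. Discarding outliers only on one side doesn't remove noise symmetrically; it systematically shifts the estimate upward.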
The second question, which is perhaps more important: why is the US government offering an openly skewed tool to end-users, albeit transparently?
I'm really puzzled. If anyone has any idea, let me know!