Return of the Hog Conundrum

Photo by nutmeg66

This morning, Ars Technica published a really interesting article on the cost structure underlying wireline broadband service and its sensitivity to data usage, entitled Should broadband data hogs pay more? ISP economics say "no". The article looks at financials publicly reported by cable operator Time Warner Cable and concludes that despite usage growth, broadband service is profitable, and increasingly so.

Mostly, that gels with my own experience in the matter and with discussions I've had – off the record – with many telcos. There is, however, an underlying assumption in the article's conclusion: that pricing not based on cost is immoral. That's a matter of opinion, of course, and certainly not standard practice in many, if not most, industries in the Western world.

The theoretical mechanism that pushes prices towards cost is competition. In a competitive broadband market, a player who starts capping usage or introducing variable pricing without a cost incentive to do so (which is what Ars Technica suggests TWC is doing) would end up losing market share to players with the same fixed cost structure who don't introduce variable pricing. In a non-competitive market like the duopoly most of the US lives with, that's no longer true: the lone competitor has a strong incentive to match the move and grow its own margins rather than fight for market share, all while maintaining the pretence of competition (and avoiding regulatory wrath).

Still, I like Ars' approach of trying to get broadband providers to put their money where their mouth is. The claim (by telcos and cablecos) that hogs are harmful is still mostly unsubstantiated. If you remember, back in December, I even questioned the existence of bandwidth hogs (in the sense of users who disrupt the normal flow of data in the network) and offered to perform some analysis to prove or disprove the point. I asked for data from ISPs, and one of them came through.

After a few months of discussing technical specs for the data and establishing the conditions under which I could use it, we agreed on everything, and on June 15th I received a data set close to what I had specified back in December. In some ways the dataset is more detailed, and in some ways less so. Let me explain:

  • it's more detailed because instead of the 2-hour intervals I originally requested, the ISP I'm working with suggested I use 5-minute intervals. The variation in load is so quick, they said, that 2-hour intervals would completely drown out the peaks (see the sketch just after this list).
  • it's less detailed because instead of the 3 months' worth of data I originally requested, I have a single day. This was mostly due to processing time on the ISP's side, and I was a little disappointed, but having played with the dataset for a few weeks I now realise that this was a blessing in disguise, even if it suggests that whatever analysis comes out of this will need to be extended.
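
To make the granularity point concrete, here is a minimal sketch – in Python with pandas, on synthetic traffic numbers, not the actual dataset – of how a 2-hour average drowns out a burst that is plainly visible at 5-minute resolution:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# One day of 5-minute samples (288 intervals) for a single line, in kbps.
# The gamma draw is just synthetic background traffic.
idx = pd.date_range("2010-07-01", periods=288, freq="5min")
traffic = pd.Series(rng.gamma(shape=2.0, scale=500.0, size=288), index=idx)

# Inject a short 15-minute burst -- the kind of spike a coarse average hides.
traffic.iloc[100:103] = 20_000

print("peak at 5-minute resolution: %.0f kbps" % traffic.max())
print("peak of 2-hour averages:     %.0f kbps" % traffic.resample("2h").mean().max())
# The burst is averaged with 21 quieter samples in its 2-hour window,
# so the coarse series shows a fraction of the true peak.
```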

I got hold of the data set mid-June and started working on it in SPSS immediately. Remember that I'm doing this in my own free time, so this meant several evenings simply to import the data into SPSS in the correct formats. The most painful part of the operation was that each time-stamped 5-minute interval is a data point for each customer, upload and download, which ultimately means 288 intervals × 2 directions = 576 variables for each customer, with no way in SPSS to label them other than individually. Painful. And imagine my desperation if I'd had not a single day but even a week's worth of data…
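
For the curious, here is a rough illustration – with hypothetical column names and made-up numbers, showing only a handful of the 576 measurement columns – of that wide layout, and of the kind of reshape into long format that makes it far more tractable:

```python
import pandas as pd

# One row per customer, one column per (direction, interval) pair.
# Real column names aren't public; these are illustrative only.
wide = pd.DataFrame({
    "customer_id": ["A", "B"],
    "down_0000": [120, 80], "up_0000": [30, 10],
    "down_0005": [90, 200], "up_0005": [25, 60],
})

# Melt to long format: one row per (customer, direction, interval),
# which is far easier to label and aggregate than 576 separate columns.
long = wide.melt(id_vars="customer_id", var_name="var", value_name="kbps")
long[["direction", "interval"]] = long["var"].str.split("_", expand=True)
long = long.drop(columns="var")
print(long)
```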

I quickly realised that if I wanted to perform the right level of analysis, SPSS wasn't the right tool. I don't pretend to be an absolute expert at SPSS, and there might be things I could have streamlined, but the sheer number of variables made it very difficult to derive anything meaningful with the tools at hand.
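
To give a sense of what "something meaningful" looks like here, this sketch – again on made-up numbers – poses the question at the heart of the hog debate: who contributes what to the peak interval, as opposed to sheer volume over the day:

```python
import pandas as pd

# Made-up 5-minute readings for three customers over two evening intervals.
sample = pd.DataFrame({
    "customer_id": ["A", "A", "B", "B", "C", "C"],
    "interval":    ["2000", "2005", "2000", "2005", "2000", "2005"],
    "kbps":        [100, 150, 4000, 300, 200, 250],
})

# Total network load per interval, and the busiest one.
load = sample.groupby("interval")["kbps"].sum()
peak = load.idxmax()

# Each customer's share of the load during the peak interval.
share = sample.loc[sample["interval"] == peak].set_index("customer_id")["kbps"] / load[peak]
print("peak interval:", peak)
print(share.round(2))
```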

What I did then was turn to a company that is – as far as I know – the best at extracting and churning data. The company is called Squid Solutions, and they have a track record of processing datasets way more complex than mine. I've worked with them in the past and thought they might be willing to help me. I met with them last week, and they agreed to collaborate on this project. The data set was sent to them, and I wrote up what I expected in terms of output.

So when I read the Ars Technica article this morning, I thought it would be a good opportunity to keep you informed of recent developments and to state loud and clear that the bandwidth hog, mythical or not, isn't dead.

I'm expecting processed results towards the end of August or early September, so if all goes as planned you should see some analysis published sometime in September.