Do data caps punish the wrong users?

In late 2009, Herman Wagter and I wrote an article entitled “Is the ‘bandwidth hog’ a myth?” postulating that the way most ISPs looked at their users’ consumption patterns was misaligned with what really happens in access networks. In particular, we suggested that the limitations imposed on heavy users (the so-called “bandwidth hogs”) by ISPs hoping to alleviate congestion were unlikely to work because the ISPs’ worldview confused data consumption with bandwidth usage, i.e. how much data was downloaded over a whole period with how much bandwidth capacity was used at any given point in time.

The piece ended with a request for honest-minded ISPs to submit usage data from their networks for analysis, and a data set was specified. A number of ISPs responded that they were keen to collaborate, but many of them didn’t have access to the data that would have made the analysis possible. Eventually, one such ISP, a mid-size company from North America, agreed to share a data set for analysis. We have now published the results of that study in a Diffraction Analysis report entitled “Do data caps punish the wrong users? A bandwidth usage reality check”. That report is for sale, but as promised I reproduce the executive summary here.

In the last couple of years, the considerable growth in internet traffic has pushed an increasing number of internet service providers (ISPs) around the world to implement strategies to limit the usage of broadband services by their customers. Most of these strategies revolve around data caps: a level of monthly data consumption that triggers pay-as-you-go mechanisms at steep per-megabyte rates.

Our thesis was that most of these strategies are implemented without an accurate understanding of customers’ real-time usage patterns, and as a result such strategies are neither accurate in targeting disruptive users, if they exist, nor fair to users who may consume a lot of data overall but not in a disruptive way. Further, our analysis aimed to assess whether a very small number of users could indeed be considered to degrade quality for all other users.

In order to investigate these issues, we took real user data for all the broadband customers connected to a single aggregation link and analyzed the network statistics on data consumption in five-minute time increments over a whole day. The data was shared by an ISP in North America that wanted to understand its own network usage. Our analysis tracked both data consumption (i.e. total MB downloaded) and bandwidth usage (i.e. Mbps being used).
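
To make the distinction between the two metrics concrete, here is a minimal Python sketch. It is an illustration only, not the code used in the study: the function names, the per-window byte counts, and the two example users are hypothetical, and it assumes decimal units (1 MB = 1,000,000 bytes).

```python
# Illustrative sketch: derive both metrics from per-user byte counts
# recorded in five-minute windows (all values below are made up).
WINDOW_SECONDS = 5 * 60

def total_megabytes(bytes_per_window):
    """Data consumption: total MB downloaded over the whole period."""
    return sum(bytes_per_window) / 1_000_000

def mbps_per_window(bytes_per_window):
    """Bandwidth usage: average Mbps within each five-minute window."""
    return [b * 8 / 1_000_000 / WINDOW_SECONDS for b in bytes_per_window]

# A day has 288 five-minute windows.
steady = [37_500_000] * 288            # downloads all day at a modest rate
bursty = [0] * 287 + [225_000_000]     # idle, then one intensive window

print(total_megabytes(steady), max(mbps_per_window(steady)))  # 10800.0 1.0
print(total_megabytes(bursty), max(mbps_per_window(bursty)))  # 225.0 6.0
```

In this made-up example the steady user consumes 48 times more data over the day, yet never uses more than 1 Mbps, while the bursty user briefly uses 6 Mbps.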

Our analysis confirms that data consumption is at best a poor proxy for bandwidth usage:

  • The top 1% of data consumers (hereafter Very Heavy consumers) account for 20% of the overall consumption.
  • Average data consumption over the period is 290 MB, while consumption for Very Heavy consumers is 9.6 GB. This roughly equates to data consumption of 8.7 GB and 288 GB per month, respectively.
  • However, only half of these Very Heavy consumers are customers of the highest service tier (6 Mbps), which implies that half of them have bandwidth usage restricted to 3 Mbps (the next service tier) or lower.
  • 61% of Very Heavy data consumers download 95% of the time or more, but only 5% of those who download at least 95% of the time are Very Heavy data consumers.
  • While 83% of Very Heavy data consumers are amongst the top 1% of bandwidth users during at least one five-minute time window at peak hours, they only represent 14.3% of said top 1% of users at those times.
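
The monthly figures in the list above follow from simple scaling of the one-day measurements. The sketch below reproduces that arithmetic, assuming a 30-day month and decimal units (1 GB = 1,000 MB), which is what the quoted numbers imply; the function name is hypothetical.

```python
def daily_mb_to_monthly_gb(daily_mb, days_per_month=30):
    """Scale one day's consumption (in MB) to a month (in GB)."""
    return daily_mb * days_per_month / 1000

average_monthly_gb = daily_mb_to_monthly_gb(290)      # average consumer
very_heavy_monthly_gb = daily_mb_to_monthly_gb(9600)  # Very Heavy consumer

print(average_monthly_gb, very_heavy_monthly_gb)
```

This gives roughly 8.7 GB and 288 GB per month, matching the figures quoted above.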

Bandwidth usage outside of periods when the aggregation link is heavily loaded (which we arbitrarily set at 75% load for this study) has no impact on costs or other users. Therefore our analysis of bandwidth usage focused on the three hours in the day where the link was loaded above 75%. The results show that while the number of active users does not vary significantly between 8 AM and 1 AM, the average bandwidth usage does vary significantly, especially around late afternoon and evening. This suggests that the increase in the aggregation link load is not a result of more customers connecting at a given time, but a result of customers using their connections more intensively during these hours.
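
As a rough illustration of this step, the Python sketch below selects the windows where the link is loaded above the 75% threshold and computes average bandwidth per active user. The function names, the link capacity, and the sample values are hypothetical and are not taken from the study's data set.

```python
def peak_windows(link_mbps, capacity_mbps, threshold=0.75):
    """Return indices of windows where link load exceeds the threshold."""
    return [i for i, mbps in enumerate(link_mbps)
            if mbps / capacity_mbps > threshold]

def mean_mbps_per_active_user(link_mbps, active_users):
    """Average bandwidth per active user in each window (0.0 if nobody is active)."""
    return [mbps / n if n else 0.0 for mbps, n in zip(link_mbps, active_users)]

# Made-up aggregate link usage (Mbps) and active-user counts per window.
link_mbps = [300.0, 500.0, 800.0, 780.0]
active_users = [400, 410, 405, 400]

print(peak_windows(link_mbps, capacity_mbps=1000.0))  # [2, 3]
print(mean_mbps_per_active_user(link_mbps, active_users))
```

In this toy data the number of active users is almost flat while usage per user more than doubles, which is the pattern described above.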

An analysis of customers contributing to peak bandwidth usage yielded some interesting results:

  • The proportion of bandwidth allocated to Very Heavy data consumers diminishes when the aggregation link load is above 75%. While this suggests a fairer resource allocation during peak times, the link was never loaded enough in our data set to assess whether or not that resource allocation continues to be fair when there are no more resources to allocate (95% load or higher).
  • 42% of all customers (and nearly 48% of active customers) are amongst the top 10% of bandwidth users at one point or another during peak hours.
  • 6% of all customers (and 7.5% of active customers) are amongst the top 1% of bandwidth users at one point or another during peak hours.

We assumed that if disruptive users exist (which, as mentioned above, we could not prove), they would be amongst those that populate the top 1% of bandwidth users during peak periods. To test this theory, we crossed that population with users that are over cap (simulating AT&T’s established data caps) and found that only 78% of customers over cap are amongst the top 1%, which means that roughly one fifth of customers being punished by the data cap policy cannot possibly be considered disruptive (even assuming that the remaining four fifths are).
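
The cross-check described above can be sketched as a set intersection. The Python below is an illustration under stated assumptions, not the study's actual method: the function names and sample data are hypothetical, and it assumes the top fraction is taken among active users in each window (the summary does not say whether the base is active or all customers).

```python
import math

def top_fraction_users(window_usage, fraction=0.01):
    """Users in the top `fraction` of bandwidth usage in one window.

    `window_usage` maps user id -> Mbps; only active users (> 0) are ranked.
    """
    active = {u: v for u, v in window_usage.items() if v > 0}
    k = max(1, math.ceil(len(active) * fraction))
    ranked = sorted(active, key=active.get, reverse=True)
    return set(ranked[:k])

def ever_in_top(peak_window_usages, fraction=0.01):
    """Users in the top fraction during at least one peak window."""
    users = set()
    for window_usage in peak_window_usages:
        users |= top_fraction_users(window_usage, fraction)
    return users

def share_over_cap_in_top(over_cap, top_users):
    """Share of over-cap customers who were ever among the top users."""
    return len(over_cap & top_users) / len(over_cap)

# Toy data: two peak windows, three users; a large fraction keeps it readable.
windows = [{"a": 5.0, "b": 1.0, "c": 0.0}, {"a": 0.5, "b": 0.2, "c": 3.0}]
top = ever_in_top(windows, fraction=0.5)
print(share_over_cap_in_top({"a", "b"}, top))  # 0.5
```

Here user "b" is over cap without ever being among the top bandwidth users at peak, which is the kind of customer the paragraph above describes as wrongly targeted.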

Data caps, therefore, are a very crude and unfair tool when it comes to targeting potentially disruptive users. The correlation between real-time bandwidth usage and data downloaded over time is weak and the net cast by data caps captures users that cannot possibly be responsible for congestion. Furthermore, many users who are "as guilty" as the ones who are over cap (again, if there is such a thing as a disruptive user) are not captured by that same net.

In conclusion, we state that policies honestly implemented to reduce bandwidth usage during peak hours should be based on better understanding of real usage patterns and should only consider customers’ behavior during these hours; their behavior when the link isn’t loaded cannot possibly impact other users’ experience or increase aggregation costs. Furthermore, data caps as currently implemented may act as deterrents for all users at all times, but can also spur customers to look for fairer offerings in competitive markets.