Sunday, December 4, 2011

Evaluating RC4 as pseudo-random number generator

Update (September 3, 2012): I made some mistakes in my own implementation of RC4. An analysis of the mistake is documented in Mistake in my RC4 implementation. After fixing the mistake, I do not see the skewed distribution shown here. Please be advised that none of the blog posts in this blog are peer reviewed, and mistakes happen.
On the quest for a pseudo-random number generator that is easy to implement, I gave up on Mersenne Twister but took Wikipedia's advice that a stream cipher can be used for generating pseudo-random sequences. It also mentions that RC4 is not a particularly uniform stream cipher, which makes statistical attacks possible. Because of the simplicity of RC4, I decided to implement it and see how bad it really is. Since RC4 outputs a stream of bytes, I took a histogram of the byte values. The initialization key is the integer value of a stack-allocated object pointer, repeated for 64 bytes, and it's run on a 64-bit machine. I'm not sure if the fact that the key is highly regular has anything to do with the non-uniform result that I observed below.
byte count | byte count | byte count | byte count | byte count | byte count | byte count | byte count
0 5499938 | 32 50376 | 64 81740 | 96 253697 | 128 50004 | 160 144433 | 192 147856 | 224 147031
1 134854 | 33 50103 | 65 50386 | 97 115718 | 129 50024 | 161 49930 | 193 149593 | 225 50260
2 50352 | 34 50179 | 66 954060 | 98 50362 | 130 50185 | 162 50553 | 194 134910 | 226 78249
3 115332 | 35 50006 | 67 50091 | 99 50170 | 131 50543 | 163 50289 | 195 50516 | 227 49853
4 50214 | 36 50065 | 68 50256 | 100 50052 | 132 50288 | 164 49950 | 196 50532 | 228 50303
5 50089 | 37 50589 | 69 50173 | 101 50118 | 133 143783 | 165 49753 | 197 50182 | 229 50135
6 207241 | 38 49946 | 70 50416 | 102 50395 | 134 50447 | 166 50270 | 198 50145 | 230 134634
7 68820 | 39 50147 | 71 49985 | 103 49774 | 135 50252 | 167 50502 | 199 49368 | 231 49987
8 50212 | 40 103906 | 72 49950 | 104 50501 | 136 49803 | 168 50038 | 200 50032 | 232 50486
9 695154 | 41 49905 | 73 50137 | 105 50389 | 137 50085 | 169 50619 | 201 66006 | 233 49849
10 50439 | 42 49865 | 74 50472 | 106 50284 | 138 49508 | 170 50045 | 202 50336 | 234 50083
11 49682 | 43 948555 | 75 50141 | 107 50230 | 139 50294 | 171 50719 | 203 50300 | 235 50267
12 50378 | 44 49965 | 76 140698 | 108 50119 | 140 49772 | 172 50399 | 204 49963 | 236 50117
13 50466 | 45 50171 | 77 49814 | 109 50012 | 141 50385 | 173 50031 | 205 50072 | 237 49945
14 49591 | 46 50290 | 78 50219 | 110 90366 | 142 50177 | 174 50126 | 206 50585 | 238 50157
15 49776 | 47 50340 | 79 49916 | 111 49974 | 143 50224 | 175 50513 | 207 50350 | 239 50234
16 287052 | 48 205931 | 80 151232 | 112 131863 | 144 50079 | 176 50026 | 208 50226 | 240 131317
17 50022 | 49 50076 | 81 50134 | 113 49801 | 145 50487 | 177 50688 | 209 81920 | 241 50161
18 118991 | 50 50125 | 82 50108 | 114 50370 | 146 66545 | 178 78182 | 210 49998 | 242 50092
19 50209 | 51 50706 | 83 50292 | 115 50297 | 147 50236 | 179 50411 | 211 50458 | 243 50180
20 50170 | 52 50025 | 84 221044 | 116 116139 | 148 247793 | 180 215784 | 212 50185 | 244 50124
21 50371 | 53 49717 | 85 50259 | 117 50274 | 149 50327 | 181 758974 | 213 49808 | 245 90855
22 50365 | 54 49979 | 86 50302 | 118 50523 | 150 50228 | 182 50272 | 214 50685 | 246 50228
23 50467 | 55 50145 | 87 49797 | 119 105677 | 151 50108 | 183 50390 | 215 113514 | 247 50316
24 100861 | 56 78827 | 88 113581 | 120 138358 | 152 57005 | 184 53395 | 216 147492 | 248 50546
25 50158 | 57 49803 | 89 50264 | 121 50209 | 153 50095 | 185 50334 | 217 50396 | 249 50078
26 50483 | 58 50279 | 90 50176 | 122 49674 | 154 50292 | 186 50529 | 218 49915 | 250 50111
27 50211 | 59 49997 | 91 50337 | 123 50396 | 155 50453 | 187 49836 | 219 50170 | 251 50524
28 49859 | 60 50507 | 92 50313 | 124 50280 | 156 50059 | 188 50229 | 220 248053 | 252 49981
29 50580 | 61 50869 | 93 50417 | 125 49877 | 157 50355 | 189 50313 | 221 49937 | 253 50533
30 50142 | 62 50358 | 94 50292 | 126 50240 | 158 49821 | 190 50204 | 222 50139 | 254 49975
31 50265 | 63 50165 | 95 50352 | 127 244313 | 159 50410 | 191 50529 | 223 202938 | 255 246009
The result is staggeringly non-uniform and heavily biased toward 0. I quickly eyeballed the figures and labeled the counts with colors: red means a strong bias; yellow, a slight bias; green is the ballpark for an unbiased result, though even those are above a typical count in this table. The coloring is quite subjective, but I'm only using it to highlight the non-uniform bias.
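For reference, the experiment is roughly the following (a textbook RC4 sketch, not the buggy implementation that actually produced the numbers above; see the update at the top):

#include <cstdio>
#include <cstring>
#include <utility>

int main() {
  // Key setup mimicking the post: the value of a stack address,
  // repeated to fill a 64-byte key.
  unsigned char key[64];
  void* stack_object = key;
  for (unsigned n = 0; n < sizeof key; n += sizeof stack_object)
    std::memcpy(key + n, &stack_object, sizeof stack_object);

  // Key-scheduling algorithm (KSA).
  unsigned char S[256];
  for (int n = 0; n < 256; ++n) S[n] = n;
  for (int n = 0, j = 0; n < 256; ++n) {
    j = (j + S[n] + key[n % sizeof key]) & 0xff;
    std::swap(S[n], S[j]);
  }

  // Pseudo-random generation algorithm (PRGA), histogramming the output.
  unsigned long histogram[256] = {0};
  for (long n = 0, i = 0, j = 0; n < 256L * 50000; ++n) {
    i = (i + 1) & 0xff;
    j = (j + S[i]) & 0xff;
    std::swap(S[i], S[j]);
    ++histogram[S[(S[i] + S[j]) & 0xff]];
  }
  for (int b = 0; b < 256; ++b)
    std::printf("%3d %lu\n", b, histogram[b]);
  return 0;
}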

Now, this observation also indicates that RC4 is a very poor stream cipher. A stream cipher essentially generates a stream of pseudo-random bytes that is XOR-ed byte-wise with the plaintext to obtain the ciphertext. The prevalence of 0 values means that much of the plaintext would be identical to the ciphertext, because 0 is the XOR identity. The fact that such bad encryption is possible with some key choices should worry a lot of people who are still using RC4.

By a stroke of luck, I tried a slightly modified RC4 that actually gives me a pretty uniform distribution. The difference (from the pseudo-code listing of the RC4 Wikipedia article) is marked with a comment below.
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K ^ j  // was just K before
endwhile
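For concreteness, here is the modified output step as C++ (a sketch; S is the usual 256-byte state after the standard RC4 key schedule, and i and j persist across calls):

#include <utility>  // std::swap

unsigned char next_byte(unsigned char S[256],
                        unsigned char& i, unsigned char& j) {
  i = (i + 1) & 0xff;    // unsigned char arithmetic is mod 256 anyway
  j = (j + S[i]) & 0xff;
  std::swap(S[i], S[j]);
  unsigned char K = S[(S[i] + S[j]) & 0xff];
  return K ^ j;          // standard RC4 outputs just K
}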
This modification gives me the following histogram.
byte count | byte count | byte count | byte count | byte count | byte count | byte count | byte count
0 99723 | 32 100388 | 64 99739 | 96 99866 | 128 100026 | 160 100026 | 192 99831 | 224 99581
1 99941 | 33 99908 | 65 100363 | 97 100328 | 129 99741 | 161 100338 | 193 99792 | 225 99776
2 99968 | 34 100010 | 66 99589 | 98 100231 | 130 100649 | 162 100084 | 194 100218 | 226 99839
3 99896 | 35 99924 | 67 100465 | 99 100002 | 131 100093 | 163 99815 | 195 100478 | 227 100137
4 100439 | 36 99677 | 68 100004 | 100 99960 | 132 100165 | 164 99756 | 196 100064 | 228 100114
5 100081 | 37 99362 | 69 99664 | 101 99872 | 133 99887 | 165 100059 | 197 99523 | 229 100590
6 100570 | 38 99924 | 70 99648 | 102 99945 | 134 99459 | 166 99854 | 198 100170 | 230 100069
7 99812 | 39 99706 | 71 100066 | 103 99598 | 135 99532 | 167 99499 | 199 99956 | 231 99892
8 99969 | 40 100484 | 72 100074 | 104 99506 | 136 99831 | 168 100291 | 200 100164 | 232 99562
9 99840 | 41 99756 | 73 99128 | 105 100178 | 137 99646 | 169 100348 | 201 100121 | 233 99412
10 99793 | 42 100133 | 74 100019 | 106 100488 | 138 99882 | 170 99661 | 202 99864 | 234 99878
11 100208 | 43 100059 | 75 99717 | 107 100141 | 139 100374 | 171 100116 | 203 99643 | 235 100394
12 100157 | 44 99772 | 76 99846 | 108 100085 | 140 100180 | 172 99632 | 204 99928 | 236 100123
13 100098 | 45 100232 | 77 100017 | 109 100654 | 141 99955 | 173 99993 | 205 99858 | 237 100188
14 100017 | 46 100135 | 78 99566 | 110 99104 | 142 99640 | 174 99256 | 206 100243 | 238 99918
15 100259 | 47 100403 | 79 99459 | 111 99887 | 143 100255 | 175 99818 | 207 99612 | 239 99840
16 100139 | 48 99888 | 80 100828 | 112 99999 | 144 99820 | 176 100378 | 208 99663 | 240 99622
17 100093 | 49 100173 | 81 99633 | 113 100480 | 145 99804 | 177 99826 | 209 100564 | 241 99916
18 100257 | 50 99987 | 82 100502 | 114 100253 | 146 100554 | 178 99998 | 210 100183 | 242 100319
19 99835 | 51 100206 | 83 99776 | 115 99674 | 147 100726 | 179 99909 | 211 100548 | 243 99803
20 100010 | 52 99879 | 84 100188 | 116 99657 | 148 100255 | 180 99687 | 212 99937 | 244 100122
21 100424 | 53 100257 | 85 100197 | 117 100375 | 149 99866 | 181 99741 | 213 100669 | 245 99972
22 100062 | 54 99376 | 86 100123 | 118 100396 | 150 99623 | 182 100523 | 214 100195 | 246 100034
23 99729 | 55 100404 | 87 100620 | 119 100275 | 151 99594 | 183 99879 | 215 99921 | 247 100092
24 100076 | 56 99792 | 88 100383 | 120 99712 | 152 100641 | 184 99514 | 216 100226 | 248 100600
25 100107 | 57 100214 | 89 100079 | 121 99682 | 153 99723 | 185 99764 | 217 100120 | 249 99574
26 99785 | 58 100284 | 90 99835 | 122 100090 | 154 99805 | 186 99752 | 218 99850 | 250 100211
27 99839 | 59 99459 | 91 100332 | 123 100185 | 155 99557 | 187 100176 | 219 100167 | 251 100213
28 99700 | 60 99795 | 92 99966 | 124 100599 | 156 99709 | 188 100422 | 220 100028 | 252 100368
29 99776 | 61 99663 | 93 99596 | 125 99479 | 157 100266 | 189 100427 | 221 100172 | 253 99842
30 100319 | 62 99734 | 94 99962 | 126 99874 | 158 100068 | 190 100238 | 222 100058 | 254 99921
31 99537 | 63 100025 | 95 100043 | 127 100382 | 159 99910 | 191 99824 | 223 100697 | 255 99923
This is quite amazingly uniform, or at least it appears good enough for my purpose. It remains to be seen whether two or more consecutive bytes of output retain the same uniformity.

I think it is quite possible that this slight modification could be immune to the existing statistical attacks on RC4.

Friday, December 2, 2011

Interface checking

One of my regrets about C++ is that it doesn't do interface checking when you instantiate templates. Coupled with delayed instantiation (the code of a method is not fully type checked until you use it), static type checking of template code has been hit or miss. Also, when you do get a type error, it is incomprehensible, riddled with instantiation details that shouldn't be there in the first place. Interface checking means that the compiler can do better type checking before template instantiation. It has been in development for some time (e.g. Boost Concept Check, ConceptC++, N2773), but it didn't make it into the C++11 standard.

Interface checking in most programming languages follows a fulfillment model: an interface specifies a collection of method signatures, and if a class implements all of those functions (fulfills all the signatures), it is said to implement the interface. This is the case for the Go language, as well as for the C++ concept checking developed so far.

However, if I were to design interfaces for a new language, I would choose a declarative model, where a class has to explicitly declare each interface it intends to implement (as Java classes do with the implements clause), and the interface is uniquely identifiable in the program's namespace. The motivation is to prevent a class from unintentionally implementing an interface by coincidence.

This is especially important for C++ since many operators are overloaded. For example, a random access iterator looks a lot like an integral type at the interface level (in fact it is a super concept because you can dereference an iterator), but manipulating the iterator like an integral without observing begin() and end() will cause the iterator to go out of bounds.

This semantic gap can be bridged by a more precise type specification using dependent types. (These are not to be confused with C++ dependent types, where a class is complete if and only if a template argument refers to a complete type; a complete type here simply means one whose memory layout the compiler knows, so you can do sizeof(type), for example, to get its size.) But we don't yet have an object-oriented dependent type system with interface checking. Even if we could bridge the semantic gap, I argue that declarative interface checking is no less powerful than fulfillment interface checking.

Suppose you have a class Foo that implements all the methods required by interface Xyzzy, but doesn't declare so. In the fulfillment model, Foo can be used to instantiate any template (generic) that requires interface Xyzzy, but not in the declarative model, which requires Foo to explicitly state that it implements interface Xyzzy. Even if you cannot modify the definition of Foo, all you need to do is subclass Foo, declare that the subclass implements Xyzzy, and pass it to the template.
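Here is a rough sketch of what the declarative model could look like if emulated in C++11 with an ordinary trait (Xyzzy, Foo, and frobnicate are hypothetical names):

#include <type_traits>

// A class is usable as an Xyzzy only if it opts in by specializing
// this trait, regardless of which methods it happens to have.
template <class T>
struct implements_xyzzy : std::false_type {};

class Foo {
 public:
  void frobnicate();  // has everything Xyzzy requires...
};

// ...but only becomes an Xyzzy by declaring so explicitly:
template <>
struct implements_xyzzy<Foo> : std::true_type {};

template <class T>
void use_as_xyzzy(T& t) {
  static_assert(implements_xyzzy<T>::value,
                "T must declare that it implements Xyzzy");
  t.frobnicate();
}

A subclass of Foo that cannot modify Foo itself would declare the trait for itself the same way, which is the workaround described above.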

Another idea I have (in the context of C++ concept checking, at least) is to get rid of the distinction between a class and an interface: any class can be used as an interface, because C++ allows declaration and implementation to be separate. Furthermore, when a class is said to implement another class, it does not mean that one is a subclass of the other; the compiler only checks that the method signatures are congruent.

Friday, November 25, 2011

Non-stop high speed rail

The basic premise is to figure out how to let passengers board and alight from a moving train, without the train ever stopping. Here I'm tracking the ideas.
  • 2007/4/18, top boarding design by 陳建軍, titled "不停站可上下乘客的火車."
  • 2011/6/21, side boarding design by Paul Priestman, titled "Moving Platform."
  • 2011/11/24, cloverleaf (rear detach, front attach) design by a Slashdot commentator.
All of the above designs suffer from poor aerodynamics.

Inspired by a Mythbusters finding that a vehicle drafting behind a "big rig" improves mileage, I would suggest a rear-boarding design: a passenger car approaches the moving high speed train from behind, matches its speed, docks for a few minutes while passengers board and alight, and then detaches from the rear of the train.

I believe this docking mechanism would be both safer and more energy efficient.

Thursday, October 27, 2011

Debugging static object destruction order

If you avoid the "static initialization order fiasco" by using the construct-on-first-use idiom, but with a static object instead of a static object pointer, you might also run into the static destruction order fiasco. One solution is to reorder the static constructors so that the destructors are called in the right order, by forcing the static object to be referenced before its first actual use.
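As a concrete sketch of the setup (Logger is a hypothetical class):

#include <cstdio>

struct Logger {
  Logger()  { std::puts("Logger constructed"); }
  ~Logger() { std::puts("Logger destroyed"); }
};

// Construct-on-first-use with a static object (not a pointer): the
// object is destroyed at program exit, so destruction order matters.
Logger& logger() {
  static Logger instance;  // constructed on first call, destroyed at exit
  return instance;
}

// Destructors run in the reverse order of construction, so forcing
// logger() to be called early (e.g. from another static initializer)
// pushes its destructor toward the end of the destruction sequence.
static struct ForceEarlyLogger {
  ForceEarlyLogger() { logger(); }
} force_early_logger;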

In order to decide how to reorder the destruction sequence, we need to know how it will be carried out. Static destructors are called in the reverse of the order in which the static constructors ran. Leveraging the fact that destructors are registered using __cxa_atexit (part of the Itanium C++ ABI, which is adopted by the Linux Standard Base and implemented in GCC) as soon as the corresponding constructor finishes, we can set a breakpoint on __cxa_atexit to see the construction order, which tells us the destructors will be called in reverse.

The gdb recipe here prints the name of the function that contains the static object whenever the static object is constructed, but continues execution.
(gdb) b __cxa_atexit  # assume this is breakpoint n
# set commands to automatically execute for breakpoint n
(gdb) commands n
>where 2
>continue
>end

(gdb) b __cxa_finalize  # optionally, break right before the destructors run
This allows you to generate a log of construction order without breaking and analyze it offline.

Friday, October 7, 2011

Xcode 4.1 for Lion and Apple App Store archive format

Today I had the (dis)pleasure of downloading Xcode 4.1 for Lion, which turns out to be packaged for the App Store. I only have Leopard, so this is how I extracted the file contents.

  • First I opened up xcode_4.1_for_lion.dmg, which is a regular hard disk image file.
  • After mounting, there is an InstallXcodeLion.pkg, which turns out to be an xar file. To extract, run xar -x -f "/Volumes/Install Xcode/InstallXcodeLion.pkg" in an empty directory (the quotes matter, since the volume name contains a space). I think this is the file that you would have downloaded over the App Store.
  • After extraction, there are a few more files.
    • Distribution - an XML manifest for the App Store. Contains some JavaScript code that is used for the installer. Who'd have thought that Mac OS X installers use JavaScript for scripting?
    • InstallXcodeLion.pkg - contains yet another package for installation.
    • Resources - the license file and localization.
  • Inside InstallXcodeLion.pkg, there is a file Payload which is a cpio file. To extract, run cpio -i -F /path/to/InstallXcodeLion.pkg/Payload in another empty directory.
  • After extraction, you get an Applications folder with Install Xcode.app. This is supposedly what App Store saves to the Applications folder.
  • Attempting to run the Mach-O binary on the command line results in: dyld: unknown required load command 0x80000022. That load command is LC_DYLD_INFO_ONLY.
  • All the sub-packages can be found in Contents/Resources/Packages. They are xar packages, but with Bom, PackageInfo, Payload, and Scripts immediately inside. These .pkg files can actually be installed with the Installer that comes with Leopard. However, it is not clear whether the binaries installed by these packages suffer the LC_DYLD_INFO_ONLY issue.
This shows that with some script kiddie magic, you can install Xcode 4.1 on Leopard. Whether it will actually run or not is another thing.

Saturday, August 13, 2011

Remote Admin of Server in Closet

Suppose you have a server tucked away in a closet (or away in a data center). A common issue is that the server gets wedged, and you need to power-cycle it and/or configure BIOS settings remotely, without being physically present at the server rack.

One sanctioned commercial solution is the Cyclades ACS serial console servers (8-port ~$2,000 and 32-port ~$3,000) with separate power strip relay like Startech rackmount server switch (8 outlet ~$440). What if you're just a hobbyist wanting to put some servers away in the closet or basement, but don't want to have to physically be present by the server when there is a problem?

Here is my proposed solution.

  • A power over Ethernet switch, like the Netgear ProSafe GS108T (~$100). The main reason to have power over Ethernet is to power the serial-to-Ethernet adapters.
  • A Netburner serial-to-ethernet adapter (~$30) with a power over ethernet module (~$8). The adapter has additional analog or digital ports that can be used to drive a relay.
  • A pair of solid state relays rated for 120VAC input at 10A or more, such as the Opto 22 120D10 (~$22 each). This is an SPST (single pole, single throw) relay, so you need a pair of them to control one NEMA 5 outlet, which has two contacts. You also need to connect these contacts to NEMA 5 plugs (a cable like this one, ~$5, could be used). Consider using blade crimp connectors.
If you have 8 servers or more, you might as well go for the Startech rackmount server switch.

Saturday, July 30, 2011

KISSmetrics and life of an ETag

When I read Researchers Expose Cunning Online Tracking Service That Can't Be Dodged on Slashdot, many commentators there thought disabling JavaScript could prevent the tracking, because the disclosure of how KISSmetrics works mentions serving two pieces of JavaScript. But JavaScript is a red herring here. The magic happens with the ETag, a cache validation aspect of the web whose intended use is to speed up the loading of websites you have already visited, by downloading only content that has been updated since your last visit.

When a web browser first requests a URL, the server responds with an entity tag (ETag) for that URL, which uniquely identifies the version of the resource served. Whenever the resource at a URL is modified, its ETag is guaranteed to change. Subsequent browser requests include a conditional query, essentially telling the server, "if this URL has not changed (same ETag), then don't bother sending me the resource." The browser cache remembers the ETag for as long as the resource is in the cache.

NoScript will not block ETags. In fact, an ETag can be attached to any resource: an HTML file, an image, etc. The browser's incognito mode may not be sufficient if it shares the browser cache with non-incognito content (it will send the same ETag). The only way to disable ETag tracking is to disable or clear the browser cache, but even this may not be sufficient (more about this later).

KISSmetrics uses a combination of techniques. To find out how, I hand-crafted an HTTP request as follows and saved it in a text file with DOS line endings ("\r\n"). Notice that the file needs a trailing blank line, which marks the end of the HTTP request header.
$ cat i.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com

$ hexdump -C i.txt
00000000  47 45 54 20 2f 69 2e 6a  73 20 48 54 54 50 2f 31  |GET /i.js HTTP/1|
00000010  2e 31 0d 0a 48 6f 73 74  3a 20 69 2e 6b 69 73 73  |.1..Host: i.kiss|
00000020  6d 65 74 72 69 63 73 2e  63 6f 6d 0d 0a 0d 0a     |metrics.com....|
0000002f
Now make a request to be tracked.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=xy5cdaPdlMSI4u2xv8rndfudaAE;expires=Wed, 15 Dec 2038 19:49:32 GMT;path=/;
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
Notice that the same identity is presented as ETag, a cookie, as well as a variable in the JavaScript. To my surprise, if I run the same command again, I get the same ETag.
$ cat i.txt| nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 298
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
This time, notice that it no longer tries to set a cookie, but it somehow remembers my ETag and reports an Age header. Running the same command again, I get:
$ cat i.txt| nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 542
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
Notice the longer age now.

Now, why does this result surprise me? A hand-crafted HTTP request should be perfectly stateless: I expected to get a different ETag every time I tried the same command. But I am getting the same one every time, as if I'm still being tracked.

It turns out there is a co-conspirator. I'm using a mobile wireless connection right now, and there is a transparent proxy between my computer and KISSmetrics. The transparent proxy is part of the network infrastructure that lessens the load on my provider's connection to other networks, by sharing a cache among my provider's users. Evidence of this transparent proxy is a difference in server behavior: if I switch to a network without the transparent proxy, I get this:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 503 Service Unavailable.
Content-length:0

The web server, nginx, wants the connection to remain open at least until it starts sending the response, but the transparent proxy earlier did not require this. It is probably a side effect of nginx's HTTP pipelining support. It's not too difficult to work around, by slightly modifying the command.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 21:55:58 GMT
ETag: "ysdfEF8mCndrvOxrcnzF4tysDss"
Expires: Wed, 15 Dec 2038 21:55:58 GMT
Last-Modified: Sat, 30 Jul 2011 20:55:58 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=ysdfEF8mCndrvOxrcnzF4tysDss;expires=Wed, 15 Dec 2038 21:55:58 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ysdfEF8mCndrvOxrcnzF4tysDss';if(typeof(_kmil) == 'function')_kmil();
After the server responds, I hit Ctrl-D to end the connection. I now get a fresh tag (as well as a cookie) every time.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "ikLBYzrQaWhFzc5lsacDhni3ftI"
...
Set-Cookie: _km_cid=ikLBYzrQaWhFzc5lsacDhni3ftI;expires=Wed, 15 Dec 2038 22:03:26 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ikLBYzrQaWhFzc5lsacDhni3ftI';if(typeof(_kmil) == 'function')_kmil();
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:03:33 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();
I abbreviated the irrelevant headers.

Furthermore, by modifying the HTTP request, I can get KISSmetrics to replay an ETag. Here, a cookie is added to the HTTP request:
$ cat j.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ

$ cat j.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 22:08:35 GMT
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
Expires: Wed, 15 Dec 2038 22:08:35 GMT
Last-Modified: Sat, 30 Jul 2011 21:08:35 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:08:35 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();
What is interesting is that if I perform the If-None-Match query, KISSmetrics doesn't try to set the cookie back. I thought it would.
$ cat k.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
If-None-Match: "fsxiUZH0lIdITI0YA4-uxXslRMQ"

$ cat k.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 304 Not Modified
Date: Sat, 30 Jul 2011 22:11:20 GMT
Server: nginx
Connection: keep-alive

This exercise reveals why the ETag is such a clever technique for tracking visitors. By leveraging the transparent proxy cache, the end user is left with no way to opt out of tracking. In fact, the web browser cache is simply a leaf node of a greater Internet content distribution cache framework. With ETags, your internet service provider does the dirty work for KISSmetrics: you can still be tracked through no fault of your web browser, and the question of who is responsible for tracking you becomes blurred.

To illustrate how the transparent proxy aids tracking: if I connect back to the network with the transparent proxy and replay the cookie, the transparent proxy now starts caching the replayed identity.
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
I disconnect and reconnect again. Now I issue the untracked request first, followed by a cookie replay and an ETag replay. Notice how the replays are now ignored, because the new untracked request has been cached by the transparent proxy.
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat k.txt | nc i.kissmetrics.com 80  # ETag-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
If you want to surf the web without being tracked, you (1) disconnect from the network, (2) reconnect, and (3) prime the transparent proxy's cache with a new identity request; then without clearing browser cache or cookies, you will be issued a new identity. However, it is possible that when the browser presents an old identity alongside the new identity, KISSmetrics can correlate and merge the two identities. It is probably safer to clear the browser cache and cookies just to be sure.

I think this at least marks the beginning of a happy story. Even though the transparent proxy cache built into the network infrastructure by the internet provider facilitates tracking, it is still possible for an end-user to evade the tracking by manipulating the proxy in a certain way.

Finally, the identity here is not really personally identifiable information per se. To KISSmetrics, it is just a random string that has been seen visiting websites X, Y, and Z. Unless you provide personally identifiable information to websites X, Y, or Z, all they know is that the same person used different internet providers to visit certain websites.

Tuesday, July 5, 2011

ISMM 2011 selected paper notes

  • Cache Index-Aware Memory Allocation. An allocator of fixed-size headerless objects whose size is a multiple of the cache line size (e.g. 128-byte objects on 64-byte cache lines) may cause certain cache indices to be evicted often, because the objects map uniformly onto the same cache lines. Adding spacers to these objects (a punctuated array) can break that affinity. NOTE(liulk):
    • Cache index conflict should generally happen only at block level if the allocator uses superblocks to manage arenas.
    • Knuth boundary tags are natural spacers used for variable sized object allocation, and would not suffer this problem.
  • Compartmental Memory Management in a Modern Web Browser. Memory allocated on behalf of each origin (as in "same origin policy") is stored together in the same heap, and the web browser keeps a separate heap per origin. Chrome spawns a process per tab, but that may not be practical on mobile devices, where hardware address space isolation could be expensive. Per-origin heaps also shorten garbage collection time, because same-origin private heaps hold no references to objects belonging to another origin.
  • Multicore Garbage Collection with Local Heaps. Illustrates how mutation handling makes it hard to separate local and global heaps, and how, despite the complicated mechanism, little scalability is gained.
  • Iterative Data-parallel Mark&Sweep on a GPU. Highlights challenges specific to GPU, namely the lack of deep recursion due to the relatively small size of video memory (~128MB) with no virtual memory and the large number of threads (1000+).

Saturday, April 2, 2011

Factories and rvalue reference

One of the benefits of rvalue references is that they provide perfect forwarding for libraries. One example is the factory. A factory here is taken in a loose sense to mean any function (as well as a constructor) that has to construct another object using a non-default constructor, taking the initializer from the function's argument. An example factory is shown below.
template<class T>
T *factory(const T& x = T()) {
  return new T(x);
}
Also, the constructor of std::pair is a factory.

A typical use of the factory is to pass in a temporary object as the initializer. For example,
std::complex<double> *p = factory(std::complex<double>(1.0, -1.0));
One issue factories face when the initializer holds a resource is what to do with that resource. The resource could be a memory location, a file in the filesystem, or anything whose explicit ownership needs to be established, lest sharing and leaking the resource become unwanted consequences.
  • Sharing the resource is out of the question, since this would violate ownership invariant.
  • Stealing the resource may be desirable but not possible, since x is of type const T& which means you cannot modify x.
  • Copying the resource could be expensive, for example if x contains a string or a vector with a lot of data.
Part of this issue could be mitigated by the copy constructor of T itself. It could be the case that T only wants to copy some defining characteristics from the initializer and not the data. For example, if T represents a file, then only the file permissions are copied but not file content; or if T is a string buffer, then only the open mode is copied but not the content of the buffer.

Note that std::stringstream does not have a copy constructor, so we're only talking about a hypothetical class here. In the case of std::stringstream, most STL implementations probably did not prohibit the copy constructor, so the compiler automatically generates one that could end up performing a shallow copy of the underlying buffer. This happens to work for a temporary initializer like this:
std::stringstream *p = factory(std::stringstream(...));
But if the initializer is an lvalue, then sharing of the underlying buffer would be undesirable.
std::stringstream ss(...);
std::stringstream *p = factory(ss);
Depending on what stringstream's copy constructor does, *p and ss might end up sharing the same buffer, and mutating ss or *p would make the internal states of the other object inconsistent.

With the introduction of rvalue references, we can now give a strong hint to the factory that the internal resources should be transferred.
template<class T>  // factory using copy semantics
T *factory(const T& x) {
  return new T(x);
}

template<class T>  // factory using move semantics
T *factory(T&& x) {
  return new T(std::forward<T>(x));
}
Note that without the std::forward<T>, a named rvalue reference is treated as an lvalue reference when used [N1377], and we would end up calling the non-const lvalue reference constructor (i.e. T(T& x)) or fall back to copy constructor (i.e. the const lvalue reference constructor).

Revisiting the factory using scenario above,
std::stringstream *p = factory(std::stringstream(...));
would use the move semantics rvalue reference factory, and
std::stringstream ss(...);
std::stringstream *p = factory(ss);
would use the copy semantics const T& factory.

This is consistent with the N1377 move proposal, which anticipates the most common overload pair to be:
void foo(const A& t);  // #1
void foo(A&& t);       // #2
Note that the factory examples in the rvalue reference proposal [N1690] do not have a const T& overload. This means that such a factory will only ever steal resources, because of the move semantics, so factory(ss) above would steal the underlying buffer of ss and render it unusable.

The conclusion is that factory must overload both const T& and T&&.
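For a concrete (non-template) class, the overload pair behaves exactly as described. A small sketch, with A standing in for a hypothetical resource-owning class:

#include <string>
#include <utility>

struct A { std::string data; };  // owns a resource (the string buffer)

A* factory(const A& x) { return new A(x); }             // copies x
A* factory(A&& x)      { return new A(std::move(x)); }  // steals x's buffer

int main() {
  A a{std::string(1000, 'x')};
  A* p = factory(a);             // lvalue: const A& overload, a is intact
  A* q = factory(std::move(a));  // explicit move: A&& overload, a is gutted
  A* r = factory(A());           // temporary: A&& overload
  delete p; delete q; delete r;
  return 0;
}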

This actually works nicely with existing libraries. In pre-C++0x code, the factory will always copy. A preprocessor macro would detect the availability of C++0x features and add the rvalue reference version to support move semantics in factory.

One last word about regular functions with rvalue reference arguments. In N1377, regarding binding temporaries to references, it is mentioned that a value can be implicitly converted to a temporary, and this is why binding a temporary to a non-const T& is disallowed.
void incr(int& rr) {rr++;}
 
void g()
{
    double ss = 1;
    incr(ss);
}
Here, incr(ss) would implicitly convert ss to a temporary int, and the increment would be applied to that temporary instead. This is properly disallowed in C++. In the case of rvalue references, however, it is a good idea to make the constructor explicit to disallow implicit conversion.
class Foo {
 public:
  explicit Foo(Bar b);  // regular constructor
};
Otherwise, a function expecting a Foo&& argument but given a Bar object would implicitly convert the Bar to a temporary Foo and operate on that temporary. Furthermore, if Foo also has an rvalue reference constructor taking Bar&&, then the temporary Foo would steal all the internal resources of the Bar, get passed to the function, and self-destruct. In other words, the implicit conversion from Bar to Foo would destroy the Bar.

This is likely going to be a very common mistake as more people misuse rvalue references. It would be wise to limit the use of rvalue references and never use them like lvalue references, even if you declare the constructors explicit. In particular, an rvalue reference may or may not modify the argument, so it should not be used as an "out" argument.

Saturday, March 19, 2011

Wikileaks? Openleaks? Try Patriotleaks!

(This is a fictional article, but I welcome those who turn this fiction into reality.)

Imagine the patriotleaks.us website: it features a blatant American flag in the background, an eagle seal, and a heading that reads, "Patriot Leaks, dedicated to protecting United States national interest by revealing dirty leaks about foreign governments and institutions." This site would probably not be endorsed by the US government, but it would operate within the legal confines of US law.

Then imagine on aiguoleaks.cn website, it features a red and yellow Chinese flag in the background, the communist seal of hammer and sickle, and a heading that reads, "用爱国主义来揭发外国政府和外国组织丑恶的罪行来捍卫祖国。" (Translation: using patriotism to reveal dirty crimes of foreign governments and foreign organizations in order to protect motherland.)

Then imagine the Arabic counterpart, the Russian counterpart, the European counterpart, the British Commonwealth counterpart. All of these leak sites operate within the legal confines of their own legal systems, and would never leak anything about their own authorities in power.

But they all leak documents about each other, like Wikileaks does. Furthermore, they all mirror each other's leaks, except for the ones about their respective hosting countries. This way, we have a redundantly distributed partial mirror of all the leaks in the world, each mirror legal in its own jurisdiction.

You can imagine that Wikileaks has, in a way, become the Nordic node of the Patriotleaks network. They have support from Sweden, Belgium, and Iceland, and they would not leak documents that hurt the national interest of those countries (and they have not done so, as far as I can tell).

This type of leaks network will be difficult to suppress by one government alone.

Saturday, February 26, 2011

Redundancy of audio packets over lossy network

I wanted to know what people currently do to stream audio packets over a network with packet loss. There are several options for streaming audio, and they vary in their effectiveness against packet loss.

  • AirTunes (aka RAOP) is the protocol used between iTunes and the Airport Express, based on RTSP (for establishing the session) and RTP (for transmitting data). My impression is that you can stream MPEG-4 AAC and Apple Lossless Audio Codec to the Airport Express, though it may be possible to stream uncompressed audio too. Note that the audio codec itself introduces algorithmic delay, since compression is performed by grouping audio data over a time window. It is not clear how the protocol deals with packet loss, since RTP does not define how to deal with it.
  • Netjack connects a jack server to a slave. It can stream CELT or raw audio over UDP. CELT is a low-delay audio codec with packet loss concealment. However, a tradeoff of the low delay is that it cannot resolve low-frequency pitch (~100Hz fundamental) and has to rely on a long-term predictor. I'm not sure how this affects audio quality in practice. The packet transport does not seem to attempt to recover from packet loss, relying instead on the CELT codec for packet loss concealment. The slave's audio clock is synchronized to the jack server.
  • Netjack2 is really a distributed version of jack, using remote computing nodes to apply computationally intensive audio filtering rather than to play back sound. It implements a discovery protocol based on multicast, and audio is transmitted over UDP with no packet loss handling. When an audio adapter is used for streaming audio to a slave sound card device, audio resampling is applied to compensate for clock drift.
  • Jacktrip, developed at Stanford CCRMA, connects two jack servers over a network. There doesn't seem to be any time synchronization. Data is transmitted over UDP, and it is possible to transmit redundant data in a rolling window in order to recover from data loss.
  • There is a proposal (quick view) in using Reed-Solomon code to transmit data over UDP at Kyushu University.
I'm particularly interested in techniques that can deal with wireless network packet loss, which has the following characteristics:
  • Wireless packet radio implements low-level retransmission of lost or corrupt packets. On the higher level, this is perceived as high packet delay.
  • When packet loss does occur at the higher level, it usually happens in bursts of packets. I estimate that a radio gap of up to 3ms is possible under normal conditions.
Given these characteristics, the rolling-window redundancy used by Jacktrip would not be effective, since a whole streak of lost packets would be unrecoverable under that scheme.

When using Reed-Solomon with byte-sized symbols, a code block can only be 255 bytes (including parity). If we naively transmit the 255 bytes over UDP as a single packet, the whole block could be dropped at once. But we can transpose the blocks so that, when transmitting n blocks over m UDP packets, each UDP packet contains a 1/m slice of each of the n blocks; a sketch follows below. This seems to be the approach taken by the Kyushu University proposal. The scheme, known as Cross-Interleaved Reed-Solomon, is also used on audio CDs.
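A sketch of that transposition (my reading of the scheme, not the actual Kyushu University code): n Reed-Solomon code blocks of 255 bytes are sent as m UDP packets, where packet k carries the k-th slice of every block, so losing one whole packet costs each block only 255/m byte erasures, which the parity symbols can correct.

#include <array>
#include <cstddef>
#include <vector>

std::vector<std::vector<unsigned char> > interleave(
    const std::vector<std::array<unsigned char, 255> >& blocks,
    std::size_t m) {
  const std::size_t slice = 255 / m;  // assumes m divides 255, e.g. m = 5
  std::vector<std::vector<unsigned char> > packets(m);
  for (std::size_t k = 0; k < m; ++k)
    for (std::size_t b = 0; b < blocks.size(); ++b)  // slice k of block b
      packets[k].insert(packets[k].end(),
                        blocks[b].begin() + k * slice,
                        blocks[b].begin() + (k + 1) * slice);
  return packets;
}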

Tuesday, February 15, 2011

The real way to disable Adobe Updater from your Mac OS X

The Adobe update manager is really annoying, but most instructions on the web for disabling it merely tell Adobe Updater not to report updates; the updater still runs silently. Dedicating system resources every now and then so that Adobe Updater can phone home, without even telling me about updates, is not good enough for me. I want Adobe Updater to stop completely.

To stop Adobe Updater completely, one must understand how it gets run in the first place. The updater is launched by a Mac OS X system service called launchd, to which Adobe Updater is a periodic job. The job file is stored under your ~/Library/LaunchAgents folder. The actual file name is suffixed with a number of random characters, but it starts with "com.adobe.ARM" as the prefix. If you look inside the file (it's a plain text file), you'll see that launchd runs the updater at a 12600 second interval, i.e. every 3.5 hours.
To remove, type these commands in a Terminal window:
cd ~/Library/LaunchAgents
launchctl remove `basename com.adobe.ARM.* .plist`
rm com.adobe.ARM.*
Basically, the idea is: for each launchd plist file in ~/Library/LaunchAgents that you don't want, run launchctl remove on the job name (the same as the plist file name without the .plist suffix), then remove the actual .plist file.

While you are at it, there may be other launchd jobs in ~/Library/LaunchAgents left over from stale applications you might have tried before. Feel free to remove them all.

You're welcome.

Edit (Oct 20, 2012): a couple of readers pointed out in the comments that the launchd namespace used by Adobe Updater is now different. I just installed Adobe (Acrobat) Reader XI and found that the name is still com.adobe.ARM.*, but if you have Creative Suite, it might be com.adobe.AAM.* instead. I don't have Creative Suite, so I can't verify that.

Furthermore, it appears that when you set Updater preference in Adobe Reader XI to "Do not download or install updates automatically," it now removes the launchd task as well, which means the launchctl and rm commands would no longer be necessary. Kudos to Adobe for figuring that out!

One reader also pointed out that, in his case, the updater is installed in the system-wide location /Library/LaunchAgents. In that case, you will need to run "sudo su -" first and type in your own password to gain root privilege (the prompt changes from "$" to "#") before the files can be removed. Be careful with the commands you enter as root, as a mistake can irreparably damage your system.

Thanks for keeping me updated y'all.

Friday, February 11, 2011

How many aligned address space can I allocate via mmap?

Suppose I want to allocate a naturally aligned piece of memory through mmap(): say, a 16KB block aligned to 16KB. What is the greatest amount of the address space I can allocate that way? What if I want 4MB aligned to 4MB, or 16MB aligned to 16MB? The table below shows the result, as the number of chunks of each size and the total amount in GB.

OS                    | 4KB               | 16KB              | 4MB           | 16MB          | 64MB
Mac OS X 10.5.8       | 900227 (3.43GB)   | 225056 (3.43GB)   | 877 (3.43GB)  | 217 (3.39GB)  | 52 (3.25GB)
Linux 2.6.18 (32-bit) | 1048173 (4.00GB)  | 262040 (4.00GB)   | 1020 (3.98GB) | 252 (3.94GB)  | 59 (3.69GB)
Linux 2.6.18 (64-bit) | 6168666 (23.53GB) | 1540236 (23.50GB) | 6016 (23.5GB) | 1504 (23.5GB) | 376 (23.5GB)

The machine that ran the 64-bit test is an AMD Opteron 8220 SE with 40 physical address bits (1024 GB) and 48 virtual address bits (262144 GB). However, I can't seem to actually allocate anywhere near that much address space.

The source of the program can be found here (alignedmmap.c). Note that since the addresses are never touched, this does not allocate any actual memory, and the program is so tiny that it loads very few shared objects besides the standard C library.
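For what it's worth, the usual trick for naturally aligned mappings (an assumption about the general technique; alignedmmap.c may do it differently) is to over-allocate by the alignment, then unmap the misaligned head and tail:

#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Map `size` bytes aligned to `align`, where `align` is a power of two
// and a multiple of the page size.  On older Mac OS X, MAP_ANONYMOUS
// is spelled MAP_ANON.
static void* mmap_aligned(std::size_t size, std::size_t align) {
  void* p = mmap(NULL, size + align, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return NULL;
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t aligned =
      (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  if (aligned > addr)          // trim the misaligned head
    munmap(p, aligned - addr);
  if (addr + align > aligned)  // trim the leftover tail
    munmap(reinterpret_cast<void*>(aligned + size), addr + align - aligned);
  return reinterpret_cast<void*>(aligned);
}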

Friday, February 4, 2011

Putting IPv6 to the test

It's been a while since I configured my Airport Express with an IPv6 tunnel to Hurricane Electric's tunnelbroker.net. While this makes me IPv6 enabled, I've been wondering what my Internet experience would be like in a world where I can't get IPv4 addresses anymore, now that IPv4 address exhaustion could happen any minute. Since I'm sick at home today, it's a perfect time to play in between sleep and orange juice.

I turned off IPv4 on my computer. The first surprise is that DNS didn't work. The reason is that, even though I configured the Airport to use Hurricane Electric's IPv6 DNS server 2001:470:20::2, the IPv6 stateless auto-configuration used by the Airport Express does not propagate that information, and the Airport Express does not provide DHCPv6 either.

After configuring DNS manually, it's time to hit Google. All of Google's services work fine as far as I can tell: I can watch YouTube videos, and my blog works, since it's hosted by Google. The bad news is that I can no longer reach most of the search results, which is no surprise. The search experience is disappointing because I don't know which sites are IPv4 only. For example, when searching for IPv6 stateless auto-configuration, the RFC mirror faqs.org does not work, but ietf.org does. The good news is that the "cache" link lets me look at a page even when the site is unreachable over IPv6.

Most of the big websites aren't reachable over IPv6: facebook.com, yahoo.com, live.com / bing.com / microsoft.com, wikipedia.org, amazon.com, linkedin.com, ... cnn.com, newegg.com, apple.com, etc. Also, w3.org isn't IPv6 ready, neither is mit.edu (it's not my alma mater, but you'd think they're at the forefront of technology).

To be fair, what will probably happen when an ISP runs out of public IPv4 addresses for its subscribers is that it will run a large-scale NAT router and assign private IPv4 addresses instead. You will get an address like 10.xx.yy.zz, which is not reachable from the Internet, but you will still be able to reach out thanks to NAT, which stands for network address translation. This is already the situation with most home Internet routers, where your broadband gives you only one public IP address that has to be shared among several devices at home; your home computers already get private IP addresses like 192.168.aa.bb. The problem with a large-scale NAT router is that it's probably going to be slower and less reliable, since it has to translate thousands of connections concurrently and it is a single point of failure, giving you a crappy Internet experience.

Another solution: when an ISP runs out of IPv4 addresses and starts putting subscribers on IPv6 only, it could simultaneously provide an HTTP proxy server, reachable from its own IPv6 network, through which IPv4 websites can still be visited. At least the HTTP proxy can be load balanced, so it would not be a single point of failure.

Since the burden of IPv4 interoperability seems to lie on the shoulders of ISPs, I can see why the big websites have been slow to adopt IPv6. But the truth is, you can't count on an end user's ISP to bring IPv6 users to your IPv4 website reliably, and native IPv6 sites are going to give native IPv6 users a better experience.

My own experiment tells me that I'm not ready to retire IPv4 yet (I'm not even using a native IPv6 stack), though I could be if I were only interested in writing blog posts and watching YouTube videos.