## Sunday, December 4, 2011

### Evaluating RC4 as pseudo-random number generator

Update (September 3, 2012): I made some mistakes in my own implementation of RC4; an analysis is documented in "Mistake in my RC4 implementation." After fixing the mistake, I no longer see the skewed distribution shown here. Please be advised that none of the posts on this blog are peer reviewed, and mistakes happen.
On the quest for a pseudo-random number generator that is easy to implement, I gave up on Mersenne Twister but took Wikipedia's advice that a stream cipher can be used for generating pseudo-random sequences. It also mentions that RC4 is not a particularly uniform stream cipher, which makes statistical attacks possible. Because of the simplicity of RC4, I decided to implement it and see how bad it really is. Since RC4 outputs a stream of bytes, I took a histogram of the byte values. The initialization key is the integer value of a stack-allocated object pointer, repeated for 64 bytes, running on a 64-bit machine. I'm not sure whether the fact that the key is highly regular has anything to do with the non-uniform result I observed.
| byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count |
|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|
| 0 | 5499938 | 32 | 50376 | 64 | 81740 | 96 | 253697 | 128 | 50004 | 160 | 144433 | 192 | 147856 | 224 | 147031 |
| 1 | 134854 | 33 | 50103 | 65 | 50386 | 97 | 115718 | 129 | 50024 | 161 | 49930 | 193 | 149593 | 225 | 50260 |
| 2 | 50352 | 34 | 50179 | 66 | 954060 | 98 | 50362 | 130 | 50185 | 162 | 50553 | 194 | 134910 | 226 | 78249 |
| 3 | 115332 | 35 | 50006 | 67 | 50091 | 99 | 50170 | 131 | 50543 | 163 | 50289 | 195 | 50516 | 227 | 49853 |
| 4 | 50214 | 36 | 50065 | 68 | 50256 | 100 | 50052 | 132 | 50288 | 164 | 49950 | 196 | 50532 | 228 | 50303 |
| 5 | 50089 | 37 | 50589 | 69 | 50173 | 101 | 50118 | 133 | 143783 | 165 | 49753 | 197 | 50182 | 229 | 50135 |
| 6 | 207241 | 38 | 49946 | 70 | 50416 | 102 | 50395 | 134 | 50447 | 166 | 50270 | 198 | 50145 | 230 | 134634 |
| 7 | 68820 | 39 | 50147 | 71 | 49985 | 103 | 49774 | 135 | 50252 | 167 | 50502 | 199 | 49368 | 231 | 49987 |
| 8 | 50212 | 40 | 103906 | 72 | 49950 | 104 | 50501 | 136 | 49803 | 168 | 50038 | 200 | 50032 | 232 | 50486 |
| 9 | 695154 | 41 | 49905 | 73 | 50137 | 105 | 50389 | 137 | 50085 | 169 | 50619 | 201 | 66006 | 233 | 49849 |
| 10 | 50439 | 42 | 49865 | 74 | 50472 | 106 | 50284 | 138 | 49508 | 170 | 50045 | 202 | 50336 | 234 | 50083 |
| 11 | 49682 | 43 | 948555 | 75 | 50141 | 107 | 50230 | 139 | 50294 | 171 | 50719 | 203 | 50300 | 235 | 50267 |
| 12 | 50378 | 44 | 49965 | 76 | 140698 | 108 | 50119 | 140 | 49772 | 172 | 50399 | 204 | 49963 | 236 | 50117 |
| 13 | 50466 | 45 | 50171 | 77 | 49814 | 109 | 50012 | 141 | 50385 | 173 | 50031 | 205 | 50072 | 237 | 49945 |
| 14 | 49591 | 46 | 50290 | 78 | 50219 | 110 | 90366 | 142 | 50177 | 174 | 50126 | 206 | 50585 | 238 | 50157 |
| 15 | 49776 | 47 | 50340 | 79 | 49916 | 111 | 49974 | 143 | 50224 | 175 | 50513 | 207 | 50350 | 239 | 50234 |
| 16 | 287052 | 48 | 205931 | 80 | 151232 | 112 | 131863 | 144 | 50079 | 176 | 50026 | 208 | 50226 | 240 | 131317 |
| 17 | 50022 | 49 | 50076 | 81 | 50134 | 113 | 49801 | 145 | 50487 | 177 | 50688 | 209 | 81920 | 241 | 50161 |
| 18 | 118991 | 50 | 50125 | 82 | 50108 | 114 | 50370 | 146 | 66545 | 178 | 78182 | 210 | 49998 | 242 | 50092 |
| 19 | 50209 | 51 | 50706 | 83 | 50292 | 115 | 50297 | 147 | 50236 | 179 | 50411 | 211 | 50458 | 243 | 50180 |
| 20 | 50170 | 52 | 50025 | 84 | 221044 | 116 | 116139 | 148 | 247793 | 180 | 215784 | 212 | 50185 | 244 | 50124 |
| 21 | 50371 | 53 | 49717 | 85 | 50259 | 117 | 50274 | 149 | 50327 | 181 | 758974 | 213 | 49808 | 245 | 90855 |
| 22 | 50365 | 54 | 49979 | 86 | 50302 | 118 | 50523 | 150 | 50228 | 182 | 50272 | 214 | 50685 | 246 | 50228 |
| 23 | 50467 | 55 | 50145 | 87 | 49797 | 119 | 105677 | 151 | 50108 | 183 | 50390 | 215 | 113514 | 247 | 50316 |
| 24 | 100861 | 56 | 78827 | 88 | 113581 | 120 | 138358 | 152 | 57005 | 184 | 53395 | 216 | 147492 | 248 | 50546 |
| 25 | 50158 | 57 | 49803 | 89 | 50264 | 121 | 50209 | 153 | 50095 | 185 | 50334 | 217 | 50396 | 249 | 50078 |
| 26 | 50483 | 58 | 50279 | 90 | 50176 | 122 | 49674 | 154 | 50292 | 186 | 50529 | 218 | 49915 | 250 | 50111 |
| 27 | 50211 | 59 | 49997 | 91 | 50337 | 123 | 50396 | 155 | 50453 | 187 | 49836 | 219 | 50170 | 251 | 50524 |
| 28 | 49859 | 60 | 50507 | 92 | 50313 | 124 | 50280 | 156 | 50059 | 188 | 50229 | 220 | 248053 | 252 | 49981 |
| 29 | 50580 | 61 | 50869 | 93 | 50417 | 125 | 49877 | 157 | 50355 | 189 | 50313 | 221 | 49937 | 253 | 50533 |
| 30 | 50142 | 62 | 50358 | 94 | 50292 | 126 | 50240 | 158 | 49821 | 190 | 50204 | 222 | 50139 | 254 | 49975 |
| 31 | 50265 | 63 | 50165 | 95 | 50352 | 127 | 244313 | 159 | 50410 | 191 | 50529 | 223 | 202938 | 255 | 246009 |
The result is staggeringly non-uniform and heavily biased toward 0. I quickly eyeballed the figures and labeled the counts with colors: red means a strong bias, yellow a slight bias, and green is the ballpark for an unbiased result, though still above the typical count in this table. The coloring is quite subjective, but I'm only using it to highlight the non-uniform bias.

Now, this observation also indicates that RC4 is a very poor stream cipher. The way a stream cipher works is essentially that a stream of pseudo-random bytes is generated and XOR-ed byte-wise with the plaintext to obtain the ciphertext. The prevalence of 0 values means that much of the plaintext will pass through to the ciphertext unchanged, because 0 is the XOR identity. The fact that this bad encryption is possible with some key choices should worry a lot of people who are still using RC4.
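To see why such a bias is fatal, here is a minimal sketch of the XOR step of a stream cipher (the function name is mine, not from any library): a zero keystream byte leaves the corresponding plaintext byte unchanged, and applying the same keystream twice recovers the plaintext.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// XOR each text byte with the keystream (repeating the keystream if short).
// 0x00 is the XOR identity, so any zero keystream byte leaks plaintext as-is.
std::string xor_with_keystream(const std::string& text, const std::string& keystream) {
    std::string out = text;
    for (std::size_t n = 0; n < out.size(); ++n)
        out[n] = static_cast<char>(out[n] ^ keystream[n % keystream.size()]);
    return out;
}
```

With an all-zero keystream, "encryption" returns the plaintext verbatim; decryption is the same call with the same keystream.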

By a stroke of luck, I tried a slightly modified RC4 that actually gives me a pretty uniform distribution. The difference (from the pseudo-code listing of the RC4 Wikipedia article) is marked by the comment below.
i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    K := S[(S[i] + S[j]) mod 256]
    output K ^ j  // was just K before
endwhile
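A straight transcription of the pseudo-code above might look like the sketch below (the class and flag names are mine; recall from the update at the top that my original experiment had an implementation bug, so treat this as illustrative). The standard generator reproduces the published keystream for the key "Key"; setting the flag applies the K ^ j modification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Textbook RC4. With xor_j == true, the generator outputs K ^ j
// (the modification discussed above) instead of the plain RC4 byte K.
class RC4 {
public:
    RC4(const uint8_t* key, std::size_t keylen, bool xor_j = false) : xor_j_(xor_j) {
        for (int n = 0; n < 256; ++n) S_[n] = static_cast<uint8_t>(n);
        uint8_t j = 0;
        for (int n = 0; n < 256; ++n) {          // key-scheduling algorithm (KSA)
            j = static_cast<uint8_t>(j + S_[n] + key[n % keylen]);
            std::swap(S_[n], S_[j]);
        }
    }
    uint8_t next() {                             // one PRGA step
        i_ = static_cast<uint8_t>(i_ + 1);
        j_ = static_cast<uint8_t>(j_ + S_[i_]);
        std::swap(S_[i_], S_[j_]);
        const uint8_t K = S_[static_cast<uint8_t>(S_[i_] + S_[j_])];
        return xor_j_ ? static_cast<uint8_t>(K ^ j_) : K;
    }
private:
    uint8_t S_[256];
    uint8_t i_ = 0, j_ = 0;
    bool xor_j_;
};
```

Keying it with a pointer value repeated for 64 bytes and histogramming next() over many draws reproduces the experiment in this post.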

This modification gives me the following histogram.
| byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count |
|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|-----:|------:|
| 0 | 99723 | 32 | 100388 | 64 | 99739 | 96 | 99866 | 128 | 100026 | 160 | 100026 | 192 | 99831 | 224 | 99581 |
| 1 | 99941 | 33 | 99908 | 65 | 100363 | 97 | 100328 | 129 | 99741 | 161 | 100338 | 193 | 99792 | 225 | 99776 |
| 2 | 99968 | 34 | 100010 | 66 | 99589 | 98 | 100231 | 130 | 100649 | 162 | 100084 | 194 | 100218 | 226 | 99839 |
| 3 | 99896 | 35 | 99924 | 67 | 100465 | 99 | 100002 | 131 | 100093 | 163 | 99815 | 195 | 100478 | 227 | 100137 |
| 4 | 100439 | 36 | 99677 | 68 | 100004 | 100 | 99960 | 132 | 100165 | 164 | 99756 | 196 | 100064 | 228 | 100114 |
| 5 | 100081 | 37 | 99362 | 69 | 99664 | 101 | 99872 | 133 | 99887 | 165 | 100059 | 197 | 99523 | 229 | 100590 |
| 6 | 100570 | 38 | 99924 | 70 | 99648 | 102 | 99945 | 134 | 99459 | 166 | 99854 | 198 | 100170 | 230 | 100069 |
| 7 | 99812 | 39 | 99706 | 71 | 100066 | 103 | 99598 | 135 | 99532 | 167 | 99499 | 199 | 99956 | 231 | 99892 |
| 8 | 99969 | 40 | 100484 | 72 | 100074 | 104 | 99506 | 136 | 99831 | 168 | 100291 | 200 | 100164 | 232 | 99562 |
| 9 | 99840 | 41 | 99756 | 73 | 99128 | 105 | 100178 | 137 | 99646 | 169 | 100348 | 201 | 100121 | 233 | 99412 |
| 10 | 99793 | 42 | 100133 | 74 | 100019 | 106 | 100488 | 138 | 99882 | 170 | 99661 | 202 | 99864 | 234 | 99878 |
| 11 | 100208 | 43 | 100059 | 75 | 99717 | 107 | 100141 | 139 | 100374 | 171 | 100116 | 203 | 99643 | 235 | 100394 |
| 12 | 100157 | 44 | 99772 | 76 | 99846 | 108 | 100085 | 140 | 100180 | 172 | 99632 | 204 | 99928 | 236 | 100123 |
| 13 | 100098 | 45 | 100232 | 77 | 100017 | 109 | 100654 | 141 | 99955 | 173 | 99993 | 205 | 99858 | 237 | 100188 |
| 14 | 100017 | 46 | 100135 | 78 | 99566 | 110 | 99104 | 142 | 99640 | 174 | 99256 | 206 | 100243 | 238 | 99918 |
| 15 | 100259 | 47 | 100403 | 79 | 99459 | 111 | 99887 | 143 | 100255 | 175 | 99818 | 207 | 99612 | 239 | 99840 |
| 16 | 100139 | 48 | 99888 | 80 | 100828 | 112 | 99999 | 144 | 99820 | 176 | 100378 | 208 | 99663 | 240 | 99622 |
| 17 | 100093 | 49 | 100173 | 81 | 99633 | 113 | 100480 | 145 | 99804 | 177 | 99826 | 209 | 100564 | 241 | 99916 |
| 18 | 100257 | 50 | 99987 | 82 | 100502 | 114 | 100253 | 146 | 100554 | 178 | 99998 | 210 | 100183 | 242 | 100319 |
| 19 | 99835 | 51 | 100206 | 83 | 99776 | 115 | 99674 | 147 | 100726 | 179 | 99909 | 211 | 100548 | 243 | 99803 |
| 20 | 100010 | 52 | 99879 | 84 | 100188 | 116 | 99657 | 148 | 100255 | 180 | 99687 | 212 | 99937 | 244 | 100122 |
| 21 | 100424 | 53 | 100257 | 85 | 100197 | 117 | 100375 | 149 | 99866 | 181 | 99741 | 213 | 100669 | 245 | 99972 |
| 22 | 100062 | 54 | 99376 | 86 | 100123 | 118 | 100396 | 150 | 99623 | 182 | 100523 | 214 | 100195 | 246 | 100034 |
| 23 | 99729 | 55 | 100404 | 87 | 100620 | 119 | 100275 | 151 | 99594 | 183 | 99879 | 215 | 99921 | 247 | 100092 |
| 24 | 100076 | 56 | 99792 | 88 | 100383 | 120 | 99712 | 152 | 100641 | 184 | 99514 | 216 | 100226 | 248 | 100600 |
| 25 | 100107 | 57 | 100214 | 89 | 100079 | 121 | 99682 | 153 | 99723 | 185 | 99764 | 217 | 100120 | 249 | 99574 |
| 26 | 99785 | 58 | 100284 | 90 | 99835 | 122 | 100090 | 154 | 99805 | 186 | 99752 | 218 | 99850 | 250 | 100211 |
| 27 | 99839 | 59 | 99459 | 91 | 100332 | 123 | 100185 | 155 | 99557 | 187 | 100176 | 219 | 100167 | 251 | 100213 |
| 28 | 99700 | 60 | 99795 | 92 | 99966 | 124 | 100599 | 156 | 99709 | 188 | 100422 | 220 | 100028 | 252 | 100368 |
| 29 | 99776 | 61 | 99663 | 93 | 99596 | 125 | 99479 | 157 | 100266 | 189 | 100427 | 221 | 100172 | 253 | 99842 |
| 30 | 100319 | 62 | 99734 | 94 | 99962 | 126 | 99874 | 158 | 100068 | 190 | 100238 | 222 | 100058 | 254 | 99921 |
| 31 | 99537 | 63 | 100025 | 95 | 100043 | 127 | 100382 | 159 | 99910 | 191 | 99824 | 223 | 100697 | 255 | 99923 |
This is amazingly uniform; at least it appears good enough for my purpose. It remains to be seen whether sequences of two or more consecutive output bytes retain the same uniformity.

I think it is quite possible that this slight modification could be immune to the existing statistical attacks on RC4.

## Friday, December 2, 2011

### Interface checking

One of my regrets about C++ is that it doesn't do interface checking when you instantiate templates. Coupled with delayed instantiation (the body of a method is not fully type-checked until you use it), static type checking of template code has been hit or miss. Also, when you do get a type error, it is incomprehensible, riddled with instantiation details that shouldn't be there in the first place. Interface checking means the compiler can do better type checking before template instantiation. Interface checking has been in development for some time (e.g. Boost Concept Check, ConceptC++, N2773), but it didn't make it into the C++11 standard.

Interface checking in most programming languages follows a fulfillment model, where an interface specifies a collection of method signatures, and if a class implements all the functions (fulfills all the signatures), then it is said to implement the interface. This is the case for the Go language as well as the C++ concept checking developed so far. (Java, by contrast, requires a class to declare the interfaces it implements.)

However, if I were to design interfaces for a new language, I would choose a declarative model, where a class has to explicitly declare the interfaces it intends to implement. The interface is uniquely identifiable in the program's namespace. The motivation is to prevent a class from unintentionally implementing an interface by coincidence.

This is especially important for C++ since many operators are overloaded. For example, a random access iterator looks a lot like an integral type at the interface level (in fact it is a super concept because you can dereference an iterator), but manipulating the iterator like an integral without observing begin() and end() will cause the iterator to go out of bounds.

This semantic gap can be bridged by a more precise type specification using dependent types. (This is not to be confused with a C++ dependent type, a class that is complete if and only if a template argument refers to a complete type; a complete type here simply means the compiler knows its memory layout, so you can do a sizeof(type), for example, to get its size.) But we don't yet have an object-oriented dependent type system with interface checking. Even if we could bridge the semantic gap, I argue that declarative interface checking is no less powerful than fulfillment interface checking.

Suppose you have a class Foo that implements all the methods required by interface Xyzzy, but doesn't declare so. In the fulfillment model, Foo can be used to instantiate any template (generic) that requires interface Xyzzy, but not in the declarative model, which requires Foo to explicitly state that it implements interface Xyzzy. Even if you cannot modify the definition of Foo, all you need to do is subclass Foo, declare that the subclass implements Xyzzy, and pass that to the template.
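The workaround above can be sketched in present-day C++ with a hand-rolled trait standing in for the declarative check (Xyzzy, implements, Foo, FooAsXyzzy, and use_xyzzy are all hypothetical names, not a real concepts facility):

```cpp
#include <cassert>
#include <type_traits>

struct Xyzzy {};   // the interface's unique name in the program's namespace

// Declarative model: a class opts in by specializing this trait;
// merely having the right methods is not enough.
template <class C, class I>
struct implements : std::false_type {};

struct Foo {                        // has the required method...
    int frobnicate() const { return 42; }
};                                  // ...but never declares Xyzzy

struct FooAsXyzzy : Foo {};         // subclass solely to declare the intent
template <> struct implements<FooAsXyzzy, Xyzzy> : std::true_type {};

template <class T>
int use_xyzzy(const T& t) {
    static_assert(implements<T, Xyzzy>::value, "T must declare interface Xyzzy");
    return t.frobnicate();
}
```

Here use_xyzzy(Foo()) fails to compile even though Foo fulfills the signatures, while the declaring subclass passes.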

Another idea I have (in the context of C++ concept checking at least) is to get rid of the distinction between a class and an interface: any class can be used as an interface. This is possible because C++ allows declaration and implementation to be separate. Furthermore, when a class is said to implement another class, it does not mean the former is a subclass of the latter; the compiler only checks that the method signatures are congruent.

## Friday, November 25, 2011

### Non-stop high speed rail

The basic premise is to figure out how to let passengers board and leave a moving train, without the train stopping. Here I'm tracking the ideas.
• 2007/4/18, top boarding design by 陳建軍, titled "不停站可上下乘客的火車."
• 2011/6/21, side boarding design by Paul Priestman, titled "Moving Platform."
• 2011/11/24, cloverleaf (rear detach, front attach) design by a Slashdot commentator.
All of the above designs suffer from poor aerodynamics.

Inspired by a MythBusters study showing that a vehicle drafting behind a "big rig" improves mileage, I would suggest a rear-boarding design: a passenger car approaches the speed of the moving high speed rail train from behind, docks for a few minutes while passengers board and leave, and then detaches from the rear of the train again.

I believe this docking mechanism will be safer and also energy efficient.

## Thursday, October 27, 2011

### Debugging static object destruction order

If you avoid the “static initialization order fiasco” by using the construct-on-first-use idiom, but with a static object instead of a static object pointer, you might run into the static destruction order fiasco instead. One solution is to reorder the static constructors so that the destructors are called in the right order, by forcing a static object to be referenced before its first use.

In order to decide how to reorder the destruction sequence, we need to know how it will be carried out. The static destructors are called in the reverse order in which the static constructors finished. Leveraging the fact that destructors are registered using __cxa_atexit (part of the Itanium C++ ABI, which is adopted by the Linux Standard Base and implemented in GCC) as soon as the constructor finishes, we can set a breakpoint on __cxa_atexit to see the construction order, which tells us the destructors will be called in reverse.

The gdb recipe here will print the name of the function that contains the static object whenever the static object is constructed, but will continue execution.
(gdb) b __cxa_atexit  # assume this is breakpoint n
# set commands to automatically execute for breakpoint n
(gdb) commands n
>where 2
>continue
>end

# __cxa_finalize invokes the registered destructors; break here to see destruction begin
(gdb) b __cxa_finalize
This allows you to generate a log of construction order without breaking and analyze it offline.
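A minimal sketch of the idiom this recipe debugs (the names are mine): each accessor holds a static object, and referencing one static from another's accessor forces a construction order, hence the reverse destruction order at exit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the static Logger objects are constructed;
// their destructors run in the reverse of this order at program exit.
std::vector<std::string>& constructionLog() {
    static std::vector<std::string> log;
    return log;
}

struct Logger {
    explicit Logger(const char* name) { constructionLog().push_back(name); }
};

Logger& getA() { static Logger a("A"); return a; }
Logger& getB() {
    getA();                // force A to be constructed (and registered) first,
    static Logger b("B");  // so A is destroyed after B
    return b;
}
```

Calling getB() constructs A before B, so at exit B is destroyed first and A last, which is what the __cxa_atexit breakpoint log would show.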

## Friday, October 7, 2011

### Xcode 4.1 for Lion and Apple App Store archive format

Today I had the (dis)pleasure of downloading Xcode 4.1 for Lion, which turns out to be packaged for App Store. I only have Leopard, so this is how I extracted the file content.

• First I opened up xcode_4.1_for_lion.dmg, which is a regular hard disk image file.
• After mounting, there is an InstallXcodeLion.pkg, which turns out to be an xar file. To extract, run xar -x -f "/Volumes/Install Xcode/InstallXcodeLion.pkg" in an empty directory (the path contains a space, hence the quotes). I think this is the file that you would have downloaded over App Store.
• After extraction, there are a few more files.
• Distribution - an XML manifest for the App Store. Contains some JavaScript code that is used for the installer. Who'd have thought that Mac OS X installers use JavaScript for scripting?
• InstallXcodeLion.pkg - contains yet another package for installation.
• Resources - the license file and localization.
• Inside InstallXcodeLion.pkg, there is a file Payload which is a cpio file. To extract, run cpio -i -F /path/to/InstallXcodeLion.pkg/Payload in another empty directory.
• After extraction, you get an Applications folder with Install Xcode.app. This is supposedly what App Store saves to the Applications folder.
• Attempting to run the Mach-O binary on the command line results in: dyld: unknown required load command 0x80000022. That load command is LC_DYLD_INFO_ONLY.
• All the sub-packages can be found in the Contents/Resources/Packages. They are xar packages, but with Bom, PackageInfo, Payload, and Scripts immediately inside. These .pkg files can actually be installed with the Installer that comes with Leopard. However, it is not clear if the binaries installed by these packages will suffer the LC_DYLD_INFO_ONLY issue.
This shows that with some script kiddie magic, you can install Xcode 4.1 on Leopard. Whether it will actually run or not is another thing.

## Saturday, August 13, 2011

### Remote Admin of Server in Closet

Suppose you have a server tucked away in a closet (or away in a data center). A common issue is, the server is wedged, and you need to power off and/or configure BIOS settings remotely, without being physically present at the server rack.

One sanctioned commercial solution is the Cyclades ACS serial console server (8-port ~$2,000, 32-port ~$3,000) with a separate power strip relay like the Startech rackmount server switch (8 outlets, ~$440). What if you're just a hobbyist who wants to put some servers away in the closet or basement, but doesn't want to have to be physically present by the server when there is a problem? Here is my proposed solution.
• A power over ethernet switch, like the Netgear ProSafe GS108T (~$100). The main reason to have power over ethernet is for interfacing with the serial-to-ethernet adapters.
• A Netburner serial-to-ethernet adapter (~$30) with a power over ethernet module (~$8). The adapter has additional analog or digital ports that can be used to drive a relay.
• A pair of solid state relays rated for 120VAC input at 10A or more, such as the Opto 22 120D10 (~$22 each). These are SPST (single pole single throw) relays, and you need a pair of them to control one NEMA 5 outlet, which has two contacts. You also need to connect these contacts to NEMA 5 plugs (could use a cable like this one, ~$5). Consider using blade crimp connectors.
If you have 8 servers or more, you might as well go for the Startech rackmount server switch.

## Saturday, July 30, 2011

### KISSmetrics and life of an ETag

When I read "Researchers Expose Cunning Online Tracking Service That Can’t Be Dodged" on Slashdot, many commentators there thought disabling JavaScript could prevent the tracking, because the disclosure of how KISSmetrics works mentions serving two pieces of JavaScript. However, the JavaScript is a red herring. The magic happens with the ETag, a cache-validation aspect of the web whose intended use is to speed up the loading of websites you have already visited, by downloading only content that has been updated since your last visit.

When a web browser first requests a URL, the server responds with an entity tag (ETag) that uniquely identifies the version of the resource served at that URL. Whenever the resource at a URL is modified, its ETag is guaranteed to change. Subsequent browser requests include a conditional query, essentially telling the server "if this URL has not changed (same ETag), then don't bother sending me the resource." A web browser cache remembers the ETag for as long as the resource is in the cache.

NoScript will not block ETag. In fact, an ETag can be attached to any resource, an HTML file, an image, etc. The browser's incognito mode may not be sufficient if it shares the same browser cache with non-incognito mode content (as it will send the same ETag). The only way to disable ETag tracking is to disable/clear browser cache, but this too may not be sufficient (more about this later).

KISSmetrics uses a combination of techniques. To find out how, I hand-crafted an HTTP request as follows and saved it in a text file with DOS line endings ("\r\n"). Notice the file needs a trailing blank line, which marks the end of the HTTP request header.
$ cat i.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com

$ hexdump -C i.txt
00000000  47 45 54 20 2f 69 2e 6a  73 20 48 54 54 50 2f 31  |GET /i.js HTTP/1|
00000010  2e 31 0d 0a 48 6f 73 74  3a 20 69 2e 6b 69 73 73  |.1..Host: i.kiss|
00000020  6d 65 74 72 69 63 73 2e  63 6f 6d 0d 0a 0d 0a     |metrics.com....|
0000002f

Now make a request to be tracked.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=xy5cdaPdlMSI4u2xv8rndfudaAE;expires=Wed, 15 Dec 2038 19:49:32 GMT;path=/;
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

Notice that the same identity is presented as the ETag, as a cookie, and as a variable in the JavaScript. To my surprise, if I run the same command again, I get the same ETag.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 298
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

This time, notice that it no longer tries to set a cookie, but it somehow remembers my ETag and reports an Age header. Running the same command again, I get:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 542
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

Notice the longer age now. Why does this result surprise me? A hand-crafted HTTP request should be perfectly stateless, so I expected a different ETag every time I tried the same command. But I am getting the same one every time, as if I'm still being tracked. It turns out there is a co-conspirator. I'm using a mobile wireless connection right now, and there is a transparent proxy between my computer and KISSmetrics. The transparent proxy is part of the network infrastructure that lessens the load of my provider's connection to other networks, by sharing a cache among my provider's users. Evidence of this transparent proxy is the difference in server behavior: if I switch to a network without it, I get this:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 503 Service Unavailable.
Content-length:0


The web server, nginx, wants the connection to remain open at least until it starts sending the response, but the transparent proxy before did not require this. This is probably a side effect of nginx's HTTP pipelining support. It's not too difficult to work around this problem by slightly modifying the command.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 21:55:58 GMT
ETag: "ysdfEF8mCndrvOxrcnzF4tysDss"
Expires: Wed, 15 Dec 2038 21:55:58 GMT
Last-Modified: Sat, 30 Jul 2011 20:55:58 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=ysdfEF8mCndrvOxrcnzF4tysDss;expires=Wed, 15 Dec 2038 21:55:58 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ysdfEF8mCndrvOxrcnzF4tysDss';if(typeof(_kmil) == 'function')_kmil();

After the server responds, I hit Ctrl-D to end the connection. I now get a fresh ETag (as well as a cookie) every time.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "ikLBYzrQaWhFzc5lsacDhni3ftI"
...
Set-Cookie: _km_cid=ikLBYzrQaWhFzc5lsacDhni3ftI;expires=Wed, 15 Dec 2038 22:03:26 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ikLBYzrQaWhFzc5lsacDhni3ftI';if(typeof(_kmil) == 'function')_kmil();
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:03:33 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();

I abbreviated the irrelevant headers. Further, by modifying the HTTP request, I could get KISSmetrics to replay the ETag. A cookie is added to the HTTP request:
$ cat j.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com

$ cat j.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 22:08:35 GMT
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
Expires: Wed, 15 Dec 2038 22:08:35 GMT
Last-Modified: Sat, 30 Jul 2011 21:08:35 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:08:35 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();

What is interesting is that if I perform an If-None-Match query, KISSmetrics doesn't try to set the cookie back. I thought it would.
$ cat k.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
If-None-Match: "fsxiUZH0lIdITI0YA4-uxXslRMQ"

$ cat k.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 304 Not Modified
Date: Sat, 30 Jul 2011 22:11:20 GMT
Server: nginx
Connection: keep-alive

This exercise reveals why the ETag is such a clever technique for tracking visitors. By leveraging the transparent proxy cache, the end user has no way to opt out of tracking. In fact, the web browser cache is simply a leaf node of a greater Internet content distribution cache framework. By using ETags, your internet service provider does the dirty work for KISSmetrics: you can still be tracked through no fault of your web browser. As to who is responsible for tracking you, the distinction is blurred. To illustrate how the transparent proxy aids tracking: if I connect back to the network with the transparent proxy and replay the cookie, the transparent proxy now starts tracking the replayed identity.
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...

I disconnect and reconnect again. Now I issue the untracked request first, followed by a cookie replay, and then an ETag replay. Notice how the replays are now ignored, because the new untracked request is already cached by the transparent proxy.
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat k.txt | nc i.kissmetrics.com 80  # ETag-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...

If you want to surf the web without being tracked, you (1) disconnect from the network, (2) reconnect, and (3) prime the transparent proxy's cache with a new identity request; then without clearing browser cache or cookies, you will be issued a new identity. However, it is possible that when the browser presents an old identity alongside the new identity, KISSmetrics can correlate and merge the two identities. It is probably safer to clear the browser cache and cookies just to be sure.

I think this at least marks the beginning of a happy story. Even though the transparent proxy cache built into the network infrastructure by the internet provider facilitates tracking, it is still possible for an end-user to evade the tracking by manipulating the proxy in a certain way.

Finally, the identity here is not really personally identifiable information per se. To KISSmetrics, it is just a random string, and all it tells them is that the same string has been seen visiting websites X, Y, and Z. Unless you provide personally identifiable information to websites X, Y, or Z, all they know is that the same person has used different internet providers to visit certain websites.

## Tuesday, July 5, 2011

### ISMM 2011 selected paper notes

• Cache Index-Aware Memory Allocation. An allocator of fixed-size, headerless objects whose size is a multiple of the cache line size (e.g. 128-byte objects on 64-byte cache lines) may cause certain cache indices to be evicted often, because the objects map uniformly onto the same cache indices. Adding spacers (a punctuated array) between these objects can break the affinity. NOTE(liulk):
• Cache index conflict should generally happen only at block level if the allocator uses superblocks to manage arenas.
• Knuth boundary tags are natural spacers used for variable sized object allocation, and would not suffer this problem.
• Compartmental Memory Management in a Modern Web Browser. Memory resources allocated for each origin (as in "same origin policy") are stored together in the same heap, and the web browser has a separate heap for each origin. Although Chrome spawns a process for each tab, this may not be practical for mobile devices, where hardware address space isolation could be expensive. This also shortens garbage collection time, because same-origin private heaps hold no references to objects belonging to another origin.
• Multicore Garbage Collection with Local Heaps. Illustrates how mutation handling makes it hard to separate local and global heaps, and how, despite the complicated mechanism, it achieves little gain in scalability.
• Iterative Data-parallel Mark&Sweep on a GPU. Highlights challenges specific to GPU, namely the lack of deep recursion due to the relatively small size of video memory (~128MB) with no virtual memory and the large number of threads (1000+).

## Saturday, April 2, 2011

### Factories and rvalue reference

One of the benefits of rvalue references is that they enable perfect forwarding for libraries. One example is the factory. A factory here is taken in the loose sense to mean any function (including a constructor) that has to construct another object using a non-default constructor, taking the initializer from the function's argument. An example factory is shown below.
template<class T>
T *factory(const T& x = T()) {
    return new T(x);
}
Also, the constructor of std::pair is a factory.

A typical use of the factory is to pass in temporary object as the initializer. For example,
std::complex<double> *p = factory(std::complex<double>(1.0, -1.0));
One issue factories face when the initializer holds a resource is what to do with that resource. The resource could be a memory location, a file in the filesystem, or anything for which explicit ownership needs to be established, or else resource sharing and leaks would be an unwanted consequence.
• Sharing the resource is out of the question, since this would violate ownership invariant.
• Stealing the resource may be desirable but not possible, since x is of type const T& which means you cannot modify x.
• Copying the resource could be expensive, for example if x contains a string or a vector with a lot of data.
Part of this issue could be mitigated by the copy constructor of T itself. It could be the case that T only wants to copy some defining characteristics from the initializer and not the data. For example, if T represents a file, then only the file permissions are copied but not file content; or if T is a string buffer, then only the open mode is copied but not the content of the buffer.

Note that std::stringstream is not specified to have a copy constructor, so we're only talking about a hypothetical class. In the case of std::stringstream, most STL implementations probably did not explicitly prohibit the copy constructor, so the compiler automatically generates one that could end up performing a shallow copy of the underlying buffer. This happens to work for a temporary initializer like this:
std::stringstream *p = factory(std::stringstream(...));
But if the initializer is an lvalue, then sharing of the underlying buffer would be undesirable.
std::stringstream ss(...);
std::stringstream *p = factory(ss);

Depending on what stringstream's copy constructor does, *p and ss might end up sharing the same buffer, and mutating ss or *p would make the internal states of the other object inconsistent.

With the introduction of rvalue references, we can now give a strong hint to the factory that the internal resources should be transferred.
template<class T>  // factory using copy semantics
T *factory(const T& x) {
    return new T(x);
}

template<class T>  // factory using move semantics
T *factory(T&& x) {
    return new T(std::forward<T>(x));
}

Note that without the std::forward<T>, a named rvalue reference is treated as an lvalue when used [N1377], and we would end up calling a non-const lvalue reference constructor (i.e. T(T& x)) or falling back to the copy constructor (i.e. the const lvalue reference constructor).

Revisiting the factory using scenario above,
std::stringstream *p = factory(std::stringstream(...));
would use the move semantics rvalue reference factory, and
std::stringstream ss(...);
std::stringstream *p = factory(ss);

would use the copy semantics const T& factory.

This is consistent with the N1377 move proposal, which anticipates the most common overload set to be:
void foo(const A& t);  // #1
void foo(A&& t);       // #2

Note that the factory examples in the rvalue reference proposal [N1690] do not have a const T& overload. This means that the factory will only steal resources, because of move semantics, and factory(ss) above would steal the underlying buffer of ss and render it unusable.

The conclusion is that factory must overload both const T& and T&&.

This actually works nicely with existing libraries. In pre-C++0x code, the factory will always copy. A preprocessor macro would detect the availability of C++0x features and add the rvalue reference version to support move semantics in factory.

One last word about regular functions with rvalue reference arguments. In N1377, regarding binding temporaries to references, it is mentioned that a value could be implicitly converted to a temporary, and this is why passing a temporary of T to T& is disallowed.
void incr(int& rr) { rr++; }

void g()
{
    double ss = 1;
    incr(ss);  // ill-formed: would bind a temporary int to int&
}

Here, incr(ss) would implicitly convert ss to an int, and the increment would be done on the temporary int instead of on ss. This is properly disallowed in C++. In the case of rvalue references, likewise, it is a good idea to make constructors explicit to disallow implicit conversion.
class Foo {
public:
explicit Foo(Bar b);  // regular constructor
};
Otherwise, a function expecting a Foo&& argument but given a Bar object would implicitly convert the Bar to a temporary Foo and operate on that temporary. Furthermore, if Foo also has an rvalue reference constructor taking Bar&&, then the temporary Foo would steal all the internal resources of the Bar, get passed to the function, and self-destruct. In other words, the implicit conversion from Bar to Foo would destroy the Bar.

This is likely to become a very common mistake as more people misuse rvalue references. In general, it is wise to limit the use of rvalue references and never use them like lvalue references, even if you declare the constructors explicit. In particular, a function taking an rvalue reference may or may not modify the argument, so it should not be used as an "out" argument.

## Saturday, March 19, 2011

### Wikileaks? Openleaks? Try Patriotleaks!

(This is a fictional article, but I welcome those who turn this fiction into reality.)

Imagine on patriotleaks.us website, it features a blatant American flag in the background, an eagle seal, and a heading that reads, "Patriot Leaks, dedicated to protect United States national interest by revealing dirty leaks about foreign government and institutions." This site will probably not be endorsed by the US government, but it will operate under the legal confines of US law.

Then imagine on aiguoleaks.cn website, it features a red and yellow Chinese flag in the background, the communist seal of hammer and sickle, and a heading that reads, "用爱国主义来揭发外国政府和外国组织丑恶的罪行来捍卫祖国。" (Translation: using patriotism to reveal dirty crimes of foreign governments and foreign organizations in order to protect motherland.)

Then imagine there is the Arabic counterpart, the Russian counterpart, the European counterpart, the British Commonwealth counterpart. All of these leak sites operate under the legal confines of their own legal systems, and would never leak things about their own authorities in power.

But they all leak documents about each other like Wikileaks does. Furthermore, they all mirror each other's leaks, except the ones about their respective hosting countries. This way, we have redundantly distributed partial mirrors of all the leaks in the world, each of them legal in its own jurisdiction.

You can imagine Wikileaks in a way has become the Nordic node of the Patriotleaks network. They have support from Sweden, Belgium, and Iceland. They would not leak documents that hurt the national interest of these countries (and they have not done so as far as I can tell).

This type of leaks network will be difficult to suppress by one government alone.

## Saturday, February 26, 2011

### Redundancy of audio packets over lossy network

I wanted to know what people currently do to stream audio packets over a network with packet loss. There are several options for streaming audio, and they vary in their effectiveness against packet loss.

• AirTunes (a.k.a. RAOP) is the protocol used between iTunes and Airport Express, based on RTSP (for establishing the session) and RTP (for transmitting data). My impression is that you can stream MPEG-4 AAC and Apple Lossless Audio Codec to Airport Express, though it may be possible to stream uncompressed audio too. Note that the audio codec itself introduces algorithmic delay, since compression is performed by grouping audio data over a time window. It is not clear how the protocol deals with packet loss, since RTP does not define how to handle it.
• Netjack connects a jack server to a slave. It can stream CELT or raw audio over UDP. CELT is a low-delay audio codec with packet loss concealment. However, a tradeoff of the low delay is that it cannot resolve low-frequency pitch (~100Hz fundamental) and has to rely on a long-term predictor. I'm not sure how this affects audio quality in practice. The packet transport does not seem to attempt to recover from packet loss, but instead relies on the CELT codec for packet loss concealment. The slave's audio clock is synchronized to the jack server.
• Netjack2 is really a distributed version of jack, using remote computing nodes to apply computationally intensive audio filtering instead of playing back sound. It implements a discovery protocol based on multicast, and audio is transmitted over UDP with no packet loss handling. When using an audio adapter to stream audio to a slave sound card device, audio resampling is applied to compensate for clock drift.
• Jacktrip, developed at Stanford CCRMA, connects two jack servers over a network. There doesn't seem to be any time synchronization. Data is transmitted over UDP, and it is possible to transmit redundant data in a rolling window in order to recover from data loss.
• There is a proposal (quick view) in using Reed-Solomon code to transmit data over UDP at Kyushu University.
I'm particularly interested in techniques that could deal with wireless network packet loss, which has the following characteristics:
• Wireless packet radio implements low-level retransmission of lost or corrupt packets. On the higher level, this is perceived as high packet delay.
• When packet loss does occur at the higher level, it usually happens in bursts of packets. I estimate that a radio gap of up to 3ms is possible under normal conditions.
Given these characteristics, the rolling-window redundancy used by Jacktrip would not be effective, since a whole streak of lost packets would not be recoverable under this scheme.

When using Reed-Solomon with byte-sized symbols, a code block can only be 255 bytes (including parity). If we naively transmit the 255 bytes over UDP as a single packet, it is possible that the whole block would be dropped altogether. But we could transpose the blocks so that, when transmitting n blocks over m UDP packets, each UDP packet contains a 1/m slice of each of the n blocks. This seems to be the approach taken by the Kyushu University proposal. This scheme, known as Cross-Interleaved Reed-Solomon coding, is also used on audio CDs.

## Tuesday, February 15, 2011

### The real way to disable Adobe Updater from your Mac OS X

Adobe Updater is really annoying, but most instructions on the web for disabling it merely tell Adobe Updater not to report updates; the updater still runs silently. The fact that I'm dedicating system resources every now and then so Adobe Updater can phone home but not tell me about updates is not good enough for me. I want Adobe Updater to stop completely.

To stop Adobe Updater completely, one must understand how it gets run in the first place. The updater is launched by a Mac OS X system service called launchd. To launchd, Adobe Updater is a periodic job. The job file is stored under your ~/Library/LaunchAgents folder. The actual file name is suffixed with a number of random characters, but it starts with "com.adobe.ARM" as the prefix. If you look inside the file (it's a plain text file), you'd see that launchd would run the updater at 12600 seconds interval, or 3.5 hours.
To remove, type these commands in a Terminal window:
cd ~/Library/LaunchAgents
launchctl remove $(basename com.adobe.ARM.*.plist .plist)
rm com.adobe.ARM.*.plist

Basically, the idea is: for each launchd plist file in ~/Library/LaunchAgents that you don't want, run launchctl remove on the job name, which is the same as the plist file name without the .plist suffix, then remove the actual .plist file.

While you are at it, there may be other launchd jobs in ~/Library/LaunchAgents left over from stale applications you might have tried before. Feel free to remove them all.

You're welcome.

Edit (Oct 20, 2012): a couple of readers pointed out in the comment that the launchd namespace used by Adobe Updater is now different. I just installed Adobe (Acrobat) Reader XI and found that the name is still com.adobe.ARM.*, but if you have Creative Suite, it might be com.adobe.AAM.* instead. I don't have Creative Suite so I can't verify that.

Furthermore, it appears that when you set Updater preference in Adobe Reader XI to "Do not download or install updates automatically," it now removes the launchd task as well, which means the launchctl and rm commands would no longer be necessary. Kudos to Adobe for figuring that out!

One reader also pointed out that in his case, the updater is installed in the system-wide location /Library/LaunchAgents. In that case, you will need to run “sudo su -” first and type in your own password to gain root privileges (the prompt changes from “$” to “#”) before the files can be removed. Be careful with the commands you enter as root, as a mistake can irreparably damage your system.

Thanks for keeping me updated y'all.

## Friday, February 11, 2011

### How many aligned address space can I allocate via mmap?

Suppose I want to allocate a naturally aligned piece of memory through mmap(): say a 16KB block of memory aligned to 16KB. What is the greatest amount of address space I could allocate that way? What if I want 4MB aligned to 4MB? Or 16MB aligned to 16MB? The table below shows the results, as the number of chunks of each size and the total amount in GB.

| OS | 4KB | 16KB | 4MB | 16MB | 64MB |
|---|---|---|---|---|---|
| Mac OS X 10.5.8 | 900227 (3.43GB) | 225056 (3.43GB) | 877 (3.43GB) | 217 (3.39GB) | 52 (3.25GB) |
| Linux 2.6.18 (32-bit) | 1048173 (4.00GB) | 262040 (4.00GB) | 1020 (3.98GB) | 252 (3.94GB) | 59 (3.69GB) |
| Linux 2.6.18 (64-bit) | 6168666 (23.53GB) | 1540236 (23.50GB) | 6016 (23.5GB) | 1504 (23.5GB) | 376 (23.5GB) |

The machine that ran the 64-bit test is an AMD Opteron 8220 SE with 40 physical address bits (1024 GB) and 48 virtual address bits (262144 GB). However, I can't seem to actually allocate anywhere near that much address space.

The source of the program can be found here (alignedmmap.c). Note that since the addresses are never touched, this does not allocate any actual memory, and the program is so tiny that it loads hardly any shared objects besides the standard C library.

## Friday, February 4, 2011

### Putting IPv6 to the test

It's been a while since I configured my Airport Express with an IPv6 tunnel to Hurricane Electric's tunnelbroker.net. While this allows me to be IPv6 enabled, I've been wondering what my Internet experience would be like in a world where I can't get IPv4 addresses anymore, now that IPv4 address exhaustion will happen any minute now. Since today I'm sick at home, it's a perfect time to play in between sleep and orange juice.

I turned off IPv4 on my computer. The first surprise is that DNS didn't work. The reason is that, even though I configured the Airport to use Hurricane Electric's IPv6 DNS server 2001:470:20::2, the IPv6 stateless auto-configuration used by Airport Express does not propagate that information, and Airport Express does not in fact provide DHCPv6.

After configuring the DNS manually, it's time to hit Google. All of Google's services work fine as far as I can tell. I can watch YouTube videos. My blog is working, since it's hosted by Google. The bad news is that I could no longer reach most of the search results, which is not a surprise. The search experience is disappointing because I don't know which sites are IPv4 only. For example, when searching for IPv6 stateless auto-configuration, the RFC mirror faqs.org does not work, but ietf.org works. The good news is that the "cache" link allows me to look at a page despite the site being unreachable over IPv6.

Most of the big websites aren't reachable over IPv6: facebook.com, yahoo.com, live.com / bing.com / microsoft.com, wikipedia.org, amazon.com, linkedin.com, ... cnn.com, newegg.com, apple.com, etc. Also, w3.org isn't IPv6 ready, and neither is mit.edu (it's not my alma mater, but you'd think they're at the forefront of technology).