When a web browser first requests a URL from a site, the server responds with an entity tag (ETag) for that URL, which uniquely identifies the version of the resource served. Whenever the resource at a URL is modified, its ETag is supposed to change. Subsequent browser requests include a conditional query (the If-None-Match header), essentially telling the server "if this URL has not changed (same ETag) then don't bother sending me the resource." The browser cache remembers the ETag for as long as the resource is in the cache.
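To make the client side of this exchange concrete, here is a minimal sketch of a browser-like cache that remembers ETags and turns them into conditional requests. The class and method names (`BrowserCache`, `request_headers`, `handle_response`) are made up for illustration; real browsers are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class CachedEntry:
    etag: str
    body: bytes


@dataclass
class BrowserCache:
    """Toy stand-in for a browser cache keyed by URL (hypothetical, for illustration)."""
    entries: dict = field(default_factory=dict)

    def request_headers(self, url: str) -> dict:
        # If a copy is held, ask the server to skip the body when the ETag still matches.
        entry = self.entries.get(url)
        return {"If-None-Match": entry.etag} if entry else {}

    def handle_response(self, url: str, status: int, headers: dict, body: bytes) -> bytes:
        if status == 304:
            # Not modified: reuse the stored body; the ETag stays in the cache.
            return self.entries[url].body
        # Full response: remember the new ETag alongside the body.
        etag = headers.get("ETag")
        if etag is not None:
            self.entries[url] = CachedEntry(etag, body)
        return body


cache = BrowserCache()
url = "http://example.com/resource"          # placeholder URL
print(cache.request_headers(url))            # first visit: no validator to send
cache.handle_response(url, 200, {"ETag": '"abc"'}, b"version 1")
print(cache.request_headers(url))            # later visits replay the stored ETag
```

The point to notice is that the ETag is sent back automatically for as long as the entry lives in the cache, which is what makes it usable as an identifier.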
NoScript will not block ETag. In fact, an ETag can be attached to any resource: an HTML file, an image, and so on. The browser's incognito mode may not be sufficient if it shares the same browser cache with non-incognito content (as it will send the same ETag). The only way to disable ETag tracking is to disable or clear the browser cache, but even this may not be sufficient (more about this later).
KISSmetrics uses a combination of techniques. To find out how, I hand-crafted an HTTP request as follows and saved it in a text file with DOS line endings ("\r\n"). Notice that the file needs a trailing blank line, which marks the end of the HTTP request header.
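For readers who would rather not fight a text editor over line endings, the same bytes can be produced programmatically. This is only a sketch: the helper name `build_request` is my own, and the length check relies on the 0x2f final offset that the hexdump of i.txt reports.

```python
from typing import Dict, Optional


def build_request(path: str, host: str,
                  extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request with CRLF line endings."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends with CRLF; one extra CRLF (the blank line) ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("/i.js", "i.kissmetrics.com")
print(request)
print(len(request))  # 47 bytes, i.e. 0x2f
```

Writing `request` to a file in binary mode should give a file equivalent to i.txt. Passing `extra_headers` adds further header lines before the terminating blank line.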
$ cat i.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com

$ hexdump -C i.txt
00000000  47 45 54 20 2f 69 2e 6a  73 20 48 54 54 50 2f 31  |GET /i.js HTTP/1|
00000010  2e 31 0d 0a 48 6f 73 74  3a 20 69 2e 6b 69 73 73  |.1..Host: i.kiss|
00000020  6d 65 74 72 69 63 73 2e  63 6f 6d 0d 0a 0d 0a     |metrics.com....|
0000002f

Now make a request to be tracked.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=xy5cdaPdlMSI4u2xv8rndfudaAE;expires=Wed, 15 Dec 2038 19:49:32 GMT;path=/;
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

Notice that the same identity is presented as an ETag, as a cookie, and as a variable in the JavaScript. To my surprise, if I run the same command again, I get the same ETag.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 298
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

This time, notice that it no longer tries to set a cookie, but it somehow remembers my ETag and sets an age. Running the same command again, I get:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 542
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();

Notice the longer age now.
Now, why does this result surprise me? If I hand-craft an HTTP request, the request should be perfectly stateless. I am expecting to get a different ETag every time I try the same command. But I am getting the same one every time, as if I'm still being tracked.
It turns out there is a co-conspirator. I'm using a mobile wireless connection right now, and there is a transparent proxy between my computer and KISSmetrics. The transparent proxy is part of the network infrastructure; it lessens the load on my provider's connection to other networks by sharing a cache among my provider's users. Evidence for this transparent proxy is the difference in server behavior. If I switch to a network without the transparent proxy, I get this:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 503 Service Unavailable.
Content-length:0

The web server, nginx, wants the connection to remain open at least until it starts sending the response, but the transparent proxy did not require this. This is probably a side effect of nginx's HTTP pipelining support. It's not too difficult to work around this problem by slightly modifying the command.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 21:55:58 GMT
ETag: "ysdfEF8mCndrvOxrcnzF4tysDss"
Expires: Wed, 15 Dec 2038 21:55:58 GMT
Last-Modified: Sat, 30 Jul 2011 20:55:58 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=ysdfEF8mCndrvOxrcnzF4tysDss;expires=Wed, 15 Dec 2038 21:55:58 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ysdfEF8mCndrvOxrcnzF4tysDss';if(typeof(_kmil) == 'function')_kmil();

After the server responds, I hit Ctrl-D to end the connection. I now get a fresh tag (as well as a cookie) every time.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "ikLBYzrQaWhFzc5lsacDhni3ftI"
...
Set-Cookie: _km_cid=ikLBYzrQaWhFzc5lsacDhni3ftI;expires=Wed, 15 Dec 2038 22:03:26 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ikLBYzrQaWhFzc5lsacDhni3ftI';if(typeof(_kmil) == 'function')_kmil();

$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:03:33 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();

I abbreviated the irrelevant headers.
Further, by modifying the HTTP request, I could get KISSmetrics to replay the ETag. A cookie is added to the HTTP request:
$ cat j.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ

$ cat j.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 22:08:35 GMT
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
Expires: Wed, 15 Dec 2038 22:08:35 GMT
Last-Modified: Sat, 30 Jul 2011 21:08:35 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:08:35 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();

What is interesting is that if I perform the If-None-Match query, KISSmetrics doesn't try to set the cookie back. I thought it should.
$ cat k.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
If-None-Match: "fsxiUZH0lIdITI0YA4-uxXslRMQ"

$ cat k.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 304 Not Modified
Date: Sat, 30 Jul 2011 22:11:20 GMT
Server: nginx
Connection: keep-alive

This exercise reveals why ETag is such a clever technique for tracking visitors. Because it leverages the transparent proxy cache, the end user has no way to opt out of tracking. In fact, the web browser cache is simply a leaf node of a greater Internet content distribution cache framework. By using ETag, your internet service provider will do the dirty work for KISSmetrics. You can still be tracked through no fault of your web browser. As to who is responsible for tracking you, the distinction is blurred.
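The three direct-connection transcripts (i.txt, j.txt and k.txt) suggest a fairly simple decision on the server. The sketch below is only my guess at logic that would produce the same responses; it is not KISSmetrics' actual code. The function name `tracker_response` and the use of `secrets.token_urlsafe` to mint identifiers are assumptions, and the response body is simplified.

```python
import secrets


def tracker_response(request_headers: dict) -> tuple:
    """Guess at origin logic consistent with the transcripts (not the real implementation)."""
    if "If-None-Match" in request_headers:
        # ETag replay (k.txt): confirm the validator, send no body and no cookie.
        return 304, {}, b""

    # Cookie replay (j.txt): reuse the identity carried in the _km_cid cookie.
    identity = None
    for part in request_headers.get("Cookie", "").split(";"):
        name, _, value = part.strip().partition("=")
        if name == "_km_cid" and value:
            identity = value

    if identity is None:
        # Untracked request (i.txt): mint a fresh identity (format is an assumption).
        identity = secrets.token_urlsafe(20)

    headers = {
        "ETag": f'"{identity}"',
        "Set-Cookie": f"_km_cid={identity};path=/;",
        "Cache-Control": "max-age=864000000, public",
    }
    body = f"var KMCID='{identity}';".encode("ascii")
    return 200, headers, body


status, headers, body = tracker_response({"Cookie": "_km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ"})
print(status, headers["ETag"])
```

The same identity ends up in three places (ETag, cookie and script body), so losing any one of them on the client is not enough to lose the identity.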
To illustrate how the transparent proxy aids tracking, if I connect back to the network with the transparent proxy using cookie replay, the transparent proxy now starts tracking the replayed identity.
$ cat j.txt | nc i.kissmetrics.com 80 # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...

$ cat i.txt | nc i.kissmetrics.com 80 # Untracked request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...

I disconnect and reconnect again. Now I issue the untracked request first, followed by a cookie replay, and by an ETag replay. Notice how the replay is now ignored because the new untracked request is now cached by the transparent proxy.
$ cat i.txt | nc i.kissmetrics.com 80 # Untracked request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...

$ cat j.txt | nc i.kissmetrics.com 80 # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...

$ cat k.txt | nc i.kissmetrics.com 80 # ETag-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...

If you want to surf the web without being tracked, you (1) disconnect from the network, (2) reconnect, and (3) prime the transparent proxy's cache with a new identity request; then without clearing browser cache or cookies, you will be issued a new identity. However, it is possible that when the browser presents an old identity alongside the new identity, KISSmetrics can correlate and merge the two identities. It is probably safer to clear the browser cache and cookies just to be sure.
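The proxy behavior in the last two transcripts can be reproduced with a toy model. This is a sketch under stated assumptions: the proxy caches one response per URL and ignores the Cookie and If-None-Match request headers (as the transcripts suggest), and reconnecting lands on an empty cache. The names `origin` and `SharedProxy` are hypothetical, and the origin is a stand-in that echoes a replayed cookie or mints a counter-based identity.

```python
import itertools

_counter = itertools.count(1)


def origin(headers: dict) -> dict:
    """Toy origin: echo the _km_cid cookie as the identity, else mint a new one."""
    cookie = headers.get("Cookie", "")
    identity = cookie.partition("_km_cid=")[2] or f"id-{next(_counter)}"
    return {"ETag": f'"{identity}"', "Cache-Control": "public"}


class SharedProxy:
    """Toy transparent proxy: caches one response per URL, ignoring request headers."""

    def __init__(self):
        self.cache = {}

    def get(self, url: str, headers: dict) -> dict:
        if url not in self.cache:
            # Cache miss: forward to the origin and store whatever comes back.
            self.cache[url] = origin(headers)
        return self.cache[url]


url = "http://i.kissmetrics.com/i.js"

tracked = SharedProxy()
tracked.get(url, {"Cookie": "_km_cid=old"})   # replayed identity reaches the origin first
print(tracked.get(url, {})["ETag"])           # the untracked request inherits "old"

fresh = SharedProxy()                          # models reconnecting to an empty cache
primed = fresh.get(url, {})["ETag"]           # priming request mints a new identity
print(primed)
print(fresh.get(url, {"Cookie": "_km_cid=old"})["ETag"] == primed)  # replay is ignored
```

In this model, whoever populates the cache first decides the identity for everyone behind the proxy, which is why priming it with an untracked request works.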
I think this at least marks the beginning of a happy story. Even though the transparent proxy cache built into the network infrastructure by the internet provider facilitates tracking, it is still possible for an end-user to evade the tracking by manipulating the proxy in a certain way.
Finally, the identity here is not really personally identifiable information per se. To KISSmetrics, it is just a random string, and all it tells them is that this string has been seen visiting websites X, Y and Z. Unless you provide personally identifiable information to websites X, Y, or Z, all they know is that the same person has used different internet providers to visit certain websites.