Saturday, July 30, 2011

KISSmetrics and life of an ETag

When I read Researchers Expose Cunning Online Tracking Service That Can’t Be Dodged on Slashdot, many commenters there thought that disabling JavaScript would prevent the tracking, because the disclosure of how KISSmetrics works mentions that it serves two JavaScript files. However, JavaScript is a red herring here. The magic happens with the ETag, a cache validation mechanism of HTTP whose intended use is to speed up the loading of websites you have already visited by downloading only the content that has changed since your last visit.

When a web browser first requests a URL, the server responds with an entity tag (ETag) that identifies the version of the resource served at that URL. Whenever the resource is modified, the server is expected to change its ETag. Subsequent requests from the browser include a conditional header (If-None-Match), essentially telling the server "if this URL has not changed (same ETag), then don't bother sending me the resource." The browser cache remembers the ETag for as long as the resource is in the cache.
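
The validation rule can be sketched as a small shell function. This is only an illustration: `validate` and its arguments are hypothetical names, and a real server also has to handle lists of tags, weak validators, and the `*` wildcard.

```shell
# Hypothetical helper: decide what a server sends for a (conditional) GET.
# $1 = ETag of the version currently on the server
# $2 = If-None-Match value sent by the client (empty on a first visit)
validate() {
  current_etag=$1
  if_none_match=$2
  if [ -n "$if_none_match" ] && [ "$if_none_match" = "$current_etag" ]; then
    # The cached copy is still current: send headers only, no body.
    echo "304 Not Modified"
  else
    # First visit, or the resource changed: send the full resource.
    echo "200 OK"
  fi
}

validate '"v1"' ''       # first visit
validate '"v1"' '"v1"'   # revisit, resource unchanged
validate '"v2"' '"v1"'   # revisit, resource modified
```

The point for tracking is that the client echoes back whatever string the server chose as the ETag, so the string can just as well identify the visitor as the version.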

NoScript will not block ETags. In fact, an ETag can be attached to any resource: an HTML file, an image, and so on. The browser's incognito mode may not be sufficient if it shares a cache with non-incognito content (because it will send the same ETag). The only way to defeat ETag tracking in the browser is to disable or clear the browser cache, but even this may not be sufficient (more about this later).

KISSmetrics uses a combination of techniques. To find out how, I hand-crafted an HTTP request as follows and saved it in a text file with DOS line endings ("\r\n"). Note that the file needs a trailing blank line, which marks the end of the HTTP request headers.
$ cat i.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com

$ hexdump -C i.txt
00000000  47 45 54 20 2f 69 2e 6a  73 20 48 54 54 50 2f 31  |GET /i.js HTTP/1|
00000010  2e 31 0d 0a 48 6f 73 74  3a 20 69 2e 6b 69 73 73  |.1..Host: i.kiss|
00000020  6d 65 74 72 69 63 73 2e  63 6f 6d 0d 0a 0d 0a     |metrics.com....|
0000002f
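A minimal sketch of one way to produce such a file without depending on an editor's line-ending setting, assuming a POSIX shell. The final `\r\n` is the blank line that ends the headers.

```shell
# Write the request with explicit CRLF line endings.
printf 'GET /i.js HTTP/1.1\r\nHost: i.kissmetrics.com\r\n\r\n' > i.txt

# Count the bytes; the hexdump above ends at offset 0x2f, which is 47.
wc -c < i.txt
```
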
Now make a request to be tracked.
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=xy5cdaPdlMSI4u2xv8rndfudaAE;expires=Wed, 15 Dec 2038 19:49:32 GMT;path=/;
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
Notice that the same identity is presented as the ETag, as a cookie, and as a variable in the JavaScript. To my surprise, when I run the same command again, I get the same ETag.
$ cat i.txt| nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 298
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
This time, notice that the response no longer tries to set a cookie, yet it somehow remembers my ETag and carries an Age header. Running the same command again, I get:
$ cat i.txt| nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Date: Sat, 30 Jul 2011 19:49:32 GMT
ETag: "xy5cdaPdlMSI4u2xv8rndfudaAE"
Expires: Wed, 15 Dec 2038 19:49:32 GMT
Last-Modified: Sat, 30 Jul 2011 18:49:32 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Age: 542
Content-Type: application/x-javascript
Content-Length: 79

var KMCID='xy5cdaPdlMSI4u2xv8rndfudaAE';if(typeof(_kmil) == 'function')_kmil();
Notice the longer age now.

Now, why does this result surprise me? A hand-crafted HTTP request should be perfectly stateless, so I expected to get a different ETag every time I ran the same command. Instead I get the same one every time, as if I'm still being tracked.

It turns out there is a co-conspirator. I'm using a mobile wireless connection right now, and there is a transparent proxy between my computer and KISSmetrics. The transparent proxy is part of the network infrastructure; it lessens the load on my provider's connection to other networks by sharing a cache among my provider's users. Evidence of this transparent proxy is the difference in server behavior. If I switch to a network without the transparent proxy, I get this:
$ cat i.txt | nc i.kissmetrics.com 80
HTTP/1.1 503 Service Unavailable.
Content-length:0

The web server, nginx, wants the connection to remain open at least until it starts sending the response, whereas the transparent proxy did not require this. This is probably a side effect of nginx's HTTP pipelining support. It's not too difficult to work around this problem by slightly modifying the command: appending /dev/tty keeps the input to nc, and therefore the connection, open.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 21:55:58 GMT
ETag: "ysdfEF8mCndrvOxrcnzF4tysDss"
Expires: Wed, 15 Dec 2038 21:55:58 GMT
Last-Modified: Sat, 30 Jul 2011 20:55:58 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=ysdfEF8mCndrvOxrcnzF4tysDss;expires=Wed, 15 Dec 2038 21:55:58 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ysdfEF8mCndrvOxrcnzF4tysDss';if(typeof(_kmil) == 'function')_kmil();
After the server responds, I hit Ctrl-D to end the connection. I now get a fresh tag (as well as a cookie) every time.
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "ikLBYzrQaWhFzc5lsacDhni3ftI"
...
Set-Cookie: _km_cid=ikLBYzrQaWhFzc5lsacDhni3ftI;expires=Wed, 15 Dec 2038 22:03:26 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='ikLBYzrQaWhFzc5lsacDhni3ftI';if(typeof(_kmil) == 'function')_kmil();
$ cat i.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:03:33 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();
I abbreviated the irrelevant headers.

Further, by modifying the HTTP request, I could get KISSmetrics to replay the ETag. A cookie is added to the HTTP request:
$ cat j.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ

$ cat j.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 200 OK
Cache-Control: max-age=864000000, public
Content-Type: application/x-javascript
Date: Sat, 30 Jul 2011 22:08:35 GMT
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
Expires: Wed, 15 Dec 2038 22:08:35 GMT
Last-Modified: Sat, 30 Jul 2011 21:08:35 GMT
P3P: CP="NOI CURa ADMa DEVa TAIa OUR IND UNI INT"
Server: nginx
Set-Cookie: _km_cid=fsxiUZH0lIdITI0YA4-uxXslRMQ;expires=Wed, 15 Dec 2038 22:08:35 GMT;path=/;
Content-Length: 79
Connection: keep-alive

var KMCID='fsxiUZH0lIdITI0YA4-uxXslRMQ';if(typeof(_kmil) == 'function')_kmil();
What is interesting is that if I send the If-None-Match query instead, KISSmetrics doesn't try to set the cookie again. I thought it would.
$ cat k.txt
GET /i.js HTTP/1.1
Host: i.kissmetrics.com
If-None-Match: "fsxiUZH0lIdITI0YA4-uxXslRMQ"

$ cat k.txt /dev/tty | nc i.kissmetrics.com 80
HTTP/1.1 304 Not Modified
Date: Sat, 30 Jul 2011 22:11:20 GMT
Server: nginx
Connection: keep-alive
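
The three behaviors observed so far can be summarized in a small sketch. This is my reconstruction from the responses above, not KISSmetrics' actual code; `respond` and its arguments are hypothetical, and I am assuming that any presented validator is answered with a 304.

```shell
# $1 = value of the _km_cid cookie (empty if none was sent)
# $2 = value of If-None-Match, without quotes (empty if none was sent)
# $3 = fresh identity the server would mint for an unknown visitor
respond() {
  cookie_id=$1
  if_none_match=$2
  fresh_id=$3
  if [ -n "$if_none_match" ]; then
    # The validator already carries the identity: no body, no Set-Cookie.
    echo "HTTP/1.1 304 Not Modified"
    return
  fi
  # Replay the cookie's identity if there is one, otherwise use the new one.
  id=${cookie_id:-$fresh_id}
  echo "HTTP/1.1 200 OK"
  echo "ETag: \"$id\""
  echo "Set-Cookie: _km_cid=$id"
  echo
  echo "var KMCID='$id';"
}

respond '' '' 'newid'        # unknown visitor: fresh identity in all three places
respond 'oldid' '' 'newid'   # cookie replay: the old identity comes back as the ETag
respond '' 'oldid' 'newid'   # ETag replay: 304 only
```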

This exercise reveals why the ETag is such a clever way to track visitors. When a transparent proxy cache is involved, the end user has no way to opt out of tracking from within the browser. In fact, the web browser cache is simply a leaf node of a greater Internet content distribution cache framework. By using ETags, KISSmetrics gets your internet service provider to do its dirty work. You can still be tracked through no fault of your web browser. As to who is responsible for tracking you, the distinction is blurred.

To illustrate how the transparent proxy aids tracking, if I connect back to the network with the transparent proxy using cookie replay, the transparent proxy now starts tracking the replayed identity.
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "fsxiUZH0lIdITI0YA4-uxXslRMQ"
...
I disconnect and reconnect again. Now I issue the untracked request first, followed by a cookie replay and then an ETag replay. Notice how both replays are now ignored, because the response to the new untracked request has been cached by the transparent proxy.
$ cat i.txt | nc i.kissmetrics.com 80  # Untracked request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat j.txt | nc i.kissmetrics.com 80  # Cookie-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
$ cat k.txt | nc i.kissmetrics.com 80  # ETag-replayed request.
HTTP/1.1 200 OK
...
ETag: "Chw55f8kmJAUzkH15o0uP8Qz6i0"
...
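The two experiments above behave as if the proxy caches by URL alone. Here is a toy model of that, with every name hypothetical and nothing taken from a real proxy; deleting the cache file stands in for reconnecting and landing on an empty cache, which is an assumption about what reconnecting does.

```shell
state=$(mktemp -d)
echo 0 > "$state/counter"

# Stand-in for the origin: replay the cookie's identity if one is sent,
# otherwise mint a new one.
origin_fetch() {
  cookie_id=$1
  if [ -n "$cookie_id" ]; then
    echo "$cookie_id"
  else
    n=$(( $(cat "$state/counter") + 1 ))
    echo "$n" > "$state/counter"
    echo "id-$n"
  fi
}

# The proxy caches by URL only (a single URL here), so on a cache hit the
# client's cookie never reaches the origin.
proxy_fetch() {
  cookie_id=$1
  if [ ! -f "$state/cached" ]; then
    origin_fetch "$cookie_id" > "$state/cached"
  fi
  cat "$state/cached"
}

proxy_fetch ''        # untracked request primes the cache with a new identity
proxy_fetch 'old-id'  # cookie replay is answered from the cache: same identity
rm "$state/cached"    # stand-in for reconnecting
proxy_fetch 'old-id'  # now the replayed identity is what gets cached
proxy_fetch ''        # and an untracked request inherits it
```

Whichever request reaches the origin first decides the identity that everyone behind the proxy sees afterwards.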
If you want to surf the web without carrying your old identity around, you can (1) disconnect from the network, (2) reconnect, and (3) prime the transparent proxy's cache with a new identity request; then, even without clearing the browser cache or cookies, you will be issued a new identity. However, it is possible that when the browser presents an old identity alongside the new one, KISSmetrics can correlate and merge the two. It is probably safer to clear the browser cache and cookies as well, just to be sure.

I think this at least marks the beginning of a happy story. Even though the transparent proxy cache built into the network infrastructure by the internet provider facilitates tracking, it is still possible for an end-user to evade the tracking by manipulating the proxy in a certain way.

Finally, the identity here is not really personally identifiable information per se. To KISSmetrics, it is just a random string that has been seen visiting websites X, Y, and Z. Unless you provide personally identifiable information to websites X, Y, or Z, all they know is that the same person has used different internet providers to visit certain websites.

9 comments:

Jakle said...

I'm trying to replicate your analysis, but apparently KISSmetrics doesn't generate any sort of ETag when the i.js is accessed. I tried utilizing NetCat and it gets hung and no response is returned -- until it eventually timed out and command prompt is returned.

Even when I directly load i.js in the browser (Firefox Linux), I get:

if(typeof(_kmil) == 'function')_kmil();

and in Window's FF:

var KMDNTH=1;if(typeof(_kmil) == 'function')_kmil();

Anonymous said...

Some technical details and a proof of concept demo of ETag tracking: http://ochronus.com/tracking-without-cookies/
