No Cookies, No Problem — Using ETags For User Tracking

Nicolas Hinternesch
Published in Level Up Coding
7 min read · Jul 2, 2020


Working at a leading international analytics vendor, I have been keeping a close eye on the current crusade of modern web browsers against cookie technology.

It turns out there is a way to track individual, non-signed-in users without using cookies. I implemented it. Here is how.

A few quick opening remarks: the whole point of this piece is to spark discussion and awareness in the industry and among users. Personally, I would never advocate for employing these tracking practices, and I am glad to be working for an analytics vendor that has always put privacy, transparency, and integrity first. Besides, from a legal perspective, this technique does not circumvent the GDPR or similar privacy laws. Just because ETags are technically not cookies does not mean they do not create personal data, so they are still covered by such regulations.

I built this website as an example. Take a look.

Click through the three pages → Same ID.
Close the browser window and reopen the site → Same ID.
Turn off your computer and come back tomorrow → Same ID.
Check your cookies → The site neither drops nor reads any cookies.
Check the URL → No dubious query strings.

So how can I preserve the ID and know that your specific device is returning to the site without having you log in and without dropping a cookie?

Cookies Are On Their Way Out

If you are a somewhat active internet user, you will have heard about the ongoing controversy over browser cookies and how they are used. Cookie technology is increasingly being phased out by browsers and heavily regulated by privacy laws like the GDPR and the CCPA. This development is certainly an important step towards a more privacy-focused internet, but it is also taking a huge toll on the core functionality of most websites, their UX, the economic structure of the internet, and the digital analytics industry. Yet while the demise of the browser cookie as a reliable identifier for returning users is all but certain, other web technologies still rely on storing information on the local machine.

The Role Of Cache

Enter: the cache. In essence, web caching means storing data from the web on your device so the browser can reuse it when the same resource is requested again. For instance, when a user loads a web page for the first time, the server sends the whole page to the browser. If the page is cached and the user requests it again the following day, the browser already has a copy: the server does not have to send the page again, and it can be displayed from the browser cache right away. This is much faster and saves bandwidth. In general, caching significantly speeds up the delivery of web content while also reducing the work the server has to do.

One way to implement caching is with ETags (entity tags). An ETag is an ID that the server attaches to a resource it delivers (e.g. a web page or an image) via the ETag response header. This is how the server can tell whether the user has cached the newest version of the resource. When a resource on the server changes, a new ETag ID is generated for it.

  • Monday
    User requests a website for the first time → No ETag in the request → Site is sent back with ETag 123 → Site is stored (cached) on the local device
  • Tuesday
    User requests the same site again → ETag 123 is sent along in the request (in the If-None-Match header) → The server checks whether the resource has changed (‘Is the ETag ID still the same?’) → If the ETag has not changed, the server answers with a short 304 Not Modified response, telling the browser to simply use the copy that was delivered and cached on Monday → The resource does not have to be sent again, which saves time and bandwidth
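The Monday/Tuesday exchange above can be sketched as a small, framework-free Python function. This is a minimal illustration rather than production code: it assumes the server derives the ETag from a SHA-256 hash of the content (one common choice; servers are free to use other schemes), and the names `make_etag` and `handle_request` are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the resource content (one common approach)."""
    # ETags are sent as quoted strings; a content hash changes whenever the content does.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_request(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of a cacheable resource."""
    etag = make_etag(body)
    # "no-cache" lets the browser store the response but makes it revalidate before reuse.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match is not None:
        # Simplified parsing: If-None-Match may hold a comma-separated list of ETags or "*".
        # (A full implementation would use weak comparison, ignoring any W/ prefix.)
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # Tuesday: the cached copy is still current, so no body is sent.
            return 304, headers, b""
    # Monday (no ETag sent) or the resource changed: send the full resource.
    return 200, headers, body
```

On Monday the browser calls the equivalent of `handle_request(page, None)` and stores the returned ETag; on Tuesday it sends that ETag back and receives a 304 with an empty body, as long as the page has not changed.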

Using Cache Technology To Track And Identify Users

While ETags serve a useful purpose when used for caching, the feature can also be hijacked and intentionally misused for user tracking.

Here is how I did it for my example above:

  • I built a website with three pages
  • I embedded the same iFrame on each of the pages. This iFrame is simply a white 1x1 pixel, which is invisible to the user.
  • When this iFrame resource is requested, I create a random ID via PHP on the server side. I use this ID to override the iFrame’s ETag ID, which the server would otherwise usually issue automatically.
  • Every time a user requests one of the three pages (and therefore requests that iFrame), the browser includes my ETag ID in the request. On the server side, I then check whether that ID exists or whether this is a first-time request without an ETag.
    → If ETag exists: Returning visitor. Keep the ID and send the same one back.
    → If ETag does not exist: New visitor. New ID. From then on, this ID will be sent in the If-None-Match request header whenever this user’s device requests the iFrame on the site.
  • As a last step, here is how this ETag ID finds its way into the analytics:
    On the server side, I print the ID from the request/response headers into the iFrame. Invisible to the user, this iFrame now contains the user’s ID. I then pick it up from there on the client side via JavaScript and simply include this ID in my analytics tracking request instead of a cookie ID.
Figure: ETag example iFrame. Finding the ETag ID of the iFrame with the Chrome DevTools.
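The server-side part of the steps above could look roughly like the following Python sketch, using only the standard library. It is not the original PHP implementation; the names `handle_pixel_request`, `PixelHandler`, and `serve`, the 16-hex-character ID format, and the `Cache-Control: no-cache` header (which makes the browser keep the iFrame but revalidate it with If-None-Match on every page view) are all assumptions for illustration. The sketch also only accepts IDs matching its own format, so an arbitrary If-None-Match value is never echoed into the HTML.

```python
import re
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional, Tuple

# Accept only IDs this server could have issued: 16 lowercase hex chars, quoted (optionally weak).
_ID_PATTERN = re.compile(r'^(?:W/)?"([0-9a-f]{16})"$')


def handle_pixel_request(if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Hypothetical tracking endpoint: reuse the ID from If-None-Match or mint a new one."""
    match = _ID_PATTERN.match(if_none_match.strip()) if if_none_match else None
    if match:
        visitor_id = match.group(1)          # returning visitor: keep the same ID
    else:
        visitor_id = secrets.token_hex(8)    # new visitor (or unrecognized value): new random ID
    headers = {
        "ETag": f'"{visitor_id}"',           # the visitor ID doubles as the ETag
        "Cache-Control": "no-cache",         # store, but revalidate (send If-None-Match) every time
        "Content-Type": "text/html; charset=utf-8",
    }
    # The ID is printed into the (white, 1x1) iFrame document, where client-side script can read it.
    body = (
        '<!doctype html><body style="margin:0;background:#fff">'
        f'<span id="visitor-id" hidden>{visitor_id}</span></body>'
    ).encode("utf-8")
    return 200, headers, body


class PixelHandler(BaseHTTPRequestHandler):
    """Serves the iFrame document using handle_pixel_request()."""

    def do_GET(self) -> None:
        status, headers, body = handle_pixel_request(self.headers.get("If-None-Match"))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run locally; embed with e.g. <iframe src="http://127.0.0.1:8000/" width="1" height="1">."""
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

This sketch answers with 200 and the same ETag on every visit so that each response carries the ID; replying 304 Not Modified would also work, because the cached iFrame already contains the ID printed on the first visit.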

How To Prevent ETag Tracking

Preventing ETag tracking can be quite difficult. It does not rely on cookies or on browser storage such as localStorage, the ETag exchange works without JavaScript, and it does not use the user agent.

However, there are a few options for users to protect themselves from ETag tracking:

  • Disable cache in the browser settings
    Careful here: as mentioned above, caching can be very useful and has a lot of advantages.
  • Modify headers with a browser add-on
    While most browsers do not natively offer an option to modify headers, there are plenty of browser extensions available, such as ModHeader. Why does this work? ETag tracking relies on request and response headers to exchange the ID. For instance, if a user overrides the If-None-Match header to be blank on every request, the server never receives the stored ID and generates a new ETag value on every page request. This prevents the user’s device from being identified.
Figure: ModHeader screenshot (ETag tracking).
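To see why both countermeasures work, here is a toy Python simulation of the ID exchange. `tracking_server` and `ToyBrowser` are hypothetical stand-ins, not a real browser cache or extension API; the model only tracks the single ETag that matters for the tracking iFrame.

```python
import secrets
from typing import Optional


def tracking_server(if_none_match: Optional[str]) -> str:
    """Hypothetical tracker: echo a known ETag back, otherwise mint a new ID."""
    if if_none_match:                        # returning visitor
        return if_none_match
    return '"' + secrets.token_hex(8) + '"'  # missing or blank If-None-Match: new visitor


class ToyBrowser:
    """Toy model of a browser revalidating one cached URL; not a real cache implementation."""

    def __init__(self, cache_enabled: bool = True, blank_if_none_match: bool = False) -> None:
        self.cache_enabled = cache_enabled
        # Mimics a header-modifying extension that overrides If-None-Match with an empty value.
        self.blank_if_none_match = blank_if_none_match
        self.cached_etag: Optional[str] = None

    def visit(self) -> str:
        """Request the tracking resource and return the ID the server assigned."""
        header = "" if self.blank_if_none_match else self.cached_etag
        etag = tracking_server(header)
        if self.cache_enabled:
            self.cached_etag = etag          # with caching off, nothing is remembered
        return etag
```

Calling `visit()` twice on a default `ToyBrowser()` returns the same ID, while an instance created with `cache_enabled=False` or `blank_if_none_match=True` receives a fresh random ID on every visit.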

Why This Is Important

Why am I testing these things? Why am I writing this article? I certainly do not intend to use this at scale. But while ETags can be used for evil, this example proves a larger point: like most other technology, they are not necessarily harmful by default. It always depends on the application.

I believe it is important for everyone to be aware that these methods are out there and that they might be leveraged. There have been quite a few instances in the past of websites illegitimately using this particular kind of ETag hijacking, and some of those cases ended in lawsuits and settlements. It is likely that these kinds of methods will be increasingly picked up again by a frightened ad industry that is watching one of its main cornerstones crumble: the cookie.

One out of many ETag examples on the web can be found in Wendy’s Cookies and Tracking Technologies Policy:

Source: https://www.wendys.com/cookies-and-tracking-2020

The blurb above seems to be an out-of-the-box template that many sites use in their privacy policies. To be clear: this on its own is neither bad nor illegal. Of course, ETag values have to be unique; that is the whole point of them working for the purpose of caching. However, the section is phrased in a very vague and ambiguous way, especially when it comes to stating whether those ETag values are being used for tracking or not. When I contacted the Wendy’s privacy team, they responded with a standardized copy-and-paste email confirming that they do not use ETags for tracking. The privacy policy, however, leaves that door wide open, and that is what I find worrying.

I believe in open and transparent knowledge transfer in the industry: among analytics vendors, publishers, the advertising industry, and internet users. In my opinion, the lack of such transfer is one of the main reasons we ended up in this messy cookie war, because the internet ecosystem has always suffered from a lack of transparency. Tech evolves too fast for legislation to keep pace, and it is impossible for the general public to understand the ins and outs of web technologies like cookies. When these technologies are used inappropriately, users understandably feel violated. But killing the technology as a result seems like a classic case of fighting the symptoms rather than the cause. The fact that many tech companies misuse technologies like cookies unfairly vilifies those technologies in the public eye, which in turn leads to disproportionate measures by browsers and legislators. While these measures do a lot of good for personal privacy, they simultaneously harm good and meaningful technological innovation.

There are always nuances. I strongly believe in the legitimacy and importance of earnest digital analytics, as long as it is executed with the right level of privacy compliance. What is next when it comes to legitimate visitor identification? ETags surely aren’t sustainable. But one thing is for sure: this industry will never get boring.

— If you want to discuss the example above or if you think you have found the new holy grail for user identification, feel free to reach out. —
