Part 1: Website deanonymization: the definitive technical guide

by Tido Carriero

Friday, March 22nd, 2024

In our four-part series, 'Under the Hood: Koala's Data Platform,' we're sharing our insights on the key ingredients necessary for a world-class first-party intent solution and strategies to help sales and marketing teams get the most out of their first-party website traffic.


Website deanonymization: the definitive technical guide

Most B2B websites get tens-to-hundreds of thousands of monthly visitors. The simple visitor count you'd see in an analytics tool may help marketing teams understand how traffic is growing or which sources it comes from, but for this traffic to be actionable for a sales team, you'll need to deanonymize it.

In this post, we'll walk you through the technical details of website deanonymization – how it works, Privacy 101, best practices for sales vs. marketing use cases, layering in first-party identity data, and resolving identity across your different data sources.

How it works

When someone first shows up on a website, the only data you have is an IP address. If you're following privacy laws, the best thing you can do with an IP address is to look up the company it's associated with. There are many tools that do this, but we chose to partner with Clearbit Reveal, which we found to be the best-in-class IP tool. All IP-to-company tools leverage probabilistic matching, since IPs are constantly changing, as are the computers connected to each network.

While each IP-to-company dataset is a bit different, these datasets are typically built by joining together data from Internet Service Providers (e.g., which companies have which IP ranges), data from browser extensions (e.g., an Adblocker tool), data from the vendor's own ecosystem (e.g., Clearbit has a free Chrome Extension), and other datasets. Critically, this data is intentionally anonymized so that no Personally Identifiable Information (PII) can be found.
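The range-lookup piece of an IP-to-company match can be sketched in a few lines of Python. This is only a minimal illustration, not how Clearbit Reveal or any other vendor works: real tools layer probabilistic scoring on top of many data sources. The `IP_RANGES` table and the `lookup_company` function are hypothetical, and the ranges used are reserved documentation networks rather than real company assignments.

```python
import ipaddress
from typing import Optional

# Hypothetical sample data: CIDR range -> company domain.
# These are reserved documentation ranges, not real assignments.
IP_RANGES = {
    "192.0.2.0/24": "example.com",
    "198.51.100.0/24": "example.org",
}

# Parse the CIDR strings once so each lookup is a simple containment check.
NETWORKS = [
    (ipaddress.ip_network(cidr), domain) for cidr, domain in IP_RANGES.items()
]


def lookup_company(ip: str) -> Optional[str]:
    """Return the company domain whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for network, domain in NETWORKS:
        if addr in network:
            return domain
    return None  # No match: the visitor stays anonymous.
```

A visitor from `192.0.2.17` would resolve to `example.com` here, while an address outside every known range returns `None` and stays anonymous.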

Match rates vary, but typically about 65% of your traffic can be deanonymized in this way.

Privacy 101

There are new tools coming into the market that offer a "magic database" of visitor-level identity without visitors giving you an email address. They "achieve GDPR compliance" by ignoring all EU traffic (i.e., it would be illegal to do this in the EU, so they scope it to the US). While these tools claim to be legal in the US, we don't believe this is a privacy-friendly practice that will be sustainable in the long term, as it is against the spirit of both GDPR and CCPA. This technique also relies on the shaky technical foundation of 3rd party cookie pooling, which browsers are making every effort to eliminate.

Finally, the most critical point for you: these tools use a "give to get" data model. As you install their tracking, you are also consenting to give your customers' PII back to their data graph. If you do want to use one of these tools, you should update your privacy policy to disclose that you'll be giving your customers' end-user data to one of these tools. These tools typically have recommended language to add to the privacy policy, so you'll want to do that before installing the tool. You may be obligated to notify all customers of your updated privacy policy as well.

Best practices for Sales vs. Marketing use cases

Because all IP-to-company data sets use a probabilistic model, there's an inherent tradeoff between optimizing for match rate vs. match accuracy. We've noticed that marketing teams and sales teams have pretty different viewpoints about whether or not a "false positive" match is okay:

  • Marketing teams tend to be more willing to accept false positives because they can still drive blended advertising performance even if the audience isn't perfect.
  • Sales teams tend to distrust datasets with false positives because time spent chasing leads that aren't actually real quickly destroys trust in the data.

If you look at ABM tools, they tend to be procured by Marketing teams (and configured to match as broadly as possible) and then pushed on the Sales teams, but Sales teams don't really trust the data.

While there's no one-size-fits-all answer, you will likely want to tune your tolerance for false positives (Clearbit Reveal gives a range of Low, Medium, High and Very High confidence out of their ML model). At Koala, we primarily use Koala for a sales use case and thus our tolerance for false positives is very low, so we only accept matches with a "Very High" confidence. (We don't want our team alerted on things that could be false positives and are happy to trade off some potential matches for that.)
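As a rough sketch of that tuning, the snippet below filters matches by a minimum confidence level. The four levels mirror the Low / Medium / High / Very High range mentioned above, but the `Confidence` enum, the record shape, and the `filter_matches` helper are assumptions made for illustration, not Clearbit's actual API response. The marketing threshold of `LOW` is likewise an assumed stand-in for "match as broadly as possible".

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Ordered confidence levels (hypothetical encoding)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


def filter_matches(matches, minimum):
    """Keep only matches at or above the minimum confidence."""
    return [m for m in matches if m["confidence"] >= minimum]


# Hypothetical match records for three visitors.
MATCHES = [
    {"domain": "example.com", "confidence": Confidence.VERY_HIGH},
    {"domain": "example.org", "confidence": Confidence.MEDIUM},
    {"domain": "example.net", "confidence": Confidence.LOW},
]

# Sales use case: accept only the most certain matches.
sales_matches = filter_matches(MATCHES, Confidence.VERY_HIGH)

# Marketing use case (assumed threshold): match as broadly as possible.
marketing_matches = filter_matches(MATCHES, Confidence.LOW)
```

With this sample data, the sales list keeps one match and the marketing list keeps all three, which is the match rate vs. match accuracy tradeoff in miniature.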

First-party identity graph

There is a data source that is 100% privacy-friendly, much more accurate than an IP-to-account match, gives you person-level identity, and is available to every company: the first-party identity graph that your users give you.

There are three primary ways that someone can identify themselves on your website – all of these are opportunities to "upgrade" your anonymous profile to a known user.

  • Form-fills. When someone fills out one of your forms, Koala can automatically detect that and use that to upgrade the profile.
  • Application sign-up / sign-in. When someone logs into your application, they are giving you their email (note: this may not actually happen via a form-fill if they are using SSO/Google OAuth).
  • Email click-through. When someone clicks on the links in your emails, you can insert an identifier in the URL (e.g., getkoala.com/pricing?ko_e=[email protected]) and use that to identify the user. Funny bug: there are clients such as Microsoft Email Advanced Threat Protection that will obfuscate the email parameter in your link, so we've needed to find a way to detect when that's happening and ignore these emails.
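The email click-through case can be sketched as a small URL parser. The `ko_e` parameter name comes from the example above; everything else is an assumption. In particular, the "does this look like an email?" check is a simple stand-in for obfuscation detection, not Koala's actual method, and `email_from_click` and the sample address are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Deliberately loose pattern: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def email_from_click(url: str, param: str = "ko_e") -> Optional[str]:
    """Extract an email identifier from a click-through URL.

    Returns None when the parameter is missing or its value does not
    look like an email (for example, after a mail client rewrote it).
    """
    query = parse_qs(urlparse(url).query)  # Also percent-decodes values.
    values = query.get(param, [])
    if not values:
        return None
    candidate = values[0].strip().lower()
    return candidate if EMAIL_RE.match(candidate) else None
```

For example, `email_from_click("https://getkoala.com/pricing?ko_e=jane%40example.com")` yields `jane@example.com`, while a mangled value such as `ko_e=xxxxxxxx` is ignored rather than attached to the profile.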

Resolving identity

If you have a single profile (cookie ID), you may receive several different pieces of identity information from different sources (your IP-to-account service, your form-fills, an application login, etc.). For this, you'll need an identity resolution algorithm. This may sound like a tiny edge case, but it actually comes up all the time! Here are a few examples:

  • If I have an anonymous IP match and a known work email, I want the work email
  • If someone submits both a work and a personal email into different forms, I want the work email
  • If I have a form-fill and a separate email that the user logged in with, I want the user login email (these tend to be more verified)
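The three examples above can be expressed as a priority ordering over identity signals. This is a minimal sketch, not Koala's algorithm: the `Signal` shape, the source names, and the short `PERSONAL_DOMAINS` list are all hypothetical. It also assumes that "work vs. personal" outranks "login vs. form-fill" when the two rules conflict, which the examples above do not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative, far from exhaustive, list of personal email domains.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Hypothetical source names; a higher rank wins when all else is equal.
SOURCE_RANK = {"ip_match": 0, "form_fill": 1, "app_login": 2}


@dataclass(frozen=True)
class Signal:
    source: str                            # "ip_match", "form_fill", "app_login"
    email: Optional[str] = None            # Person-level identity, if any.
    company_domain: Optional[str] = None   # Account-level identity, if any.


def is_work_email(email: str) -> bool:
    return email.rsplit("@", 1)[-1].lower() not in PERSONAL_DOMAINS


def priority(signal: Signal) -> tuple:
    """Sort key: known email > work email > more verified source."""
    has_email = signal.email is not None
    work = has_email and is_work_email(signal.email)
    return (has_email, work, SOURCE_RANK.get(signal.source, 0))


def resolve(signals: List[Signal]) -> Optional[Signal]:
    """Pick the winning identity signal for one profile (cookie ID)."""
    return max(signals, key=priority) if signals else None
```

Given an IP match for `acme.example` and a form-fill with `pat@acme.example`, `resolve` returns the form-fill; given a form-fill and an app login that both carry work emails, it returns the login.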

We find that sweating these details around identity is one of Koala's most noticeable differences. Note the importance of the data model here: without modeling it correctly as profiles in advance, you aren't able to fully solve this identity resolution problem. Legacy ABMs have known about this problem for years, but due to the limitations of their core data model, they haven't been able to adapt.

This is the most surprising thing I've learned since day 1 of Koala: for a PLG company, 75-80% of active traffic on the marketing website on any given day is prospect-level identified traffic (i.e., they've done some kind of identification in the past). For non-PLG companies, we still see identity graphs get up to 30-40% once they are fully hydrated. We constantly hear the objection "ah, I won't install this in my app because those people are already customers and thus not interesting", but not taking full advantage of your first-party identity graph is a mistake (even if you're not PLG, this compounds over time and will be critically helpful for cross-sell, upsell, and managing POCs).

Conclusion

Now that you have a good feel for how deanonymization works, you may want to learn more about how to collect and model this data for B2B use cases, how to find signal(s) from the noise, and how to incorporate this intent data into the rest of your rev ops infrastructure.


In this 4-part series, we're exploring the key ingredients necessary for a world-class first-party intent solution and strategies to help sales and marketing teams get the most out of their first-party website traffic. Go further and explore the next articles in our series here: