Background
“Bot traffic” is a fairly generic term for any unwanted traffic; the reason for using the word “bot” (a contraction of “robot”) is that such traffic is typically automated: it’s hard for a non-automated source (a “human”) to generate enough traffic to skew an enterprise website’s statistics.
There are a number of features and methods in Adobe Analytics for dealing with bot traffic. Some capabilities stop unwanted traffic before it gets into your “real” data, and others help to filter it out after the fact.
Primary Sources of Bot Traffic
Your company’s website might get hit with bot traffic for any number of reasons; from my work with financial services and retail sites, the main ones that I’ve come across are:
- Competitor intelligence
- Usually in the form of price-scraping
- This is typically the “biggest” problem (in terms of volume of traffic it represents and the size of the headache it causes)
- Usually regular in scheduling and volume
- Operational analysis
- Testing the site’s availability and performance
- Can be internal and external
- Usually regular in scheduling and volume
- Research
- Occasionally, I have seen significant traffic from IP addresses that link back to universities or other organisations that seem to be research-focussed
- Typically one-off or sporadic
Other Sources of Unwanted Traffic
Two other examples of unwanted traffic that are worth mentioning are:
- Fraudulent traffic
- Technical glitches
Brief descriptions of these categories follow, but it’s enough to say that the approaches described in this article will also work to exclude these types of traffic, although the set-up (e.g., defining Segments) will often be more complicated.
Fraudulent Traffic
I’m using “fraudulent” in a broad sense here: basically, on an eCommerce site, any traffic that relates to transactions that the business considers to be invalid. Often, this is orders that have been cancelled after being placed on the website, but before being processed for shipment (for example, an unauthorised reseller trying to purchase all stock of a particular item in a bulk order).
You may think that implementing Refund Tracking handles this scenario, but the type of unwanted transaction described above can sometimes be cancelled in the backend, outside the formal business process, meaning it never makes it into the Refund Tracking process.
Technical Glitches
It is difficult to describe this category fully, as the possibilities are endless, but think about the impact of a video “Play” button that gets stuck in a loop, or a Page View event being fired every 30 seconds because a particular browser version doesn’t handle a specific JavaScript command in the same way as every other browser.
Or consider what would happen if your non-production website sent traffic to your production Report Suite for whatever reason.
Exclusion Techniques
IP Exclusion
This is a “destructive” technique in that it completely blocks traffic from being available for use in your Report Suite.
It is intended (and mostly used) for excluding internal traffic (traffic from your company’s networks and those of your partners and vendors).
It is not well suited to dealing with this kind of unwanted traffic, for a number of reasons, some of which are:
- The IP address/es of the unwanted traffic must be known beforehand, and this isn’t usually the case
- The IP address filtering options are severely restricted – in essence, the only way to represent a range is by substituting one or more of the dotted quads with an asterisk (e.g., “0.0.*.0”, “0.*.*.0”) – see the sketch after this list
- Its destructive nature means you can’t undo a mistake after the fact
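To make that limitation concrete, here is a minimal sketch (in Python, purely illustrative – the function and the example patterns are mine, not part of Adobe Analytics) of what octet-level wildcard matching can and cannot express:

```python
# Illustrative only: mimics the "replace a whole octet with *" style of
# pattern that the IP Exclusion interface accepts. Not an Adobe API.
def matches_exclusion_pattern(ip: str, pattern: str) -> bool:
    """Return True if an IPv4 address matches a dotted-quad pattern
    where '*' stands in for an entire octet."""
    ip_octets = ip.split(".")
    pattern_octets = pattern.split(".")
    if len(ip_octets) != 4 or len(pattern_octets) != 4:
        return False
    return all(p == "*" or p == o for o, p in zip(ip_octets, pattern_octets))

# "13.57.*.*" excludes the whole of 13.57.0.0/16; there is no way to express
# a narrower range such as 13.57.0.0/18 with this syntax.
print(matches_exclusion_pattern("13.57.12.34", "13.57.*.*"))  # True
print(matches_exclusion_pattern("13.58.12.34", "13.57.*.*"))  # False
```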
Bot Rules
Similar to the IP Exclusion feature, Bot Rules requires up-front knowledge of the characteristics (IP address and/or User Agent string in this case) of the traffic that is unwanted.
However, unlike IP Exclusion (which is destructive), the Adobe Analytics Bot Rules feature has a semi-destructive outcome: the data is not discarded, but it is held separately from the data that is generally available for reporting and analysis – you will not be able to see it in Adobe Analytics Workspace, but you can still see it in Adobe Analytics Reports (the “legacy” reporting interface).
The Bot Rules interface
Bot Rules is accessible from the general “Report Suite Manager” area of the Admin Console.
The option Enable IAB Bot Filtering Rules takes care of the everyday bots (known web-crawlers such as search engines). The option to filter against User Agent strings can be used to filter out your company’s own operational bots – such services often offer an option to add an identifier to the User Agent string that a rule can key off (see the sketch below).
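As an aside, if your own organisation runs this kind of operational check, the simplest way to make it filterable is to agree on an identifier, put it in the User Agent string, and key a Bot Rule off that substring. A minimal sketch follows; the “MyCompanyUptimeBot/1.0” token and the URL are placeholders of my own, not a standard.

```python
# Minimal sketch of an internal availability check that identifies itself
# via the User-Agent string. "MyCompanyUptimeBot/1.0" is a made-up token;
# whatever you choose here is what the corresponding Bot Rule should match.
import requests

MONITOR_USER_AGENT = "Mozilla/5.0 (compatible; MyCompanyUptimeBot/1.0)"

def check_availability(url: str) -> bool:
    """Request a page the way the monitoring job would, with a
    recognisable User-Agent so analytics can exclude the traffic."""
    response = requests.get(url, headers={"User-Agent": MONITOR_USER_AGENT}, timeout=10)
    return response.ok

if __name__ == "__main__":
    print(check_availability("https://www.example.com/"))
```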
Custom rules
The custom rules that can be defined in Bot Rules provide much more granular control for filtering against IP addresses than is available via the IP Exclusion feature previously discussed; this makes Bot Rules a better option for excluding ranges of IP addresses in particular. Other benefits are:
- Up to 500 custom rules can be created in the Bot Rules interface
- This compares to only 50 rules in IP Exclusion
- Even more than 500 custom bot rules can be created via upload
- The disadvantage of a high number of custom bot rules is the processing overhead, which can affect how quickly data becomes available for reporting and analysis
- The import/export features
- Define bot rules en masse in Excel before importing (see the sketch after this list)
- Export pre-defined bot rules from one Report Suite and import them into another
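To take advantage of the import option without keying hundreds of rules by hand, you can generate the rule list programmatically first. The sketch below just writes candidate IP ranges to a CSV for review; the column names are an assumption on my part – export an existing rule set from the Bot Rules interface and copy its exact header before importing anything.

```python
# Illustrative only: writes candidate bot rules to a CSV for review/import.
# The column names are assumptions - export an existing rule set from the
# Bot Rules interface first and copy its exact header row.
import csv

candidate_rules = [
    {"Rule Name": "Scraper block A", "IP Start": "13.57.0.0", "IP End": "13.57.63.255"},
    {"Rule Name": "Scraper block B", "IP Start": "52.14.0.0", "IP End": "52.14.255.255"},
]

with open("bot_rules_candidates.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Rule Name", "IP Start", "IP End"])
    writer.writeheader()
    writer.writerows(candidate_rules)
```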
Unsuited for unanticipated unwanted traffic
All of the above aside, the Bot Rules feature doesn’t help with unwanted traffic that was unexpected: traffic is compared against the rules during post-collection processing; after that, if it passes, it is a permanent fixture of the data that is available for reporting and analysis.
Virtual Report Suites
The primary feature of the Adobe Analytics Virtual Report Suite (VRS) functionality is the use of Segments to filter out data. Filtering out unwanted traffic is only one of the many possible applications of Segments in defining a Virtual Report Suite.
A future article will explore the advantages and disadvantages of using the VRS feature, but it is sufficient to say that in almost every case, a VRS should be used in preference to a standard Report Suite set-up in order to take advantage of the power of Segments.
Retroactive
For a Digital Analytics Developer more familiar with the concept of a “View” in Google Analytics, it is important to clarify that filtering data using one or more Segments in the configuration of a Virtual Report Suite affects all data, not just data that is collected from the point in time that the change is made.
Bot Filtering
The retroactive nature of filtering data using Segments in a VRS is what makes the feature ideal for filtering out traffic for an unanticipated bot – or, indeed, any unwanted traffic regardless of when it started.
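Segments for this purpose are usually built in the Segment Builder, but they can also be managed programmatically via the Adobe Analytics 2.0 Segments API. The sketch below assumes you have already built the “exclude bot traffic” Segment once in the UI and simply copies it; the credentials, company ID and segment ID are placeholders, and the header and parameter details should be checked against the Segments API documentation.

```python
# Sketch, not a turnkey script: copies a UI-built traffic-filtering segment
# via the Adobe Analytics 2.0 Segments API. COMPANY_ID, ACCESS_TOKEN, API_KEY
# and SEGMENT_ID are placeholders for your own credentials and IDs.
import requests

COMPANY_ID = "mycompanyid"            # placeholder: your global company ID
ACCESS_TOKEN = "<oauth-access-token>" # placeholder
API_KEY = "<api-client-id>"           # placeholder
SEGMENT_ID = "s1234_abcdef"           # placeholder: a segment built in the UI

BASE = f"https://analytics.adobe.io/api/{COMPANY_ID}/segments"
HEADERS = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "x-api-key": API_KEY,
}

# 1. Fetch the UI-built "exclude bot traffic" segment, including its definition.
template = requests.get(
    f"{BASE}/{SEGMENT_ID}",
    headers=HEADERS,
    params={"expansion": "definition"},
    timeout=30,
).json()

# 2. Reuse that definition as the template for a copy (adjust name/rsid as
#    needed), then attach the resulting segment to the Virtual Report Suite.
payload = {
    "name": "Exclude bot traffic (copy)",
    "rsid": template["rsid"],
    "definition": template["definition"],
}
created = requests.post(BASE, headers=HEADERS, json=payload, timeout=30)
print(created.status_code, created.json())
```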
Typical Features of Bot Traffic
As a final point, I will note that the vast majority of the unwanted traffic that I have filtered out of Virtual Report Suites has been reliably identifiable using only three technology-related dimensions; it is much less common that I need to employ behaviour-related dimensions or metrics (e.g., Add To Cart activity, Orders).
Those three main dimensions (used in the sketch that follows this list) are:
- Domains
- An in-built dimension that reports the organisation or internet service provider (ISP) hosting the traffic
- This dimension is used in practically every “unwanted traffic” Segment that I am required to define
- In almost every case, this is not a provider of “domestic” internet service (e.g., BT or Sky in the UK, Comcast or Verizon in the US), but an organisation such as Amazon (through their AWS platform) or similar
- Operating Systems
- Another dimension provided out-of-the-box that is derived by Adobe
- Often, the regular, scheduled bot traffic (including traffic that may have been commissioned by your own organisation’s Ops department) uses the Linux operating system
- In other cases, Operating Systems can be used in combination with Browser (see below) to isolate unwanted traffic
- Browser
- Derived by Adobe from the User Agent string
- In many cases, bot traffic is seen using older versions of the popular browsers Google Chrome and Mozilla Firefox
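Before building the Segment, I often eyeball an export first to see how far these three dimensions get me. The sketch below assumes a breakdown export (e.g., from Data Warehouse or a Workspace download) with Domain, Operating System, Browser and Visits columns; the file name, column names and pattern list are illustrative assumptions of mine, not a fixed Adobe format.

```python
# Sketch: flag rows in an exported breakdown that match the bot profile
# described above (hosting-provider Domain plus Linux Operating System),
# then see which Browser versions account for the suspect visits.
import pandas as pd

HOSTING_PATTERNS = ["amazonaws", "googleusercontent", "azure", "digitalocean"]

df = pd.read_csv("traffic_export.csv")  # assumed columns: Domain, Operating System, Browser, Visits

is_hosting = df["Domain"].str.lower().str.contains("|".join(HOSTING_PATTERNS), na=False)
is_linux = df["Operating System"].str.contains("linux", case=False, na=False)

suspect = df[is_hosting & is_linux]
print(suspect.groupby("Browser")["Visits"].sum().sort_values(ascending=False).head(20))
```

Whatever combination surfaces here is then straightforward to translate into Segment conditions (e.g., Domains contains “amazonaws” AND Operating Systems contains “Linux”).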
Summary
This article has described bot traffic and other unwanted traffic, along with some likely sources for an enterprise website.
The obvious exclusion methods (and their limitations) were explored: IP Exclusion and Bot Rules.
The use of a Virtual Report Suite configured with one or more traffic-filtering Segments was put forward as a useful solution, particularly given the retroactive nature of the feature. Since much unwanted traffic can be isolated using “technology” dimensions alone, the three main ones were listed as well.