
Why Don't The Numbers Match?!?

by Judah Phillips, Tuesday, June 10, 2008

A question any practitioner of Internet-based analytics will be asked by many different stakeholders is, "Why don't the numbers match?" Counts of identically named metrics from ad servers don't match the Web analytics tool, which doesn't match the for-pay third-party audience measurement tools, which don't match the free audience measurement tools, which never match any of the homegrown internal measurement tools. And none of them ever match each other. So it's a good and certainly valid question to ask. The answers are actually fairly easy to understand, but the root causes are often difficult to pinpoint and even harder -- if at all possible -- to remedy. The fact of the matter is that data discrepancies in analytics occur for a multitude of reasons, such as:

Different data collection methods. We have a bunch of tools and services that collect Web data using various, non-standardized, proprietary data collection methods. Ad servers use JavaScript page tags. Many Web analytics tools use page tags too, but it's not uncommon in Web analytics to use additional methods, such as log files or packet sniffers -- or a combination of these methods, called hybrid data collection. And all the tools have different algorithms for processing the data collected. On the audience measurement side, data is collected from self-selecting panels who install proprietary software (e.g., toolbars) on their computers, perhaps at work or at their university, but most likely at home. The collected data from different panels is then rolled up and combined, and the limited subset of the Internet population that chooses to be monitored, in exchange for some incentive, is inflated and projected to the entire Internet audience using proprietary statistical methods. We also have data collected from a limited set of geographically specific ISPs. And regardless of whether we're talking about audience measurement or Web analytics, the different data collection methods often, but not always, involve cookies and all their inherent issues, such as cookie deletion.
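To make the page-tag method concrete, here's a minimal sketch of what a JavaScript page tag typically does; the collection endpoint (stats.example.com) and the parameter names are illustrative assumptions, not any vendor's actual tag:

```typescript
// Minimal sketch of a JavaScript page tag: gather a few page facts and
// send them to a collection server via a 1x1 image request.
// Endpoint and parameter names are hypothetical.
function firePageTag(): void {
  const params = new URLSearchParams({
    url: location.href,                       // page being viewed
    ref: document.referrer,                   // where the visitor came from
    res: `${screen.width}x${screen.height}`,  // screen resolution
    t: Date.now().toString(),                 // timestamp / cache-buster
  });
  const beacon = new Image(1, 1);
  beacon.src = `https://stats.example.com/collect?${params.toString()}`;
}

firePageTag();
```

Every tool wires up a variation of this idea differently, which is part of why identically named metrics diverge from the start.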

Unique data models. Ad servers aren't focused on counting page views and the other dimensions of Web analytics (visits, time, and so on). Rather, ad servers focus on serving ads and counting impressions served (and loads of related derivative calculations, like CTR, CPC, and view-through). Metrics are based on an ad request and an ad code. Ads may or may not be targeted to a page, and may instead be targeted to various constructs, like a "zone" or "keyword." That means the "page" dimension may not even exist in your ad server's data model. In other words, you aren't looking at impressions measured on a page, but rather at the number of impressions served within a different conceptual construct. That's one of the reasons people say Web analytics and ad-serving systems "don't measure the same thing."
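For illustration, here's how those derivative calculations work, as a sketch; the field names are assumptions, not any ad server's actual schema:

```typescript
// Illustrative ad-server derivative metrics. Field names are hypothetical.
interface AdStats {
  impressions: number; // ads served (counted per ad request, not per page)
  clicks: number;
  cost: number;        // total spend in dollars
}

// Click-through rate: clicks per impression.
function ctr(s: AdStats): number {
  return s.clicks / s.impressions;
}

// Cost per click: spend divided by clicks.
function cpc(s: AdStats): number {
  return s.cost / s.clicks;
}

const zone: AdStats = { impressions: 120_000, clicks: 840, cost: 1_260 };
console.log(ctr(zone)); // 0.007 -> a 0.7% CTR
console.log(cpc(zone)); // 1.5  -> $1.50 per click
```

Note that "impressions" here is keyed to an ad request, not to a page view, so there is no guaranteed one-to-one mapping to what a Web analytics tool counts.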

Untagged pages. Specific to technologies that collect data or serve ads using JavaScript page tags, there are challenges to ensuring and verifying complete coverage of page tags across every page on a site. When the pages aren't all tagged with the different tags for the assorted technologies, guess what? The numbers won't come close to falling within tolerable variances. And questions and skepticism will ensue.
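One way to catch this before the skepticism starts is a simple tag-coverage audit. A rough sketch, assuming you have a list of your site's URLs and a string that uniquely identifies your tag (both hypothetical here):

```typescript
// Rough tag-coverage audit: fetch known page URLs and flag any whose HTML
// lacks the analytics tag. URL list and tag signature are illustrative.
async function findUntaggedPages(
  urls: string[],
  tagSignature: string,
): Promise<string[]> {
  const untagged: string[] = [];
  for (const url of urls) {
    const html = await (await fetch(url)).text();
    if (!html.includes(tagSignature)) {
      untagged.push(url); // this page is invisible to the tag-based tool
    }
  }
  return untagged;
}

findUntaggedPages(
  ["https://www.example.com/", "https://www.example.com/about"],
  "stats.example.com/collect",
).then((missing) => console.log("Untagged pages:", missing));
```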

Non-JS-executing clients and ad-blocking software. Let's imagine, for the moment, that your site is perfectly tagged for all technologies, so the numbers from your ad server will be close to those from your Web analytics system, right? Nope. Regardless of data-model issues, not all browsers execute JavaScript -- and many Firefox users have installed Adblock Plus.
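If you collect both log files and tag data, one rough way to size this gap is to compare the two counts for the same pages. A sketch with made-up numbers (and note that logs also include robots, which inflates the difference):

```typescript
// Rough sizing of the non-JS / ad-blocked gap: compare page views seen in
// server logs with page views reported by the tag-based tool.
// The figures below are illustrative, not real benchmarks.
function untaggedShare(logPageViews: number, tagPageViews: number): number {
  return (logPageViews - tagPageViews) / logPageViews;
}

// e.g. logs recorded 100,000 views; the tag reported 91,500
console.log(untaggedShare(100_000, 91_500)); // 0.085 -> ~8.5% gap
```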

Cookie issues. When you're counting based on cookies, third-party cookies get blocked (often by privacy software). Many ad servers and Web analytics tools still serve third-party cookies, and many corporations have not tricked out their DNS to accommodate this issue. And we all know how cookie deletion affects unique visitor counts, even if you use first-party cookies.
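For illustration, here's a minimal sketch of first-party-cookie visitor identification; the cookie name (_vid) and the ID scheme are assumptions, and the moment a visitor deletes the cookie, the tool counts a brand-new "unique":

```typescript
// Sketch of first-party-cookie visitor identification: reuse the visitor
// ID cookie if present, otherwise mint a new one. Cookie name and ID
// scheme are hypothetical. A deleted cookie means a "new" unique visitor.
function getOrSetVisitorId(): string {
  const match = document.cookie.match(/(?:^|;\s*)_vid=([^;]+)/);
  if (match) return match[1]; // returning visitor, same ID

  const id = Math.random().toString(36).slice(2); // naive ID for the sketch
  const twoYears = 60 * 60 * 24 * 730;            // max-age in seconds
  document.cookie = `_vid=${id}; max-age=${twoYears}; path=/`;
  return id;
}
```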

Many other issues. From visitors moving off the page before the tag executes, to latency in the call to pick up an ad from a third party while your ad server counts the traffic (so your ad count differs from the agency's count), to refresh rates making it hard to correlate page views and impressions, to missing rich-media plug-ins with no fallback, to robotic traffic not being filtered from logs or tags, to certain types of user agents (such as mobile devices) not executing JavaScript... there's a whole host of other factors that cause data discrepancies.
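Take robotic traffic as one example. A minimal robot filter over log hits might look like the sketch below; real implementations match against maintained robot lists (such as the IAB/ABC list), not this tiny illustrative pattern:

```typescript
// Minimal robot filter over log hits: drop entries whose user agent
// matches a known-bot pattern. The pattern set is a tiny illustrative
// subset, not a production robot list.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

interface LogHit {
  url: string;
  userAgent: string;
}

function filterRobots(hits: LogHit[]): LogHit[] {
  return hits.filter((h) => !BOT_PATTERN.test(h.userAgent));
}

const hits: LogHit[] = [
  { url: "/home", userAgent: "Mozilla/5.0 (Windows NT 6.0) Firefox/3.0" },
  { url: "/home", userAgent: "Googlebot/2.1 (+http://www.google.com/bot.html)" },
];
console.log(filterRobots(hits).length); // 1 -- the bot hit is dropped
```

A log-based tool that skips this filtering will report higher counts than a tag-based tool that robots never trigger.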

And of course, there's always the nebulous issue of the complete lack of consensus-based, enforceable standards for online measurement. No industry organization can say what vendors or companies "must" do, only what they "should" do. And no industry body is going to get successful companies to change their secret sauce just because it said so.

So what's a practitioner to do? Understand the potential sources of discrepancies. Work with your team (from IT to vendors) to prevent and minimize the root causes when possible. Educate your team when discrepancies are not remediable. Use the different sources of metrics judiciously in the context of your business goals. Finally, realize that none of the tools is more "correct" than any other. All of our analytics tools serve different, and sometimes overlapping, business purposes -- from counting ads, to influencing media buying, to sizing audiences, to measuring business performance, to optimizing the site.

Judah Phillips is the director of Web analytics for Reed Business Information, part of Reed Elsevier Group, PLC. He also blogs at judah.webanalyticsdemystified.com.

Metrics Insider for Tuesday, June 10, 2008:

http://blogs.mediapost.com/metrics_insider/?p=67
