Colleagues and clients sometimes ask me to compare different web analytics products: which product should be used, and how do they differ in the way that they compile data? Below, I discuss the differences between dynamic tracking and log file analysis and the pros and cons of implementing each method.
Dynamic tracking products, such as Google Analytics, may produce different traffic statistics from a log file analysis tool. This is because Google Analytics uses client-side code to gather information, whereas most log file analysis products rely only on server-side information. Because one method gathers data directly from users' browsers and the other reads log files generated by web servers, the results can differ dramatically.
Below is only a basic summary of the two methods. There are many more features of web analytics that could be discussed in detail, but this will give you a general understanding of how the different methods are used.
Tracking Code
Google Analytics Tracking Code (GATC) is pasted into each HTML page within a website. This code is a combination of HTML and JavaScript which is used to track page views and other traffic data. GATC is usually placed at the bottom of each page's code (directly before the closing tag), but it is recommended that the code is placed within the header so that it is executed correctly even if the page loads slowly. Discrepancies in results may arise at this early stage of implementation because the site owner may not paste the tracking code into every page of the site, whereas log file analysis tools usually provide statistics for all pages unless configured otherwise.
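One way to catch untagged pages is to scan the site's HTML for the tracking snippet. The following is only a minimal sketch in Python: the function name, the sample pages and the marker string used to recognise the snippet are all assumptions made for illustration, so substitute whatever text identifies your own tracking code.

```python
def pages_missing_tracking(pages, marker="google-analytics.com"):
    """Return the paths of pages whose HTML does not contain the marker.

    pages  -- dict mapping a page path to its HTML source
    marker -- text assumed to appear in every correctly tagged page
    """
    return sorted(path for path, html in pages.items() if marker not in html)


# Hypothetical site: two pages carry a tracking script, one was forgotten.
site = {
    "/index.html": "<html><head><script src='//www.google-analytics.com/ga.js'></script></head></html>",
    "/about.html": "<html><head><title>About</title></head></html>",
    "/contact.html": "<html><head><script src='//www.google-analytics.com/ga.js'></script></head></html>",
}

print(pages_missing_tracking(site))  # ['/about.html']
```

Any page returned by this check would be invisible to the dynamic tracking product while still appearing in the server's log files.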
JavaScript, Cookies and Cached Pages
Google Analytics uses cookies to track visitor activity. These cookies hold a unique visitor ID and are considered safe and non-intrusive by most internet users today, but many people block cookies from being set by their web browsers to prevent personal data from being captured or reported on. A user who deletes their cookies will still be tracked by Google Analytics, but they will be identified as a new visitor to the site, leading to inaccurate session results.
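The effect of a deleted cookie can be illustrated with a small simulation. This is a conceptual sketch in Python, not Google Analytics' actual implementation: the cookie name `visitor_id` and the function `identify_visitor` are hypothetical, and a plain dictionary stands in for the browser's cookie store.

```python
import uuid


def identify_visitor(cookies):
    """Return (visitor_id, is_new) for a browser's cookie store.

    Reuses the ID already stored in the cookies, or creates a new one
    when no ID is present (first visit, or cookies were deleted).
    """
    if "visitor_id" in cookies:
        return cookies["visitor_id"], False
    cookies["visitor_id"] = uuid.uuid4().hex
    return cookies["visitor_id"], True


browser_cookies = {}  # stands in for one person's browser

first_id, first_new = identify_visitor(browser_cookies)    # first visit
second_id, second_new = identify_visitor(browser_cookies)  # return visit
browser_cookies.clear()                                     # user deletes cookies
third_id, third_new = identify_visitor(browser_cookies)    # same person, new ID

print(first_new, second_new, third_new)  # True False True
```

The same person is counted as two different visitors, which is how deleted cookies inflate new-visitor figures.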
By contrast, log file analysis products typically use a visitor’s IP address to track user sessions. This can be very unreliable, since two or more visitors can share an IP address, and it also makes it more difficult to determine whether a user has previously visited the site.
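The weakness of IP-based sessions is easy to see in a small example. The sketch below, in Python, assumes a 30-minute inactivity timeout and treats each IP address as one visitor. The function name, the timeout and the sample hits are illustrative assumptions, not the behaviour of any particular log analysis product.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cut-off


def count_sessions(hits, timeout=SESSION_TIMEOUT):
    """Count sessions, treating each IP address as a single visitor.

    hits -- iterable of (ip_address, timestamp) pairs
    A new session starts when an IP is seen for the first time or
    after it has been inactive for longer than the timeout.
    """
    last_seen = {}
    sessions = 0
    for ip, timestamp in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous > timeout:
            sessions += 1
        last_seen[ip] = timestamp
    return sessions


# Three different people, two of whom share an office IP address.
hits = [
    ("203.0.113.7", datetime(2024, 1, 1, 9, 0)),   # person A
    ("203.0.113.7", datetime(2024, 1, 1, 9, 5)),   # person B, same network
    ("203.0.113.7", datetime(2024, 1, 1, 9, 10)),  # person A again
    ("198.51.100.4", datetime(2024, 1, 1, 9, 2)),  # person C
]

print(count_sessions(hits))  # 2
```

Three people visited, but only two sessions are reported because the two visitors behind the shared address cannot be told apart.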
If JavaScript has been disabled in a user’s browser, Google Analytics will be unable to track activity because the GATC cannot be executed. However, log file analysis products are unable to report on cached pages correctly, leading to a significant undercounting of page views. This is because cached pages are saved on a user’s local machine and so are not served by the web server. Google Analytics detects all displayed pages regardless of the source (as long as the visitor is connected to the Internet), resulting in more accurate page-view counts.
Robots
Internet bots, also known as web robots, are software applications that run automated tasks over the internet and scan site content. Since robots are not actual users, their activities need to be excluded from web analytics results. Log file analysis products find this difficult as they need to know about a robot in order to detect it, but there are thousands of robots and new ones appear on a daily basis. Google Analytics largely avoids this problem as most robots do not execute JavaScript and, therefore, their activity is not included in its reporting.
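The difficulty for log file tools can be shown with a simple filter that matches the user agent string against a list of known robots. This Python sketch is illustrative only: the marker list is a tiny, assumed sample, the function name is hypothetical, and `BrandNewCrawler` is an invented robot used to show what happens when a crawler is not yet on the list.

```python
# Illustrative sample only; real lists contain thousands of entries.
KNOWN_BOT_MARKERS = ("googlebot", "bingbot", "slurp")


def is_known_robot(user_agent, markers=KNOWN_BOT_MARKERS):
    """Return True if the user agent contains a known robot marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in markers)


# (requested path, user agent) pairs as they might appear in a log file.
requests = [
    ("/index.html", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/index.html", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"),
    ("/index.html", "BrandNewCrawler/0.1"),  # hypothetical robot, not on the list
]

counted_hits = [path for path, agent in requests if not is_known_robot(agent)]

print(len(counted_hits))  # 2
```

Only one request came from a person, yet two are counted, because the unknown crawler passes the filter until someone adds it to the list.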
Summary
It is important to understand that any web analytics report should be treated like a survey, as analytics products are rarely able to track 100% of site visitors. The reasons range from browsers that block JavaScript to deleted cookies, cached pages and robot traffic. Log file analysis products record every file request, regardless of who makes it, whereas Google Analytics is generally the more accurate solution, as it handles sessions more reliably and consistently even though it may miss some visits.
