Privacy and data accuracy are two major concerns in web analytics, and in most cases the two are related: many privacy settings affect the accuracy of data tracking and collection. Concerns about personal privacy have been rising since web analytics became widely adopted. Cookies are central to both concerns, because they affect data accuracy and may contain private information that users do not want to share. For example, the web beacon tracking method uses cookies to track customer behavior across different websites. A web beacon is a piece of third-party tracking code embedded in a webpage. A single provider collects the data, reads the cookies, and tracks user behavior across several domains and websites. As soon as the first web beacon from a provider is loaded on a user's system, a unique number is generated and saved in a cookie on that system. When the user visits another website that carries a web beacon from the same provider, the provider reads the cookie, aggregates the user's data, and can customize which advertisements are displayed to this user.
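
The cookie exchange behind a web beacon can be simulated in a few lines. The following Python sketch illustrates the idea under simplifying assumptions and is not any provider's actual implementation. `BeaconProvider`, `Browser`, the cookie name `uid`, and the site names are all hypothetical, and a real beacon arrives as an HTTP request for an image or script rather than as a method call. It shows how one third-party cookie links visits to unrelated sites, and how blocking third-party cookies breaks that link.

```python
import uuid


class BeaconProvider:
    """Hypothetical third-party provider whose web beacon is embedded on many sites."""

    COOKIE_NAME = "uid"  # hypothetical name of the tracking cookie

    def __init__(self):
        # Visitor ID -> sites on which that visitor loaded the beacon.
        self.profiles = {}

    def serve_beacon(self, site, cookies):
        """Handle one beacon load on `site` and return the cookies to store.

        `cookies` holds whatever the browser has stored for the provider's domain.
        """
        uid = cookies.get(self.COOKIE_NAME)
        if uid is None:
            # First beacon this browser has loaded: generate a unique number.
            uid = uuid.uuid4().hex
        # Aggregate the visit under the same ID, whichever site embeds the beacon.
        self.profiles.setdefault(uid, []).append(site)
        return {**cookies, self.COOKIE_NAME: uid}


class Browser:
    """Minimal stand-in for a browser's cookie jar for the provider's domain."""

    def __init__(self, block_third_party=False):
        self.block_third_party = block_third_party
        self.jar = {}

    def visit(self, site, provider):
        if self.block_third_party:
            # No cookie is sent or stored, so every beacon looks like a new visitor.
            provider.serve_beacon(site, {})
        else:
            self.jar = provider.serve_beacon(site, self.jar)


provider = BeaconProvider()
alice = Browser()
alice.visit("news.example", provider)
alice.visit("shop.example", provider)
print(list(provider.profiles.values()))  # [['news.example', 'shop.example']]
```

With third-party cookies blocked, the same two visits would produce two unrelated profiles. This protects the user's privacy, but it also illustrates how blocking cookies lowers the accuracy of visitor counts.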

Web analytics largely depends on cookies for data collection and transmission to the server. If cookies are blocked on the client side, part of the information is lost, which reduces the accuracy of web traffic and usage statistics. Users can adjust client application settings in several ways to protect their privacy. All major current browsers provide an easy way to delete cookies and to block third-party cookies, first-party cookies, or scripting altogether. Users can also choose the private browsing mode offered by all three major browsers (Incognito in Google Chrome, InPrivate in Internet Explorer, and Private Browsing in Firefox). To standardize privacy and tracking controls, a submission to the W3C proposed the DNT (Do Not Track) HTTP header (http://www.w3.org/Submission/web-tracking-protection/). The DNT header conveys a preference that the user sets in the browser. Both the web server and client-side JavaScript can read this setting, and they should not track the user when the DNT option is explicitly enabled. However, nothing forces websites to comply, and a service provider may decide not to honor the user's choice. For example, when Microsoft decided to enable DNT by default in IE 10, Yahoo announced that it would ignore IE 10's DNT settings (Schwartz, 2012).

Another issue is the identification of sessions and users. A visit, or session, can consist of multiple user actions and requests. However, HTTP is a stateless protocol: each request and response is independent of prior and later requests. This makes it difficult to identify behavioral patterns correctly. Sessionization is an attempt to group each user's requests into the visits they belong to. How sessions are configured and defined therefore affects the accuracy of metrics such as the number of visits.
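
A server that chooses to honor the DNT preference discussed above only has to inspect one request header. The sketch below assumes that the header value "1" signals an opt-out, as in the DNT proposal. `dnt_enabled`, `record_page_view`, and the in-memory `log` list are hypothetical names that stand in for a real analytics pipeline.

```python
def dnt_enabled(headers):
    """Return True if the request carries the header "DNT: 1" (user opted out)."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def record_page_view(headers, path, log):
    """Append `path` to `log` unless the user has enabled Do Not Track.

    Honoring DNT is a policy choice made here; the header itself cannot
    force a site to comply.
    """
    if dnt_enabled(headers):
        return False  # respect the preference: record nothing
    log.append(path)
    return True


log = []
record_page_view({"DNT": "1"}, "/home", log)            # not recorded
record_page_view({"User-Agent": "demo"}, "/home", log)  # recorded
print(log)  # ['/home']
```

Each request that honors DNT is simply missing from the data. This is the same trade-off between privacy and accuracy that affects blocked cookies.
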
The most reliable way to obtain accurate visit statistics is to start a new session when a user logs in and to terminate the session when the user logs out or stays idle for a set period of time.
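
The login, logout, and idle-timeout rule can be sketched as a small server-side session tracker. This is an illustrative sketch, not a standard component. `SessionTracker` is hypothetical, the 30-minute idle timeout is an assumed value (each site chooses its own), and timestamps are passed in as plain seconds so that the behavior is easy to follow.

```python
import uuid

IDLE_TIMEOUT = 30 * 60  # seconds; an assumed value, sites choose their own


class SessionTracker:
    """Hypothetical tracker: a session starts at login and ends at logout or idle timeout."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT):
        self.idle_timeout = idle_timeout
        self.active = {}  # user_id -> (session_id, time of last request)
        self.visits = 0   # number of sessions (visits) started so far

    def login(self, user_id, now):
        """Start a new session and count it as one visit."""
        session_id = uuid.uuid4().hex
        self.active[user_id] = (session_id, now)
        self.visits += 1
        return session_id

    def request(self, user_id, now):
        """Record a request; return the session ID, or None if no session is active."""
        entry = self.active.get(user_id)
        if entry is None:
            return None
        session_id, last_seen = entry
        if now - last_seen > self.idle_timeout:
            # Idle for too long: terminate the session; the user must log in again.
            del self.active[user_id]
            return None
        self.active[user_id] = (session_id, now)
        return session_id

    def logout(self, user_id):
        """Terminate the user's session, if any."""
        self.active.pop(user_id, None)


tracker = SessionTracker()
first = tracker.login("u1", now=0)
same = tracker.request("u1", now=600)       # 10 minutes later: same session
expired = tracker.request("u1", now=2401)   # idle for 30 min 1 s: session ended
tracker.login("u1", now=3000)               # a new visit begins
print(same == first, expired, tracker.visits)  # True None 2
```

Because visits are counted only at login, the result does not depend on IP addresses or cookies. However, this approach only works on sites where users actually log in.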

A common approach to identifying visits is to use IP addresses, but this is not always reliable. If visitors come from the same organization and its network uses Port Address Translation, several visitors share the same public IP address and may be identified as a single visitor. Conversely, if a user's IP address changes during a session, one visit can be incorrectly counted as several. Cookies are also used to identify visits, but as mentioned before, cookies can be deleted or blocked for privacy protection, which also reduces data accuracy.
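
The following sketch shows both failure modes of IP-based visit identification. It treats each IP address as one visitor and starts a new visit after an assumed 30-minute idle gap. The function name is illustrative only, and the addresses come from the ranges reserved for documentation.

```python
def count_visits_by_ip(requests, timeout=30 * 60):
    """Count visits, treating each IP address as a single visitor.

    `requests` is a time-ordered list of (timestamp_in_seconds, ip) pairs.
    A new visit starts when an IP address is first seen or has been idle
    longer than `timeout` (an assumed 30-minute default).
    """
    last_seen = {}
    visits = 0
    for timestamp, ip in requests:
        if ip not in last_seen or timestamp - last_seen[ip] > timeout:
            visits += 1
        last_seen[ip] = timestamp
    return visits


# Two employees behind Port Address Translation share one public address:
# 2 real visits, but only 1 is counted.
shared_ip = [(0, "203.0.113.5"), (60, "203.0.113.5")]
print(count_visits_by_ip(shared_ip))  # 1

# One user whose address changes mid-session: 1 real visit, but 2 are counted.
changing_ip = [(0, "198.51.100.7"), (120, "198.51.100.99")]
print(count_visits_by_ip(changing_ip))  # 2
```

In a real log, both errors occur together and can partly cancel each other out, so an IP-based visit total that looks plausible is not necessarily correct.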

Web browser and proxy caching also influence the accuracy of log file analysis. Caching is important for user experience and for the effective use of resources. However, a page served from a cache never reaches the web server. That view is therefore missing from the server's log files, which distorts host and visit tracking data. If a proxy is used, content cached for one user might be reused for subsequent visits by other users.

Other issues stem from how tracking code is configured and set up, especially in page tagging methods, where tags may be missing, set incorrectly, or placed improperly. JavaScript, and AJAX in particular, is used to create more dynamic, more powerful, and easier-to-use websites. However, a browser delays rendering any content that follows a synchronous script tag (one without the async or defer attribute) until that script has been downloaded, parsed, and executed. This delay can skew user engagement statistics. For example, a visitor who leaves before a tracking tag placed after a slow script has run is never counted.
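
Missing or render-blocking tracking tags can be caught with a simple audit of each page's HTML. The sketch below uses Python's standard `html.parser` module. The tag URL is hypothetical, and the check is deliberately rough. It only sees tags present in the static HTML, not scripts inserted at runtime, and it treats any tracking script without `async` or `defer` as blocking.

```python
from html.parser import HTMLParser

# Hypothetical URL of the analytics provider's page tag.
TRACKER_SRC = "https://analytics.example.com/tag.js"


class TagAuditor(HTMLParser):
    """Records whether a page loads the tracking script, and how."""

    def __init__(self, tracker_src):
        super().__init__()
        self.tracker_src = tracker_src
        self.found = False
        self.blocking = False

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        attributes = dict(attrs)  # boolean attributes such as async map to None
        if attributes.get("src") == self.tracker_src:
            self.found = True
            # Without async or defer, rendering of later content waits for this script.
            if "async" not in attributes and "defer" not in attributes:
                self.blocking = True


def audit_page(html, tracker_src=TRACKER_SRC):
    """Return "missing tag", "blocking tag", or "ok" for one page's HTML."""
    auditor = TagAuditor(tracker_src)
    auditor.feed(html)
    auditor.close()
    if not auditor.found:
        return "missing tag"
    if auditor.blocking:
        return "blocking tag"
    return "ok"


print(audit_page("<p>No tracking here</p>"))                        # missing tag
print(audit_page(f'<script src="{TRACKER_SRC}"></script>'))        # blocking tag
print(audit_page(f'<script async src="{TRACKER_SRC}"></script>'))  # ok
```

Running such a check over every page of a site is one way to find the untagged pages that silently lower page-tagging counts.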