Session Replay Scripts Jeopardize User Information, GDPR Compliance
Almost 100,000 websites, including 482 of the 50,000 most trafficked, use session replay scripts, many without clear disclosures, providing insight into user behavior but also exposing vital data to external servers.
“You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make,” Steven Englehardt, Gunes Acar, and Arvind Narayanan explained in the first installment of Princeton’s “No Boundaries” series. “But lately, more and more sites use ‘session replay’ scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers.” The researchers noted that, unlike typical analytics services that deliver aggregate statistics, these scripts record and play back individual browsing sessions, as if someone were looking over the consumer’s shoulder.
The study cast doubt on whether the collected data can remain anonymous: “In fact, some companies allow publishers to explicitly link recordings to a user’s real identity.”
The Princeton research pointed out that the collection of page content by third-party replay scripts may cause sensitive information displayed on a page, such as medical conditions, credit card details and other personal information, to leak to the third party as part of the recording. “This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes.”
The replay services offer a combination of manual and automatic redaction tools that allow publishers to exclude sensitive information from recordings. However, to avoid leaks, publishers must diligently check and scrub every page that displays or accepts user information. For dynamically generated sites, this process involves inspecting the underlying web application’s server-side code. Further, the process must be repeated every time the site or web application is updated or changed.
San Francisco-based digital threat management firm RiskIQ just released a follow-up blog post that presents its own research and explores how these scripts also put companies at risk of violating the EU’s General Data Protection Regulation, which takes effect on May 25, 2018. Querying its data, RiskIQ found that the domains of 38 of the top 50 U.S. online retailers contain session replay scripts, a finding that is especially relevant as consumers take to the web for holiday shopping.
“While great for marketing purposes, using these scripts can be a big problem,” Mike Browning, a manager at RiskIQ, said. On May 25, 2018, the EU’s General Data Protection Regulation goes into effect, which applies to any organization that collects, stores, and uses personal information about an EU citizen. “As part of the regulation’s fairness and transparency guidelines, organizations must clearly state at the point of capture how they’ll be using an individual’s data. Permission to use their data must be explicit and demonstrated through an action such as ticking a box, a significant departure to the ‘opt out’ process most organizations have in place today. Evidence of violations and negligence serves as cause for significant fines.”
Thousands of American companies, including credit unions, that do business with European customers need to reckon with GDPR.
To meet GDPR requirements, organizations need a comprehensive understanding of their digital footprint, meaning all of the various internet-exposed assets that belong to them. They must be able to discover which external assets collect personally identifiable information, including a user’s name, phone number, address, social media presence, photos, lifestyle preferences, location data, and even their IP address.
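As a rough illustration of what discovering PII-collecting assets can involve, the Python sketch below scans a page’s HTML for form fields whose attributes hint at personal data. It is not RiskIQ’s method. The keyword list, the `PIIFieldFinder` class, and the `find_pii_fields` helper are hypothetical names chosen for this example, and simple keyword matching of this kind will produce both false positives and misses.

```python
from html.parser import HTMLParser

# Hypothetical keyword list for illustration; a real audit would need a
# broader, site-specific set of hints.
PII_HINTS = ("name", "phone", "tel", "address", "email")


class PIIFieldFinder(HTMLParser):
    """Collects form fields whose attributes suggest they accept PII."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("input", "textarea", "select"):
            return
        # Attribute values can be None (e.g. <input required>), so normalize.
        attr = {key: (value or "").lower() for key, value in attrs}
        haystack = " ".join(
            attr.get(key, "") for key in ("name", "id", "type", "autocomplete")
        )
        if any(hint in haystack for hint in PII_HINTS):
            self.flagged.append(attr.get("name") or attr.get("id") or tag)


def find_pii_fields(html):
    """Return identifiers of fields in `html` that look like PII inputs."""
    finder = PIIFieldFinder()
    finder.feed(html)
    return finder.flagged


sample = """
<form action="/signup">
  <input type="text" name="full_name">
  <input type="tel" name="contact">
  <input type="text" name="coupon">
  <input type="submit" value="Go">
</form>
"""

print(find_pii_fields(sample))  # ['full_name', 'contact']
```

Run across every page an organization exposes, a check along these lines would only give a first-pass inventory. Each flagged field would still need manual review.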
“For multinational companies with expansive web infrastructure, merely compiling and assessing site details is often fraught with gaps and inaccuracies,” Browning said. When looking at 25 of the 50 largest financial institutions in the U.S. (2017), the RiskIQ Threat Research team discovered that 68% had significant security gaps in PII collection.