February 2007
Do you test using real production data? Beware using sensitive data for any application development or testing purposes, since lost or stolen information can trigger costly data notifications, regulatory sanctions, and customer fallout.
By Mathew Schwartz
Do you test applications using real, production data?
Anecdotally, many developers and QA testers say they prefer to build and test applications using the real thing: actual customer data.
Such practices, however, can violate a number of data privacy regulations. For example, the 1996 Health Insurance Portability and Accountability Act (HIPAA) mandates that companies restrict access to people’s personal health data on a “need to know” basis. Likewise, the Sarbanes-Oxley Act of 2002 requires companies to control access and track changes to systems handling corporate financial information. In addition, over 30 states have passed data breach notification laws requiring companies to notify consumers if their personal information may have been compromised. Personal information includes such things as a person’s name and address, date of birth, Social Security number, and credit card and bank account numbers.
These regulations make no distinction between production and testing environments. Simply put, the requirements are the same whether an attacker hacks into your e-commerce application or accesses a database in the testing environment. Given such risks, many companies have decided that developers, along with quality assurance (QA) personnel and database administrators (DBAs), simply don’t have a “need to know,” and thus shouldn’t have access to any sensitive information. Beyond helping protect sensitive data, the company, and its customers, this restriction also protects developers: they’re not culpable in the event of a data leak or security breach.
To ensure applications perform appropriately once they launch, however, developers still need access to “good enough” data to build and test their applications. Accordingly, many organizations are creating homegrown scripts, or purchasing off-the-shelf software, to transform sensitive production data into safe but usable test data.
Data Breach Costs Drive Changes
Beyond complying with regulations, keeping customers happy, and avoiding class action lawsuits, companies also have a financial incentive for keeping sensitive data out of the test environment. Indeed, the actual cost of lost, stolen, or inappropriately accessed data is quite high: an average of $182 per record. That finding comes from the Ponemon Institute, which studied the actual costs incurred by 31 companies after they experienced a data breach. (The total tab for an affected organization ranged from $226,000 to $22 million.) Costs included legal fees, consumer notifications, credit monitoring services, and decreased customer retention and acquisition.
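As a rough worked example of that per-record figure, the sketch below multiplies a record count by the $182 average from the Ponemon study. The function name and the 10,000-record test database are hypothetical, and an average drawn from 31 companies is an illustration of scale, not a prediction for any one breach.

```python
# Average cost per compromised record reported in the Ponemon study cited above.
AVERAGE_COST_PER_RECORD = 182  # US dollars


def estimate_breach_cost(record_count, cost_per_record=AVERAGE_COST_PER_RECORD):
    """Return a back-of-the-envelope breach cost in US dollars."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    return record_count * cost_per_record


# Hypothetical test database holding 10,000 unmasked customer records.
print(f"${estimate_breach_cost(10_000):,}")  # prints $1,820,000
```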
Data breaches can exact more than money. Witness the breach at CardSystems, a company that processed credit card transactions. Attackers stole over 40 million records containing people’s credit card numbers. The records had reportedly been retained by CardSystems for “research purposes,” despite the company being subject to industry regulations expressly forbidding the storage of such data, at least in unencrypted format. The fallout ultimately drove CardSystems out of business.
Not surprisingly, many companies are now taking a closer look at the data their developers use.
Create A Test Data Plan
How can you ensure no sensitive data is being used or stored in your development and testing environments? A recent report from Forrester Research, written by Noel Yuhanna and Carey Schwaber, recommends companies pursue these four steps:
- “Take inventory of your test data requirements.” Which regulations does your company need to comply with, and what data privacy and handling practices do they mandate?
- “Assess risks.” Categorize the sensitivity of each type of production data, and set a threshold: which data can be used as is for testing, and which cannot?
- “Select your strategy for replacing production data when necessary.” For each application, determine how to hide, mask, or otherwise alter sensitive information.
- “Define roles and responsibilities.” Who will enforce test data security practices? Consider creating a test-data czar — one person responsible for providing safe test data to developers, testers, and DBAs. Also limit access to all sensitive information.
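The “assess risks” step above could be captured in something as small as the following sketch. The sensitivity levels, column names, and threshold are all hypothetical; a real inventory would come from your own regulatory review, not from this list.

```python
# Hypothetical sensitivity levels; a higher number means more sensitive.
PUBLIC, INTERNAL, CONFIDENTIAL, REGULATED = 0, 1, 2, 3

# Hypothetical inventory of production columns and their assessed sensitivity.
COLUMN_SENSITIVITY = {
    "product_sku": PUBLIC,
    "order_total": INTERNAL,
    "customer_name": CONFIDENTIAL,
    "date_of_birth": CONFIDENTIAL,
    "social_security_number": REGULATED,
    "credit_card_number": REGULATED,
}

# Columns rated above this level must be transformed before reaching test.
THRESHOLD = INTERNAL


def columns_to_transform(inventory, threshold=THRESHOLD):
    """Return the sorted names of columns too sensitive to copy as is."""
    return sorted(name for name, level in inventory.items() if level > threshold)


print(columns_to_transform(COLUMN_SENSITIVITY))
```

Keeping the inventory in one reviewable place gives the person responsible for test data a single list to enforce.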
Faking It: How to Generate “Good Enough” Test Data
To make production data safe for the test environment, your test-data czar will need to formulate transformation strategies on a per-application basis. This transformation can be difficult, however, since an application often needs to think data is real. For example, an e-commerce application may vet a credit card number to see if it “looks” real. Similar checks may occur for social security numbers, birth dates, driver’s license numbers, addresses, bank accounts, and customer identification numbers.
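One common way an application checks whether a card number “looks” real is the Luhn checksum. The sketch below assumes that is the check in play, which the application under test may or may not use, and generates fake numbers that pass it. The function names and the default prefix are hypothetical choices for illustration.

```python
import random


def luhn_checksum_ok(number):
    """Return True if the digit string passes the Luhn check."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def fake_card_number(rng, prefix="4", length=16):
    """Build a random digit string of the given length that passes Luhn."""
    body_digits = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(body_digits))
    # Try each possible final digit; exactly one makes the checksum valid.
    for check_digit in "0123456789":
        candidate = body + check_digit
        if luhn_checksum_ok(candidate):
            return candidate
    raise AssertionError("unreachable: one check digit always satisfies Luhn")


rng = random.Random(2007)  # seeded so test runs are repeatable
number = fake_card_number(rng)
print(number, luhn_checksum_ok(number))
```

A generated number is only format-valid. It could still coincide with a real account number by chance, so treat this as a starting point and not as a guarantee of safety.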
A variety of techniques exist to create data that’s either fake, or “de-identified” enough to be safe, including:
- Scrambling: Jumble names or numbers to ensure they’re not real, yet close enough to the real thing to work.
- Randomizing: Replace social security, driver’s license, and other numbers with random numbers.
- Encrypting and masking: Encrypt sensitive information, or mask it by replacing some or all of its characters with placeholder symbols.
- Concatenation: Retain essential information but strip out, substitute, or randomize the remaining variables using a predefined routine.
- Look-up fields: Substitute an entry or value — such as names or addresses — from a predefined list.
- Propagation: For interdependent fields and parent/child database tables, utilize algorithms and propagate information to ensure relationships, including database key references, remain intact.
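Several of the field-level techniques above can be sketched in a few lines each. This is a minimal illustration, not a vetted masking tool: the function names, the replacement surname list, and the sample values are all hypothetical.

```python
import random

# Hypothetical look-up list of replacement surnames.
REPLACEMENT_SURNAMES = ["Archer", "Bishop", "Carver", "Dalton", "Ellis"]


def scramble(value, rng):
    """Scrambling: shuffle the characters of a value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def randomize_digits(value, rng):
    """Randomizing: replace every digit with a random one, keeping punctuation."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def mask(value, visible=4, symbol="X"):
    """Masking: hide all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


def look_up(rng, choices=REPLACEMENT_SURNAMES):
    """Look-up fields: substitute an entry from a predefined list."""
    return rng.choice(choices)


rng = random.Random(42)  # seeded so the transformation is repeatable
print(scramble("Schwartz", rng))
print(randomize_digits("123-45-6789", rng))
print(mask("123-45-6789"))
print(look_up(rng))
```

Note that scrambling and randomizing can, by chance, return the original value or another real one, so a production routine would need to check for that.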
How can you apply these techniques? Many companies use scripting tools or their existing test automation tools to obfuscate or replace sensitive information.
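A homegrown script for the propagation technique might look like the sketch below, which replaces customer IDs in a parent table and carries the same mapping into a child table. The table layout, field names, and `TEST-` surrogate format are hypothetical, and the code assumes every order refers to a customer that exists in the parent table.

```python
import itertools


def mask_customer_ids(customers, orders):
    """Replace real customer IDs with surrogates in parent and child rows."""
    counter = itertools.count(1)
    surrogate_for = {}

    masked_customers = []
    for row in customers:
        # Assign each real ID one surrogate and remember the mapping.
        surrogate = f"TEST-{next(counter):06d}"
        surrogate_for[row["customer_id"]] = surrogate
        masked_customers.append({**row, "customer_id": surrogate})

    # Propagate the same mapping to the child table so key references still resolve.
    masked_orders = [
        {**row, "customer_id": surrogate_for[row["customer_id"]]} for row in orders
    ]
    return masked_customers, masked_orders


# Hypothetical sample rows standing in for production tables.
customers = [
    {"customer_id": "C-88213", "state": "MA"},
    {"customer_id": "C-10457", "state": "NY"},
]
orders = [
    {"order_id": 1, "customer_id": "C-10457"},
    {"order_id": 2, "customer_id": "C-88213"},
    {"order_id": 3, "customer_id": "C-10457"},
]

masked_customers, masked_orders = mask_customer_ids(customers, orders)
print(masked_customers)
print(masked_orders)
```

Because both tables draw on one mapping, joins between customers and orders behave in test as they did in production.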
For large companies facing especially stringent data regulations and just beginning to deal with test-data problems, however, Forrester’s Yuhanna and Schwaber recommend using off-the-shelf test data generation or masking software, since building it from scratch may take six to nine months. Such tools, they note, are available from Compuware, Datavantage, Global Software Applications, IBM, Princeton Softech, Quest Software, SoftBase Systems, and Worksoft.
Keep a Record
Regardless of your approach, maintain copious records of your test data transformation processes. First, this will help you apply them in a consistent and repeatable manner. Second, states’ data notification laws exempt stolen data that was sufficiently de-identified or encrypted. If your company should suffer a breach, your records will help demonstrate to auditors that no sensitive information was stolen, enabling your company to avoid a costly data breach notification process and to save face with customers.
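One lightweight way to keep such records is to append a structured entry each time a transformation runs. The sketch below is an assumption about what a useful entry might contain. The field names and file format are hypothetical, and what auditors actually require will vary.

```python
import json
from datetime import datetime, timezone


def transformation_record(table, column, technique, rows_changed, when=None):
    """Build one audit-log entry describing a masking step."""
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp": when.isoformat(),
        "table": table,
        "column": column,
        "technique": technique,
        "rows_changed": rows_changed,
    }


def append_record(path, record):
    """Append the entry as one JSON line so the log is easy to review."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, sort_keys=True) + "\n")


# Hypothetical entry for a randomized Social Security number column.
record = transformation_record("customers", "social_security_number", "randomize", 12000)
print(json.dumps(record, sort_keys=True))
```

Calling `append_record` with a log path after each run builds up a dated history of which columns were transformed and how.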
Mathew Schwartz is a freelance business and technology journalist based in Cambridge, Mass.