Background
Many organizations have specific safety and security concerns or requirements around their test data. If you are developing a banking application, for example, you may source SSNs, account numbers, and other data from a real instance of a database. If you are developing an electronic health record (EHR) application, then HIPAA requirements may extend to the data you are testing with. This is why the ability to mask sensitive data in a simple way is important for testing.
Masking data, though, is not as simple as scrambling values. For instance, APIs may expect data objects that contain specific fields in a specific format and with specific types. Additionally, masked values need to be consistent across fields. If a record is available in two different tables, the masked value must be the same in both.
You can use the Mask Data functionality in Test Data to create a copy of the data set with random values that meet the application’s requirements. Masked data is safe, secure, and can be used in your test scenarios without affecting test results.
Prerequisites
Complete the Capturing and Managing Test Data chapter of this tutorial before beginning this section.
You can alternatively follow this tutorial using an existing data repository.
Understanding the Data Model
The Test Data tab represents the data in a table. You can use this interface to clone, manipulate, and manage the data. You can also click the Model tab to view a representation of the data that shows how the data objects are related.
You can refer to the Data Modeling Overview documentation for a more detailed explanation of the symbols, shapes, and colors used in the model.
Exploring the Model
- Use the scroll wheel on your mouse to zoom in and out of the model.
- Click and drag to view parts of the model outside of the frame.
- Click on an element to view its relationships to other elements.
For SQL data sets, the element in the center of the model (database icon) represents the query. Each branch from the query represents a SQL template (a collection of SQL commands that return a result) used to build the data set.
In order to retrieve the correct account, ParaBank queries the social security number associated with the account. Click in the Search bar and enter SSN to see the queries.
In our data set, SSN appears in ResultSet3 and ResultSet5 (not shown).
The result set objects in the model are abstractions created by Test Data to facilitate a better visualization. The result set is a representation of the data returned by the SQL template object ("SQL3" in the image above). You can change the SQL template display name to make searches easier and more readily identify the objects that represent data in your model.
- Click on the Result Set label. The SQL template appears in the sidebar.
- Click the SQL ellipsis menu and choose Edit Display Name.
- Specify a new display name when prompted and click Save.
The ResultSet3 object is renamed along with the SQL template object.
Next Step
Social security numbers are sensitive, so they should be masked to ensure that we are using safe data in our testing scenarios. This process will be covered in the following sections.
Inferring Constraints
Constraints refer to the features of the data, such as type, maximum/minimum values (integers), and allowed characters (strings). When Test Data infers constraints, it processes all the recorded data and sets constraints around each field. This is important not only for masking data properly, but also for generating new data later on.
- Click on ResultSet3 (if you did not rename the object) and click Infer Constraints.
- Confirm that you want to overwrite the default constraints when prompted. Test Data will infer the constraints for all data columns in the result set.
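To make the idea of inferred constraints concrete, the following is a minimal Python sketch of how type, range, length, and allowed characters could be derived from recorded values. It is an illustration only and not how Test Data implements inference; the `Constraints` class, the `infer_constraints` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    """Hypothetical container for the constraints inferred for one column."""
    type_name: str                    # "integer" or "string"
    minimum: Optional[int] = None     # integers only
    maximum: Optional[int] = None     # integers only
    min_length: Optional[int] = None  # strings only
    max_length: Optional[int] = None  # strings only
    allowed_chars: str = ""           # strings only: sorted unique characters


def infer_constraints(values):
    """Infer simple constraints from the recorded values of one column."""
    if not values:
        raise ValueError("cannot infer constraints from an empty column")
    # Treat the column as integer only if every recorded value is an int.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return Constraints("integer", minimum=min(values), maximum=max(values))
    # Otherwise fall back to string constraints.
    texts = [str(v) for v in values]
    lengths = [len(t) for t in texts]
    return Constraints(
        "string",
        min_length=min(lengths),
        max_length=max(lengths),
        allowed_chars="".join(sorted(set("".join(texts)))),
    )


# Made-up sample values, not taken from the ParaBank data set.
ssn_constraints = infer_constraints(["622-11-9999", "465-38-9786"])
balance_constraints = infer_constraints([100, 2500, 40])
```

Constraints like these describe what a replacement value must look like, which is why inference comes before masking and generation in this tutorial.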
Configuring Mask Settings
The Mask and Generate Settings overlay includes several options for shaping the data.
- Choose the SSN column in ResultSet3.
- Click the ellipsis menu and choose Mask and Generate Settings.
- Refer to the Mask and Generation Settings documentation for details about the settings. In this tutorial, we will choose the following settings:
- Mode: Random. We want to randomly replace the values to achieve a realistic value.
- Type: String. Although an SSN value contains numbers, it behaves more like a string in the application.
- Pattern: ###-##-####. This instructs Test Data to replace SSN values with three digits, a dash, two digits, a dash, and four digits.
- Character map: The character map defines which characters can be used in the generated pattern. It only applies to patterns that use the ampersand (&) character; each ampersand is replaced by a character from the character map.
- Click Save.
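The pattern and character map settings described above can be pictured with a short Python sketch. It assumes that `#` stands for a random digit (inferred from the description of the SSN pattern), that `&` draws from the character map, and that every other character is copied literally. The function name `generate_from_pattern` and the default character map are hypothetical, and Test Data's actual pattern syntax may support more than this.

```python
import random
import string


def generate_from_pattern(pattern, character_map=string.ascii_uppercase, rng=None):
    """Build a random value from a pattern (assumed syntax, see lead-in).

    '#' -> random digit
    '&' -> random character taken from character_map
    anything else is copied unchanged (for example, the dashes in an SSN)
    """
    rng = rng or random.Random()
    out = []
    for ch in pattern:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "&":
            out.append(rng.choice(character_map))
        else:
            out.append(ch)
    return "".join(out)


# A seeded generator keeps the example repeatable.
masked_ssn = generate_from_pattern("###-##-####", rng=random.Random(42))
# The character map only matters when the pattern contains '&'.
coded_value = generate_from_pattern("&&-#", character_map="XY", rng=random.Random(1))
```

Because the SSN pattern contains no ampersand, the character map has no effect on it, which matches the note in the settings list.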
The SSN column shows the configured mask settings for this result set.
The SSN column in ResultSet5 still needs to be configured. Additionally, the mask needs to be consistent across results to preserve the integrity of the data.
- Choose the SSN column in ResultSet5 and open the Data Constraint Settings from the ellipsis menu.
- Click on the Reference field and choose the SSN column configured in ResultSet3 from the drop-down menu.
- Click Save.
A dotted line is drawn in the model connecting the SSN columns from each result set. This indicates that the masked values applied to the ResultSet3 SSN column will also be applied to the ResultSet5 SSN column.
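One way to think about the reference between the two SSN columns is as a shared lookup from original values to masked values. The Python sketch below illustrates that idea under stated assumptions: the row layout, column names, sample values, and the `mask_column` and `random_ssn` helpers are all hypothetical, and this is not a description of how Test Data stores or applies references.

```python
import random


def random_ssn(rng):
    """Return a random value shaped like ###-##-####."""
    return f"{rng.randrange(1000):03d}-{rng.randrange(100):02d}-{rng.randrange(10000):04d}"


def mask_column(rows, column, mapping, rng):
    """Return copies of rows with `column` masked.

    `mapping` remembers original -> masked, so passing the same mapping
    for several result sets gives equal originals the same masked value.
    """
    masked_rows = []
    for row in rows:
        original = row[column]
        if original not in mapping:
            mapping[original] = random_ssn(rng)
        masked_rows.append({**row, column: mapping[original]})
    return masked_rows


# Made-up rows standing in for ResultSet3 and ResultSet5.
result_set_3 = [{"customer_id": 1, "ssn": "622-11-9999"}]
result_set_5 = [
    {"account_id": 10, "ssn": "622-11-9999"},
    {"account_id": 11, "ssn": "465-38-9786"},
]

rng = random.Random(7)
shared = {}  # plays the role of the reference between the two columns
masked_3 = mask_column(result_set_3, "ssn", shared, rng)
masked_5 = mask_column(result_set_5, "ssn", shared, rng)
```

The original rows are left untouched and new rows are returned, mirroring the way masking in this tutorial produces a separate copy of the data set.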
Generating Masked Data
With the settings configured, we can now mask the data.
- Click the Mask Data button in the toolbar.
- Specify a name for the masked data repository when prompted. By default, the name of the original data set is appended with “_masked”.
- Click Mask.
Next, we will verify that a new value was generated.
- Click on the Data tab and choose the masked repository (ParabankDB_masked).
- Click on the data set (parabank-login) and click on the row containing a social security number.
- Expand the record and check the value in the SSN column.
- Return to the data set page and click on another row that has the SSN column. It will have the same value for the social security number.
- Compare these values to the original SSN column value in the ParabankDB repository.
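If you export the original and masked records, the manual checks above could also be scripted. The following Python sketch shows one possible set of checks: the masked value matches the ###-##-#### shape, differs from the original, and is consistent for repeated originals. The row format, the `check_masked` function, and the sample values are assumptions made for illustration and are not part of Test Data.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def check_masked(original_rows, masked_rows, column="ssn"):
    """Compare rows pairwise and return a list of problems (empty if none)."""
    problems = []
    seen = {}  # original value -> first masked value observed for it
    for i, (orig, masked) in enumerate(zip(original_rows, masked_rows)):
        o, m = orig[column], masked[column]
        if not SSN_PATTERN.fullmatch(m):
            problems.append(f"row {i}: {m!r} does not match ###-##-####")
        if m == o:
            problems.append(f"row {i}: value was not masked")
        if seen.setdefault(o, m) != m:
            problems.append(f"row {i}: inconsistent mask for a repeated original")
    return problems


# Made-up sample rows: the same original SSN appears twice.
original = [{"ssn": "622-11-9999"}, {"ssn": "622-11-9999"}]
good = [{"ssn": "123-45-6789"}, {"ssn": "123-45-6789"}]
bad = [{"ssn": "123-45-6789"}, {"ssn": "622-11-9999"}]

good_problems = check_masked(original, good)
bad_problems = check_masked(original, bad)
```

In this sample, `good` passes every check, while the second row of `bad` is reported both as unmasked and as inconsistent with the first row.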
Finally, you can click on the Tasks tab to view the audit trail for compliance purposes.