Many organizations have specific safety and security concerns or requirements around their test data. If you are developing a banking application, for example, you may source SSNs, account numbers, and other data from a real instance of a database. If you are developing an electronic health record (EHR) application, then HIPAA requirements may extend to the data you are testing with. This is why the ability to mask sensitive data in a simple way is important for testing.
Masking data, though, is not as simple as scrambling values. For instance, APIs may expect data objects that contain specific fields in a specific format and with specific types. Additionally, masked values need to be consistent across fields. If a record appears in two different tables, its masked value must be the same in both.
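The two requirements above (preserving format and keeping values consistent across tables) can be illustrated with a minimal Python sketch. This is not how Test Data implements masking. The `ConsistentMasker` class, the table contents, and the `NNN-NN-NNNN` format are all assumptions made for illustration.

```python
import random
import re

# Hypothetical format for this sketch: NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


class ConsistentMasker:
    """Replaces each distinct SSN with one random value in the same format."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._mapping = {}   # original value -> masked value
        self._used = set()   # masked values already handed out

    def mask(self, ssn):
        if not SSN_PATTERN.fullmatch(ssn):
            raise ValueError(f"unexpected SSN format: {ssn!r}")
        # Reuse the earlier result so the same input always masks the same way.
        if ssn not in self._mapping:
            self._mapping[ssn] = self._new_value(ssn)
        return self._mapping[ssn]

    def _new_value(self, original):
        # Retry until the candidate differs from the original and is unused.
        while True:
            digits = f"{self._rng.randrange(10**9):09d}"
            candidate = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
            if candidate != original and candidate not in self._used:
                self._used.add(candidate)
                return candidate


# Two made-up tables that share an SSN column.
customers = [{"id": 1, "ssn": "123-45-6789"}, {"id": 2, "ssn": "987-65-4321"}]
accounts = [{"account": 1001, "ssn": "123-45-6789"}]

masker = ConsistentMasker(seed=42)
# Build masked copies; the original rows are left untouched.
masked_customers = [{**row, "ssn": masker.mask(row["ssn"])} for row in customers]
masked_accounts = [{**row, "ssn": masker.mask(row["ssn"])} for row in accounts]
```

Because both tables go through the same `masker` instance, the customer with SSN `123-45-6789` receives the same masked value in `masked_customers` and `masked_accounts`, so joins between the two tables still work.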
You can use the Mask Data functionality in Test Data to create a copy of the data set with random values that meet the application’s requirements. Masked data is safe, secure, and can be used in your test scenarios without affecting test results.
Complete the Capturing and Managing Test Data chapter of this tutorial before beginning this section.
You can alternatively follow this tutorial using an existing data repository.
The Test Data tab represents the data in a table. You can use this interface to clone, manipulate, and manage the data. To see how the data objects are related, click the Model tab.
You can refer to the Data Modeling Overview documentation for a more detailed explanation of the symbols, shapes, and colors used in the model.
For SQL data sets, the element in the center of the model (database icon) represents the query. Each branch from the query represents a SQL template (a collection of SQL commands that return a result) used to build the data set.
In order to retrieve the correct account, ParaBank queries the social security number associated with the account. Click in the Search bar and enter SSN to see the queries.
In our data set, SSN appears in ResultSet3 and ResultSet5 (not shown).
The result set objects in the model are abstractions created by Test Data to facilitate a better visualization. The result set is a representation of the data returned by the SQL template object ("SQL3" in the image above). You can change the SQL template display name to make searches easier and to make the objects that represent data in your model easier to identify.
The ResultSet3 object is changed, in addition to the SQL template object.
Social security numbers are sensitive, so they should be masked to ensure that we are using safe data in our testing scenarios. This process will be covered in the following sections.
Constraints are the features of the data, such as type, maximum/minimum values (integers), allowed characters (strings), etc. When Test Data infers constraints, it processes all the recorded data and sets constraints around each field. This is important not only for masking data properly, but also for generating new data later on.
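As a rough illustration of what inferring constraints from recorded data can look like, the following Python sketch derives a type, a range, and an allowed character set for each field. It is not Test Data's actual algorithm. The `infer_constraints` function, the field names, and the sample values are hypothetical.

```python
def infer_constraints(values):
    """Derive simple constraints from the recorded values of one field."""
    if not values:
        raise ValueError("at least one recorded value is required")
    # Integer fields: record the observed range (bool is excluded on purpose).
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return {"type": "integer", "min": min(values), "max": max(values)}
    # Everything else is treated as a string in this sketch.
    strings = [str(v) for v in values]
    return {
        "type": "string",
        "min_length": min(len(s) for s in strings),
        "max_length": max(len(s) for s in strings),
        "allowed_chars": "".join(sorted(set("".join(strings)))),
    }


# Made-up recorded data for two fields.
recorded = {
    "balance": [100, 2500, 75],
    "ssn": ["123-45-6789", "987-65-4321"],
}
constraints = {field: infer_constraints(vals) for field, vals in recorded.items()}
```

In this sketch, `constraints["balance"]` holds the observed integer range and `constraints["ssn"]` holds the string length and the characters seen, which is the kind of information a masking or generation step would need to produce values the application accepts.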
The Mask and Generate Settings overlay includes several options for shaping the data.
The SSN column shows the configured mask settings for this result set.
The SSN column in ResultSet5 still needs to be configured. Additionally, the mask needs to be consistent across result sets to preserve the integrity of the data.
A dotted line will be drawn in the model connecting the SSN columns from each result set. This indicates that the masked values applied to the ResultSet3 SSN column will also be applied to the ResultSet5 SSN column.
With the settings configured, we can now mask the data.
Next, we will verify that a new value was generated.
Finally, you can click on the Tasks tab to view the audit trail for compliance purposes.