Data Collector is the component that accepts reports from testing and code analysis tools. You can configure Data Collector’s default storage and upload limits to meet data requirements for your organization.
Configuring the Report Storage Threshold
When Data Collector receives and processes files from analyzers, each report file is processed and moved to the <DTP_DATA_DIR>/data/_DEFAULT_/dc/stored folder. This folder contains subfolders that are structured according to a YEAR/MONTH/DAY/XXX/YYY format, where XXX and YYY are integer values calculated from the file name.
You can set a maximum size for the stored folder and DTP will automatically prune older entries. This configuration only applies to the XML reports stored in DTP, not the build details retention settings (see Configuring Build Details Retention Settings).
- Open the DCConfig.xml file located in the <DTP_DATA_DIR>/grs/config/ directory and locate the following entries:
<stored-folder>
    <max-size>20</max-size>
    <size-to-clean>5</size-to-clean>
</stored-folder>
- Change the value for the <max-size> property (in GB) to an acceptable maximum storage capacity; the default is 50 GB.
- Change the value for the <size-to-clean> property (in GB) to the amount of data you want pruned when the maximum capacity is reached; the default is 5 GB.
- Save the file.
If these entries do not appear in the DCConfig.xml file, then the defaults will be used.
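The fallback to defaults can be sketched in Python. This is only an illustration, not how DTP itself reads the file: the function name read_stored_folder_limits is hypothetical, and the sketch assumes that <stored-folder> may sit anywhere below the root of DCConfig.xml, because the surrounding file structure is not described here.

```python
import xml.etree.ElementTree as ET

# Defaults stated in this section, in GB.
DEFAULT_MAX_SIZE_GB = 50
DEFAULT_SIZE_TO_CLEAN_GB = 5


def _int_or_default(text, default):
    """Convert element text to int, using the default when it is missing or blank."""
    if text is None or not text.strip():
        return default
    return int(text.strip())


def read_stored_folder_limits(xml_text):
    """Return (max_size_gb, size_to_clean_gb) from DCConfig.xml content."""
    root = ET.fromstring(xml_text)
    # Assumption: <stored-folder> is the root itself or one of its descendants.
    if root.tag == "stored-folder":
        folder = root
    else:
        folder = root.find(".//stored-folder")
    if folder is None:
        # Entries are absent, so the documented defaults apply.
        return DEFAULT_MAX_SIZE_GB, DEFAULT_SIZE_TO_CLEAN_GB
    return (
        _int_or_default(folder.findtext("max-size"), DEFAULT_MAX_SIZE_GB),
        _int_or_default(folder.findtext("size-to-clean"), DEFAULT_SIZE_TO_CLEAN_GB),
    )
```

Passing the text of DCConfig.xml to the function gives the pair of limits that would be in effect under these assumptions, which can be useful when auditing several DTP installations.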
Configuring the Maximum Size of Report Uploads
By default, the maximum size of the XML report file that Data Collector can accept is 512 MB. You can change the limit by modifying the JAVA_DC_CONFIG_ARGS parameter located in the <DTP_INSTALL>/bin/variables file. For example:
JAVA_DC_CONFIG_ARGS=" -Dcom.parasoft.sdm.api.rawstorage.datacollector.uploadMaxSize=1024"
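A client-side check before sending a report could look like the following Python sketch. It is not part of DTP. The function names are hypothetical, and the sketch assumes that the limit is expressed in megabytes of 1024 * 1024 bytes and that a file exactly at the limit is still accepted; neither detail is confirmed by this documentation.

```python
import os

# Default upload limit stated in this section, assumed to be in megabytes.
DEFAULT_UPLOAD_MAX_MB = 512


def fits_upload_limit(size_bytes, limit_mb=DEFAULT_UPLOAD_MAX_MB):
    """Return True if a report of size_bytes is within the assumed limit."""
    # Assumption: 1 MB = 1024 * 1024 bytes and the limit is inclusive.
    return size_bytes <= limit_mb * 1024 * 1024


def report_fits(path, limit_mb=DEFAULT_UPLOAD_MAX_MB):
    """Check an XML report file on disk against the assumed limit."""
    return fits_upload_limit(os.path.getsize(path), limit_mb)
```

With the example setting shown above, report_fits(path, limit_mb=1024) would be the matching check.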
Configuring Data Collector to Log User Information with DTP Requests and Responses
If you are upgrading from DTP 2022.2 or earlier, you can configure Data Collector to log user information along with each request and response. Some users need this to comply with certain regulations. It is unnecessary for new installs of DTP 2023.1 or later.
To configure Data Collector to log this information from DTP, you need to modify the DCServerConfig.xml file located in <DTP_DATA_DIR>/conf/ by inserting the element <access-log-pattern>, which is not present by default. For example:
<access-log-pattern>%t %s %m %U %H %u %sid %{local}a:%{local}p %{remote}a:%{remote}p %{ms}T %{X-Forwarded-For}i %{User-Agent}i %{Referer}i -</access-log-pattern>
Standard Jetty request log format codes are supported. See https://www.eclipse.org/jetty/javadoc/jetty-9/org/eclipse/jetty/server/CustomRequestLog.html for more information. Data Collector adds the custom format code %sid to represent the session ID.
These logs can be found in the dc_jetty_request.log file located in the <DTP_DATA_DIR>/logs/ directory.
Configuring Build Details Retention Settings
By default, Data Collector stores ten builds' worth of metrics data and two builds' worth of test results and test coverage data in the database (see Default Data Retention Settings). The data related to these practices are referred to as "build details" and are used to populate their respective explorer views. When a new report is sent to DTP, the details for the oldest relevant build are deleted if the limit is exceeded. Each build details type, however, is stored independently so that exceeding a build details type's limit does not affect other build details for the same build. Static analysis details are always stored in the database and are not configurable.
In the following example, two builds' worth of test data have been stored and are available from the Test Trend widget drill-down report:
You can configure the number of builds' worth of data retained for each practice (see Configuration). Raise the limit if you want Data Collector to retain additional builds' worth of details when new reports are sent to DTP. Build details can consume a significant amount of disk space, so if disk space is an issue, consider reducing the limit. At least two builds' worth of details is required for DTP to display meaningful information.
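The retention behavior described above can be modeled with a short Python sketch. It is a simplified illustration, not Data Collector's implementation: the class name BuildDetailsStore and the type keys are hypothetical, and the sketch assumes that the order in which builds arrive stands in for build age. The default limits are the ones listed under Default Data Retention Settings.

```python
from collections import deque

# Default limits, expressed as a number of builds per details type.
DEFAULT_LIMITS = {"tests": 2, "coverage": 2, "resource-coverage": 10, "metrics": 10}


class BuildDetailsStore:
    """Toy model: each details type keeps its own list of retained builds."""

    def __init__(self, limits=None):
        self.limits = dict(DEFAULT_LIMITS if limits is None else limits)
        self.builds = {kind: deque() for kind in self.limits}

    def add_report(self, kind, build_id):
        """Record details of one type for a build; return the builds pruned."""
        kept = self.builds[kind]
        if build_id not in kept:
            kept.append(build_id)
        removed = []
        # Only this details type is pruned; other types are left untouched.
        while len(kept) > self.limits[kind]:
            removed.append(kept.popleft())
        return removed
```

In this model, sending test details for a third build removes the test details of the first build, while metrics details for that first build remain available because metrics has its own limit.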
About Resource Coverage Details
If a filter is configured to show data according to resource groups, then resource coverage details are also required in order to view data for a single build, as well as to view the build in coverage trend widgets. A resource group is a collection of files and/or folders defined by a set of one or more Ant file patterns. Resource groups enable you to view software quality information for specific parts of the code (see Adding Resource Groups to Projects).
Coverage refers to method-level coverage
DTP collects coverage at the method level and at the resource level. Method-level coverage data is required to view coverage in the Coverage Explorer. Unless explicitly stated, "coverage" in this documentation means method-level coverage.
If your filter does not use resource groups, then you can ignore the settings for resource coverage details.
Configuration Hierarchy
- When Data Collector receives new results from a Parasoft tool, it first checks the build details storage settings for the project (see Configuring Build Details Settings).
- If the project build details settings have not been configured, Data Collector uses the global build details settings (see Configuration).
- If neither the project nor the global Data Collector build details settings have been configured, the limit for test and coverage details is defined by the value set with the following Java argument:
-Dcom.parasoft.sdm.dc.build.details.to.keep
The argument is specified in the variables file located in the <DTP_INSTALL>/bin directory. There is no limit for the metrics or resource coverage build details. This option is deprecated and may be removed in future versions.
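The lookup order above can be summarized as a small Python function. This is a sketch of the documented precedence only: the function name, parameter names, and dictionary keys are hypothetical, each settings dictionary is assumed to cover every details type, and the case where the deprecated Java argument is also unset is not described in this documentation.

```python
def effective_limit(kind, project=None, global_settings=None, legacy_keep=None):
    """Return the number of builds kept for `kind`, or None for no limit.

    project and global_settings are dicts such as {"tests": 2, "metrics": 10};
    legacy_keep stands for the value of the deprecated Java argument.
    """
    # 1. Project-level settings are checked first.
    if project is not None:
        return project[kind]
    # 2. Otherwise the global Data Collector settings are used.
    if global_settings is not None:
        return global_settings[kind]
    # 3. Deprecated fallback: it limits test and coverage details only.
    if kind in ("tests", "coverage"):
        return legacy_keep
    # Metrics and resource coverage details have no limit in this case.
    return None
```

For example, with no project or global settings and a legacy value of 5, the function returns 5 for "coverage" and None for "metrics".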
Default Data Retention Settings
The amount of data stored is based on the number of historical builds, as opposed to a hard limit, such as gigabytes. By default, DTP keeps the following data:
- Unit test data: two (2) historical builds
- Coverage data: two (2) historical builds
- Resource coverage data: ten (10) historical builds
- Metrics data: ten (10) historical builds
Configuration
To configure the data retention settings:
- Stop the Data Collector service. See Stopping DTP Services.
- Open the DCConfig.xml configuration file located in the <DTP_DATA_DIR>/grs/config/ directory and locate the <details-retention-builds-count> group of settings:
<!--
<details-retention-builds-count>
    <tests>2</tests>
    <coverage>2</coverage>
    <resource-coverage>10</resource-coverage>
    <metrics>8</metrics>
</details-retention-builds-count>
-->
- Uncomment the settings and specify how much data should be stored for each practice. The values refer to the number of builds. Specifying 1, for example, means storing one build's worth of data for the practice.
<details-retention-builds-count>
    <tests>3</tests>
    <coverage>4</coverage>
    <resource-coverage>12</resource-coverage>
    <metrics>9</metrics>
</details-retention-builds-count>
- Save the file and restart Data Collector and DTP Server.
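Before restarting the services, the edited settings can be sanity-checked with a Python sketch like the one below. It is not a Parasoft tool. The function names are hypothetical, and the sketch assumes that the group may sit anywhere below the root of DCConfig.xml. Because the standard XML parser skips comments, a group that is still commented out is reported as empty.

```python
import xml.etree.ElementTree as ET

GROUP_TAG = "details-retention-builds-count"


def read_retention_counts(xml_text):
    """Return {practice: builds} for the uncommented retention group, if any."""
    root = ET.fromstring(xml_text)
    # Assumption: the group is the root itself or one of its descendants.
    group = root if root.tag == GROUP_TAG else root.find(".//" + GROUP_TAG)
    if group is None:
        return {}  # absent or still commented out
    return {child.tag: int(child.text) for child in group}


def below_recommended(counts, minimum=2):
    """List practices set below the two builds needed for meaningful data."""
    return sorted(name for name, value in counts.items() if value < minimum)
```

An empty result from read_retention_counts suggests that the group was not uncommented, and any name returned by below_recommended points to a value that may leave DTP without enough builds to display meaningful information.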
Configuring Test Failures Threshold
By default, Data Collector will reject reports that contain more than 5000 reported test failures. You can change the limit by adding the following Data Collector JVM argument:
-Dcom.parasoft.sdm.rawstorage.failures.limit=5000
For Linux installations, you can modify the following string in the variables file located in the <DTP_INSTALL>/bin/ directory:
JAVA_DC_CONFIG_ARGS=" -Dcom.parasoft.sdm.rawstorage.failures.limit=5000"
If a negative value is provided, Data Collector will not reject reports due to the number of test failures.
The variables file is overwritten during an upgrade, so you will need to reapply the configuration setting after upgrading DTP.
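The threshold rule can be expressed as a few lines of Python. This is a sketch of the documented behavior, not Data Collector's code, and the function name rejects_report is hypothetical. How a limit of exactly 0 is treated is not stated here; the sketch simply applies the same comparison to it.

```python
# Default threshold stated in this section.
DEFAULT_FAILURES_LIMIT = 5000


def rejects_report(failure_count, limit=DEFAULT_FAILURES_LIMIT):
    """Return True if a report with failure_count test failures is rejected."""
    if limit < 0:
        # A negative limit turns the check off.
        return False
    # Reports are rejected only when they contain more failures than the limit.
    return failure_count > limit
```

Under the default, a report with exactly 5000 failures is accepted and one with 5001 is rejected.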
Configuring Max Traffic Response from SOAtest
Reports sent to DTP from SOAtest may include request and response messages. By default, Data Collector accepts message responses of up to 1 million bytes, but you can specify a different limit by setting the following Data Collector JVM argument:
-Dcom.parasoft.sdm.dc.traffic.max.length=<BYTES>
Configuring Automatic Project Creation
By default, when Data Collector processes data referencing a project that does not exist in DTP, the project is automatically created (assuming the user that sends the data has sufficient permissions in DTP to create the project) before Data Collector sends the data to DTP. You can enable/disable this functionality by doing the following:
- Open the DCConfig.xml file located in the <DTP_DATA_DIR>/grs/config/ directory and locate the <auto-create-projects> element:
<auto-create-projects>true</auto-create-projects>
- Set the value to true to enable or to false to disable automatic project creation.
If the <auto-create-projects> element is not present in DCConfig.xml, the default behavior takes effect and it is treated as if the element were set to true. If you want to disable this functionality, add the <auto-create-projects> element to the file and set it to false. In cases where this functionality is disabled, Data Collector will reject reports referencing a project that does not exist.
Configuring the Number of Threads
Data Collector uses multiple threads to process reports that it has received from testing and code analysis tools in order to improve processing time. By default, Data Collector is configured to use two threads.
The configuration can be found under the <concurrency> element in the DCConfig.xml file located in <DTP_DATA_DIR>/grs/config/, as shown below:
<concurrency>
    <max-concurrency>2</max-concurrency>
    <max-write-conflict-retries>5</max-write-conflict-retries>
    <blocking blockers="build, coverageTag, runConfiguration"/>
</concurrency>
The "blockers" attribute supports a comma-separated list of values:
- build
- coverageTag
- runConfiguration
If none are specified, "build, coverageTag, runConfiguration" is used by default. You are encouraged to use the defaults unless you have identified a specific need to optimize concurrency; be aware that changes come with risks.
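To inspect which blockers are in effect without changing anything, a read-only Python sketch such as the following may help. It is an illustration rather than Data Collector's own parsing: the function name read_blockers is hypothetical, and the sketch assumes that <concurrency> is either the root of the parsed text or one of its descendants.

```python
import xml.etree.ElementTree as ET

# Default used when no blockers are specified, as stated above.
DEFAULT_BLOCKERS = ["build", "coverageTag", "runConfiguration"]


def read_blockers(xml_text):
    """Return the list of blockers configured under <concurrency>."""
    root = ET.fromstring(xml_text)
    if root.tag == "concurrency":
        blocking = root.find("blocking")
    else:
        blocking = root.find(".//concurrency/blocking")
    raw = blocking.get("blockers", "") if blocking is not None else ""
    # Split the comma-separated attribute and drop surrounding whitespace.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return names or list(DEFAULT_BLOCKERS)
```

The function only reads the configuration, so it can be run safely while gathering information for a tuning assessment.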
It is strongly recommended that you have Parasoft assist you before attempting to tune and optimize the processing time of Data Collector. The process is safe and simple and involves the following:
- Parasoft collects some Data Collector diagnostic information. This requires running a few queries, provided by Parasoft, to extract the data from the database. No confidential information is extracted, just timing information logged by Data Collector.
- Parasoft analyzes the collected information and makes the appropriate recommendations based on their assessment.
Parasoft does not recommend that customers try to tune and optimize Data Collector on their own without proper assessment and guidance.