One of the first decisions that a developer has to make when assessing a violation is whether to fix the violation or suppress it. Although most violations detected may be true positives, there also may be instances in which suppressing a violation is the sensible thing to do, such as when the violation is a false positive or the team has decided to not fix certain kinds of violations.
DTP recommends whether to fix a violation based on historical data about whether similar violations have been fixed or suppressed. The recommendation is given as a fix percentage: a high percentage (for example, 80% or more) indicates that similar violations have been fixed in the past, while a low percentage (for example, less than 20%) indicates that similar violations have been suppressed in the past.
Developers can use the DTP recommendation as an aid when assessing and triaging violations.
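As a rough illustration of how a fix percentage might be read during triage, the following Python sketch maps a percentage to a hint. The function name and the returned labels are hypothetical, and the 80 and 20 cutoffs are only the example values mentioned above, not thresholds defined by DTP.

```python
def interpret_fix_percentage(fix_pct: float, high: float = 80.0, low: float = 20.0) -> str:
    """Map a fix percentage to a rough triage hint.

    The default cutoffs are the illustrative values from the text
    (80% or more, less than 20%); they are assumptions, not DTP rules.
    """
    if not 0.0 <= fix_pct <= 100.0:
        raise ValueError("fix percentage must be between 0 and 100")
    if fix_pct >= high:
        # Similar violations were mostly fixed in the past.
        return "likely fix"
    if fix_pct < low:
        # Similar violations were mostly suppressed in the past.
        return "likely suppress"
    # Anything in between carries no strong historical signal.
    return "review manually"
```

A team could adjust `high` and `low` to match its own policy; values between the two cutoffs are left for manual review.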
In order to get recommendations, a model must first be trained. Training requires violations that have been classified as Fix or Suppress in the Actions field of the Prioritization tab (see Assigning Actions to Violations).
If you have been fixing violations for some time, you can classify the violations based on history. This is the recommended approach. If you are a new user and have not fixed or suppressed any violations, you can classify violations manually. Both approaches are supported by the Machine Learning Wizard.
A balanced set of at least 20 violations must be classified as Fix and Suppress to train the model.
It is recommended to retrain the model from time to time as you continue to fix and suppress more violations.
The wizard guides you through the following process:
If Classify violations based on history is enabled, DTP will classify violations in the database that have been fixed or suppressed, resulting in one of the following outcomes:
A healthy recommendation model is essential for DTP to make reliable recommendations. If the classification process results in a Poor or Moderate model, then you should continue classifying violations until the model health improves. See About Model Health for additional information. If you used the Fast or Normal mode to train the model and are unsatisfied with the quality of the model, retrain the model and choose either the Deep or Deepest training mode option.
DTP uses the following algorithms to train the model:
The Fast and Deep training mode options use the k-NN and Naive Bayes algorithms, which have been shown to be the fastest algorithms for processing static analysis data. Because only two algorithms run, however, DTP has fewer options when determining the best training model.
The Normal and Deepest training modes use all algorithms, which will take longer than the other options to process the data. Using all algorithms, however, provides DTP with the most information for determining the best training model.
The Fast and Normal training mode options limit the number of violations used to train the model to 1000. This enables DTP to train faster, especially if you have a large set of classified violations. Reducing the data sample size, however, may affect the overall quality of the model.
The Deep and Deepest training mode options use all available classified violations to train the model. As a result, these options have the potential to produce the best possible model. If your project contains a very large set of classified violations, however, allow several more minutes for the training process to complete.
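The four training modes differ along two axes: which algorithms run and how many classified violations are used. The Python sketch below summarizes that matrix as described above. The class, field, and function names are hypothetical and are not part of DTP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingMode:
    uses_all_algorithms: bool      # False: only k-NN and Naive Bayes run
    max_violations: Optional[int]  # None: all classified violations are used


# Trade-offs as described in the text above.
TRAINING_MODES = {
    "Fast": TrainingMode(uses_all_algorithms=False, max_violations=1000),
    "Normal": TrainingMode(uses_all_algorithms=True, max_violations=1000),
    "Deep": TrainingMode(uses_all_algorithms=False, max_violations=None),
    "Deepest": TrainingMode(uses_all_algorithms=True, max_violations=None),
}


def sample_size(mode: str, classified_count: int) -> int:
    """Return how many classified violations the given mode would train on."""
    cap = TRAINING_MODES[mode].max_violations
    return classified_count if cap is None else min(classified_count, cap)
```

For a project with 5000 classified violations, `sample_size("Fast", 5000)` gives 1000, while `sample_size("Deepest", 5000)` gives 5000.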
You can get recommendations on violations to fix from two places:
You can also get DTP to recommend violations to fix by clicking Get Recommendations on the Prioritization tab.
If you have trained the model, the recommendation will be added to the Recommendations section. If one or more prerequisite conditions have not been met, there will be an info icon to the left of the Recommendations label that you can hover over for details.
You can view recommendations on which violations to fix in two places: in the search results table and on the Prioritization tab.
In the search results table, look in the Recommended Action column. You can sort violations according to DTP's recommendations.
In the Prioritization tab, look in the Recommendations section.
The health of the model that DTP uses depends on the number of violations you process and the quality of the data you provide. DTP requires at least 20 violations to be classified, but classifying more violations will improve the model health. If you have not provided enough data, DTP will not be able to use the model to make a recommendation.
The model informing DTP recommendations also depends on the accuracy and consistency of the data you provide. DTP uses advanced algorithms to analyze the data associated with each violation, but it can only make quality recommendations based on quality inputs. If there is an imbalance between the number of fixed violations and the number of suppressed violations (or between the number of violations with the action set to Fix and the number with the action set to Suppress), DTP may indicate that it cannot properly train a model. In this case, you should assign an equal number of Fix and Suppress actions to improve model health.
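Before training, it can help to sanity-check a set of assigned actions against the two conditions described above: at least 20 classified violations, and a reasonable balance between Fix and Suppress. The following Python sketch assumes the actions are available as a plain list of strings. The function name and the `max_ratio` tolerance are hypothetical, because the text does not define how DTP measures balance.

```python
from collections import Counter

MIN_CLASSIFIED = 20  # minimum number of classified violations stated in the text


def check_training_set(actions, max_ratio=1.5):
    """Return a list of problems found in a set of assigned actions.

    `max_ratio` is an assumed tolerance for this sketch only; it is not
    a documented DTP setting.
    """
    counts = Counter(a for a in actions if a in ("Fix", "Suppress"))
    fix, suppress = counts["Fix"], counts["Suppress"]
    problems = []
    if fix + suppress < MIN_CLASSIFIED:
        problems.append(
            f"only {fix + suppress} classified, need at least {MIN_CLASSIFIED}"
        )
    smaller, larger = min(fix, suppress), max(fix, suppress)
    # A class with no members, or one that dwarfs the other, counts as imbalanced.
    if smaller == 0 or larger / smaller > max_ratio:
        problems.append(f"imbalanced: {fix} Fix vs {suppress} Suppress")
    return problems
```

An empty list means both checks passed. Any other result names the condition to address before training.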
You can help Parasoft improve the system by using the API to get Machine Learning data and sending the data to your Parasoft representative. Sensitive information, such as username and project name, is obfuscated in the response. Save the output from each of the following API endpoints, which are available to retrieve the Machine Learning data:
This endpoint returns static analysis violations used to train the model and recommendation results.
Authentication
Pass your username and password when sending the request. See Example.
URL
http://<HOST>:<PORT>/grs/api/v1.7/ml/staticAnalysis/classificationResultSets
Method
GET
Parameters
The following table describes the parameters available for this endpoint.
Parameter | Description | Type | Required |
---|---|---|---|
sortOrder | Specifies the order in which the data is returned by ID. Data is assigned IDs in sequential order, so newer data has a larger ID number. You can specify the following values: | string | optional |
limit | Specifies the maximum number of result sets to return. The default is no limit. | integer | optional |
| Specifies the number of result sets to skip in the response. Default is If the database has three result sets and the | integer | optional |

Example
curl -X GET -u username:password "http://dtp.mycompany.com:8443/grs/api/v1.7/ml/staticAnalysis/classificationResultSets?sortOrder=desc&limit=10" > myResultSets.json
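If you script the request instead of using curl, the query string can be assembled from the optional parameters. The Python sketch below builds the same URL as the curl example. The function name and its arguments are hypothetical, the path is copied from the curl example, and only the `sortOrder` and `limit` parameters named in the table are handled.

```python
from urllib.parse import urlencode

# Path as it appears in the curl example.
BASE_PATH = "/grs/api/v1.7/ml/staticAnalysis/classificationResultSets"


def result_sets_url(host, port, sort_order=None, limit=None, scheme="http"):
    """Build the request URL, including only the parameters that are set."""
    params = {}
    if sort_order is not None:
        params["sortOrder"] = sort_order
    if limit is not None:
        params["limit"] = limit
    # Omit the "?" entirely when no optional parameter was supplied.
    query = f"?{urlencode(params)}" if params else ""
    return f"{scheme}://{host}:{port}{BASE_PATH}{query}"
```

Calling `result_sets_url("dtp.mycompany.com", 8443, sort_order="desc", limit=10)` produces the URL used in the curl example.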
This endpoint returns actions assigned to the violations used to train the model.
URL
http://<HOST>:<PORT>/grs/api/v1.7/ml/staticAnalysis/violationActionHistory
Method
GET
Parameters
None.
Example
curl -X GET -u username:password "http://dtp.mycompany.com:8443/grs/api/v1.7/ml/staticAnalysis/violationActionHistory" > myViolationActionHistory.json
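The same request can be sent from a script. The Python sketch below uses only the standard library and assumes the server accepts HTTP Basic authentication, which is what `curl -u` sends by default. The function names are hypothetical, and the host, credentials, and file name are the placeholders from the curl example.

```python
import base64
import urllib.request


def build_request(url, username, password):
    """Create a GET request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )


def save_response(request, path):
    """Send the request and write the raw response body to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

To reproduce the curl example, pass the endpoint URL and your credentials to `build_request`, then call `save_response(request, "myViolationActionHistory.json")`. Nothing is sent until `save_response` runs.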