Create a policy
In VIANOPS, policies can be created using the SDK or REST API, or directly through the user interface. Policies are tied to a specific model and run against the inference data. Therefore, it is important to ensure that the model and inference data have been properly ingested into VIANOPS before creating a policy.
Add a drift policy
-
Open VIANOPS to the Model Dashboard for the model to which you want to add the policy.
- You can either add a new drift policy or duplicate an existing policy and edit the values.
- Add a new policy: In the Policy List panel, click New Policy.
- Duplicate a policy: In the Policy List panel, find the policy you want to duplicate, click the three dots at the far right, and choose Duplicate Policy.
-
If you are creating a new policy, you are asked to select Performance, Feature Drift, or Prediction Drift and continue with the matching section below. If you have duplicated a policy, you are taken directly to the page that matches the drift type of the original policy.
Performance drift
-
In Policy Name, enter a name for the policy.
-
Optionally, enter a description in Description.
-
Select a Target Window.
This is the window of data (inference or prediction) to compare against the baseline data.
- Daily — All data since midnight.
- Week-to-date — All data since the start of the week. Default is Monday.
- Month-to-date — All data since midnight of the first day of the month.
- Quarter-to-date — All data since the start of the current quarter. The default start is January 1, April 1, July 1, or October 1.
For example, if you select Week-to-date, and the day of the scheduled event (or today, if you run manually) is Thursday, 4/20/2023, then the set of data spans from midnight on Monday, 4/17/2023, to the time of the scheduled event (or manual run) on Thursday, 4/20/2023.
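The window starts described above can be sketched in a few lines of Python. This is an illustration only, not VIANOPS code: the function name `target_window_start` and the window labels are hypothetical, and it assumes the default Monday week start and calendar quarters.

```python
from datetime import datetime, timedelta

def target_window_start(run_time: datetime, window: str) -> datetime:
    """Return the start of the target window that ends at run_time."""
    midnight = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "daily":
        return midnight
    if window == "week-to-date":
        # weekday() is 0 for Monday, matching the default week start.
        return midnight - timedelta(days=run_time.weekday())
    if window == "month-to-date":
        return midnight.replace(day=1)
    if window == "quarter-to-date":
        first_month = 3 * ((run_time.month - 1) // 3) + 1  # 1, 4, 7, or 10
        return midnight.replace(month=first_month, day=1)
    raise ValueError(f"unknown window: {window}")

run = datetime(2023, 4, 20, 9, 30)  # Thursday, 4/20/2023
print(target_window_start(run, "week-to-date"))  # 2023-04-17 00:00:00
```

For a run on Thursday, 4/20/2023, the Week-to-date window therefore starts at midnight on Monday, 4/17/2023.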
-
Select a Baseline Window.
The options for baseline depend on what you select as the target window. The platform does not allow you to choose windows that overlap, such as choosing Week-to-date for the target window and Day prior for the baseline window.
Baseline Window options for each Target Window:
Daily
- Training/test data — Uses the training data as the baseline.
- Day prior — Uses data from the previous day as the baseline.
- Last X same weekdays — Uses aggregate data from a specified number of the same day of the week. The platform treats the last X weekdays as one dataset when computing statistics.
Week-to-date
- Training/test data — Uses the training data as the baseline.
- Week prior — Uses data from the most recent full week (Monday midnight to Sunday 23:59:59) as the baseline.
- Last X weeks — Uses aggregate data from the specified number of previous weeks. The platform treats the last X weeks as one dataset when computing statistics.
Month-to-date
- Training/test data — Uses the training data as the baseline.
- Month prior — Uses data from the most recent full month as the baseline.
- Same month prior year — Uses data from the same month of the previous year.
- Last X months — Uses aggregate data from the specified number of previous months. The platform treats the last X months as one dataset when computing statistics.
Quarter-to-date
- Training/test data — Uses the training data as the baseline.
- Quarter prior — Uses data from the most recent full quarter as the baseline.
- Same quarter prior year — Uses data from the same quarter of the previous year.
Any
- Training/Test data — Uses data from a training dataset you have already uploaded. See Upload training data.
-
In the Segment Selection section, select the segments you want to add to the policy. The policy automatically analyzes all the data in addition to any segment you select.
-
In the Hotspot by features section, select any of the features you want to have hotspot analysis run on. Only the categorical features are available for hotspot analysis.
-
Select a performance metric in the Metrics pulldown. Options are different for classification models and regression models:
Note: If you have created custom metrics, they appear in italics. Standard metrics appear in regular font.
-
Classification
-
Accuracy: Measures the percentage of correctly classified instances out of all the instances in the dataset.
Accuracy = (Number of correctly classified instances)/(Total number of instances)
If the dataset is imbalanced, other metrics like precision, recall, and F1 score might be better metrics.
-
Balanced accuracy (default): Measures the average performance of the model in correctly identifying both positive and negative instances.
Balanced accuracy calculates the average of sensitivity and specificity. Sensitivity is the proportion of true positive predictions out of all actual positive instances, while specificity is the proportion of true negative predictions out of all actual negative instances. The balanced accuracy ranges from 0 to 1, with higher values indicating better performance.
This is a useful metric when data is imbalanced and the cost of false positives and false negatives is similar. If the cost of false positives and false negatives is significantly different, then precision, recall, F1 score, and AUC might be better metrics.
-
Precision: Measures the proportion of true positive predictions out of all positive predictions. It indicates the model’s ability to avoid false positives.
Precision = (Number of true positive predictions) / (Number of true positive predictions + Number of false positive predictions)
If the cost of false negatives is high, then metrics like recall and F1 score might be better metrics because precision does not account for instances that are falsely predicted as negative.
-
Recall: Measures the proportion of true positive predictions out of all actual positive instances in the dataset.
Recall = (Number of true positive predictions) / (Number of true positive predictions + Number of false negative predictions)
If the cost of false positives is high, precision or F1 score might be better metrics because recall does not account for instances that are falsely predicted as positive.
-
F1 score: Evaluates the overall performance of a model in terms of precision and recall. It is the harmonic mean of precision and recall, and it ranges from 0 to 1 with higher values indicating better performance.
F1 score = 2 * (precision * recall) / (precision + recall)
This metric is useful when the data is imbalanced and the costs of false positives and false negatives are both high. If precision and recall should carry different weights, tracking precision and recall separately might be better because F1 score weighs them equally.
-
Area Under the ROC Curve (AUC): Measures a model’s ability to distinguish between positive and negative instances regardless of the threshold used to make the classification decision.
The AUC plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold values and calculates the area under the curve. The TPR is the proportion of true positive predictions out of all actual positive instances, and the FPR is the proportion of false positive predictions out of all actual negative instances. The AUC ranges from 0 to 1, with higher values indicating better performance.
For example, an AUC of 0.8 means that, given one randomly chosen positive instance and one randomly chosen negative instance, the model scores the positive instance higher 80% of the time. An AUC of 0.5 corresponds to random guessing.
If the costs of false positives and false negatives are significantly different, precision, recall, or F1 score might be better metrics.
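As a rough illustration of the classification formulas above, the following Python sketch computes each metric from confusion-matrix counts. It is not VIANOPS code: the function names and the sample counts are hypothetical, and it assumes all denominators are non-zero. The `auc` helper uses the pairwise-ranking definition of AUC (ties count as half), which is equivalent to the area under the ROC curve.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from confusion-matrix counts (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity or TPR
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

def auc(pos_scores, neg_scores) -> float:
    """Fraction of positive/negative pairs where the positive scores higher."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical imbalanced dataset: 10 positives, 90 negatives.
metrics = classification_metrics(tp=5, fp=5, tn=85, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc([0.9, 0.8, 0.4], [0.7, 0.3]):.3f}")
```

With these sample counts, accuracy is 0.9 while balanced accuracy is about 0.72, which shows why accuracy alone can be misleading on imbalanced data.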
-
-
Regression
-
Mean Absolute Error (MAE): Measures the average absolute difference between the predicted and actual values.
The MAE is calculated by taking the average of the absolute differences between the predicted and actual values. The MAE ranges from 0 to infinity, with lower values indicating better performance.
This is a good metric when the data contains outliers, because it weighs every error in proportion to its size. It does not distinguish between overestimation and underestimation errors. If large errors should be penalized more heavily than small ones, RMSE might be a better metric.
-
Mean Squared Percentage Error (MSPE): Measures the average squared difference between the predicted and actual values, normalized by the actual value.
The MSPE is calculated by taking the average of the squared percentage differences between the predicted and actual values. The MSPE ranges from 0 to infinity, with lower values indicating better performance.
This metric is useful when the data contains large outliers or the values have large differences; however, because it can miss small changes in actual values, MAE or RMSE might be better metrics under such conditions.
-
Mean Squared Error (MSE): Measures the average squared difference between the predicted and actual values.
The MSE is calculated by taking the average of the squared differences between the predicted and actual values. The MSE ranges from 0 to infinity, with lower values indicating better performance.
This metric can be sensitive to outliers and may not be easily interpretable because it is expressed in squared units. MAE or RMSE might be better metrics under such conditions.
-
Root Mean Squared Error (RMSE): Measures the square root of the average squared difference between the predicted and actual values.
The RMSE is calculated by taking the square root of the average of the squared differences between the predicted and actual values. The RMSE ranges from 0 to infinity, with lower values indicating better performance.
This metric is easily interpretable because it is measured in the same units as the predicted and actual values; however, because it can be sensitive to outliers, MAE might be a better metric under such conditions.
-
R-squared (R2): The coefficient of determination which measures the goodness of fit of a model. For regression models, this is a statistical measure of how well the regression line approximates the actual data. R2 is calculated at scale and on segments.
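The regression metrics above follow standard textbook definitions, sketched here in Python. This is illustrative only and not VIANOPS code: the function name and sample values are hypothetical, MSPE is returned as a fraction rather than a percentage, and actual values are assumed to be non-zero and not all identical.

```python
from math import sqrt

def regression_metrics(actual, predicted) -> dict:
    """Textbook regression metrics for two equal-length lists of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # Squared error relative to the actual value (requires non-zero actuals).
    mspe = sum((e / a) ** 2 for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_total  # 1 - residual sum of squares / total sum of squares
    return {"mae": mae, "mse": mse, "rmse": sqrt(mse), "mspe": mspe, "r2": r2}

# Hypothetical ground truth and predictions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

Here MAE is 2.5 and MSE is 6.5. RMSE (about 2.55) is in the same units as the data, which is why it is often easier to interpret than MSE.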
-
-
- Set the warning level % and critical level % under Threshold.
- The platform applies these numbers to each feature to determine whether it has drifted past warning or critical levels. For feature drift only, these numbers apply to the cumulative weighted drift (weights selected times the individual feature drift) to determine if the entire selected feature set has drifted beyond warning or critical levels.
- These thresholds, and results that are in those ranges, appear in graphs in different colors so you can see quickly when any feature drifts across a threshold.
-
Click Next.
-
In the Schedule window, performance policies are updated as ground truth arrives, so scheduling is not available. Click Next.
-
Confirm that the policy information matches what you want.
-
Click Save Policy.
- Jump to Activate or deactivate drift policies to continue.
Feature drift
-
Click Distance-based and then click Next.
-
Enter a policy name in the Policy Name box.
-
Optionally, enter a description in the Description box.
-
Select a Target Window.
This is the window of data (inference or prediction) to compare against the baseline data.
- Daily — All data since midnight.
- Week-to-date — All data since the start of the week. Default is Monday.
- Month-to-date — All data since midnight of the first day of the month.
- Quarter-to-date — All data since the start of the current quarter. The default start is January 1, April 1, July 1, or October 1.
For example, if you select Week-to-date, and the day of the scheduled event (or today, if you run manually) is Thursday, 4/20/2023, then the set of data spans from midnight on Monday, 4/17/2023, to the time of the scheduled event (or manual run) on Thursday, 4/20/2023.
-
Select a Baseline Window.
The options for baseline depend on what you select as the target window. The platform does not allow you to choose windows that overlap, such as choosing Week-to-date for the target window and Day prior for the baseline window.
Baseline Window options for each Target Window:
Daily
- Training/test data — Uses the training data as the baseline.
- Day prior — Uses data from the previous day as the baseline.
- Last X same weekdays — Uses aggregate data from a specified number of the same day of the week. The platform treats the last X weekdays as one dataset when computing statistics.
Week-to-date
- Training/test data — Uses the training data as the baseline.
- Week prior — Uses data from the most recent full week (Monday midnight to Sunday 23:59:59) as the baseline.
- Last X weeks — Uses aggregate data from the specified number of previous weeks. The platform treats the last X weeks as one dataset when computing statistics.
Month-to-date
- Training/test data — Uses the training data as the baseline.
- Month prior — Uses data from the most recent full month as the baseline.
- Same month prior year — Uses data from the same month of the previous year.
- Last X months — Uses aggregate data from the specified number of previous months. The platform treats the last X months as one dataset when computing statistics.
Quarter-to-date
- Training/test data — Uses the training data as the baseline.
- Quarter prior — Uses data from the most recent full quarter as the baseline.
- Same quarter prior year — Uses data from the same quarter of the previous year.
Any
- Training/Test data — Uses data from a training dataset you have already uploaded. See Upload training data.
- Select the features to add to the drift policy.
- All features — adds all features to the policy.
- Custom selection — select the features from the Feature drift column of the scrollable window. Checking a box automatically selects the Custom selection option.
- Top X Important Features — select the number of top features to show.
-
Select the features you want to have hotspot analysis run on. Hotspot analysis is available only for categorical features, so only categorical features have their checkboxes enabled.
- Select how to weigh the features in the policy.
- Weigh by Feature Importance — applies weights as defined in the feature importance data. See Feature importance.
- Equally Weigh All Features — gives each feature the same weight.
- Manually Weigh Features — enables the Importance column in the scrollable table for you to select a weight for each feature.
Note: The weights are percentages and must add up to 100.
-
In the Segment Selection section, select the segments you want to add to the policy. The policy automatically analyzes all the data in addition to any segment you select.
-
Select the drift metric, either Population Stability Index (PSI) or JS Distance.
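For orientation, the two drift metrics can be sketched from their textbook definitions. This is not the VIANOPS implementation: how the platform bins data, smooths empty bins, and which logarithm base it uses are not stated here, so the `eps` smoothing and base-2 logarithm below are assumptions. Both functions take binned proportions that sum to 1.

```python
from math import log, sqrt

def psi(baseline, target, eps=1e-6) -> float:
    """Population Stability Index between two binned distributions."""
    total = 0.0
    for b, t in zip(baseline, target):
        b, t = max(b, eps), max(t, eps)  # avoid log(0) for empty bins
        total += (t - b) * log(t / b)
    return total

def js_distance(baseline, target) -> float:
    """Jensen-Shannon distance: square root of the JS divergence (base 2)."""
    def kl(p, q):
        return sum(pi * log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)
    mixture = [(b + t) / 2 for b, t in zip(baseline, target)]
    divergence = (kl(baseline, mixture) + kl(target, mixture)) / 2
    return sqrt(max(divergence, 0.0))  # guard against tiny negative rounding

# Hypothetical bin proportions for one feature.
baseline = [0.25, 0.25, 0.25, 0.25]
target = [0.10, 0.20, 0.30, 0.40]
print(f"PSI: {psi(baseline, target):.4f}")
print(f"JS distance: {js_distance(baseline, target):.4f}")
```

With a base-2 logarithm, JS distance is bounded between 0 (identical distributions) and 1 (no overlap), whereas PSI has no upper bound. This difference matters when you choose warning and critical thresholds.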
-
Under Thresholds, enter numbers for Warning level and Critical level.
- The platform applies these numbers to each feature to determine whether it has drifted past warning or critical levels. For feature drift only, these numbers apply to the cumulative weighted drift (weights selected times the individual feature drift) to determine if the entire selected feature set has drifted beyond warning or critical levels.
- These thresholds, and results that are in those ranges, appear in graphs in different colors so you can see quickly when any feature drifts across a threshold.
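The relationship between feature weights, cumulative weighted drift, and the two thresholds can be sketched as follows. This is a simplified illustration, not VIANOPS code: the function names, feature names, and drift values are hypothetical, and treating a value equal to a threshold as crossing it is an assumption.

```python
def weighted_drift(feature_drift: dict, weights_pct: dict) -> float:
    """Cumulative drift: sum of each feature's drift times its weight."""
    if abs(sum(weights_pct.values()) - 100) > 1e-9:
        raise ValueError("weights are percentages and must add up to 100")
    return sum(feature_drift[f] * w / 100 for f, w in weights_pct.items())

def drift_status(value: float, warning: float, critical: float) -> str:
    """Classify a drift value against warning and critical levels."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

# Hypothetical per-feature drift values and manual weights.
drift = {"age": 0.30, "income": 0.10, "region": 0.05}
weights = {"age": 50, "income": 30, "region": 20}

for feature, value in drift.items():
    print(feature, drift_status(value, warning=0.10, critical=0.25))
overall = weighted_drift(drift, weights)
print("overall", round(overall, 4), drift_status(overall, 0.10, 0.25))
```

In this example one feature is past the critical level on its own, while the weighted drift of the whole feature set (about 0.19) is only past the warning level.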
-
Click Next.
-
Set up a schedule to calculate drift by choosing to run daily, weekly, or monthly, and then setting a time to run at that chosen scan interval.
-
Confirm that the policy information matches what you want.
-
Click Save Policy.
- Jump to Activate or deactivate drift policies to continue.
Prediction drift
-
Click Distance-based and then click Next.
-
Enter a policy name in the Policy Name box.
-
Optionally, enter a description in the Description box.
-
Select a Target Window.
This is the window of data (inference or prediction) to compare against the baseline data.
- Daily — All data since midnight.
- Week-to-date — All data since the start of the week. Default is Monday.
- Month-to-date — All data since midnight of the first day of the month.
- Quarter-to-date — All data since the start of the current quarter. The default start is January 1, April 1, July 1, or October 1.
For example, if you select Week-to-date, and the day of the scheduled event (or today, if you run manually) is Thursday, 4/20/2023, then the set of data spans from midnight on Monday, 4/17/2023, to the time of the scheduled event (or manual run) on Thursday, 4/20/2023.
-
Select a Baseline Window.
The options for baseline depend on what you select as the target window. The platform does not allow you to choose windows that overlap, such as choosing Week-to-date for the target window and Day prior for the baseline window.
Baseline Window options for each Target Window:
Daily
- Training/test data — Uses the training data as the baseline.
- Day prior — Uses data from the previous day as the baseline.
- Last X same weekdays — Uses aggregate data from a specified number of the same day of the week. The platform treats the last X weekdays as one dataset when computing statistics.
Week-to-date
- Training/test data — Uses the training data as the baseline.
- Week prior — Uses data from the most recent full week (Monday midnight to Sunday 23:59:59) as the baseline.
- Last X weeks — Uses aggregate data from the specified number of previous weeks. The platform treats the last X weeks as one dataset when computing statistics.
Month-to-date
- Training/test data — Uses the training data as the baseline.
- Month prior — Uses data from the most recent full month as the baseline.
- Same month prior year — Uses data from the same month of the previous year.
- Last X months — Uses aggregate data from the specified number of previous months. The platform treats the last X months as one dataset when computing statistics.
Quarter-to-date
- Training/test data — Uses the training data as the baseline.
- Quarter prior — Uses data from the most recent full quarter as the baseline.
- Same quarter prior year — Uses data from the same quarter of the previous year.
Any
- Training/Test data — Uses data from a training dataset you have already uploaded. See Upload training data.
-
In the Segment Selection section, select the segments you want to add to the policy. The policy automatically analyzes all the data in addition to any segment you select.
-
In the Hotspot by features section, select any of the features you want to have hotspot analysis run on. Only the categorical features are available for hotspot analysis.
-
Select the drift metric, either Population Stability Index (PSI) or JS Distance.
-
Under Threshold, enter numbers for Warning level and Critical level.
- The platform applies these numbers to each feature to determine whether it has drifted past warning or critical levels. For feature drift only, these numbers apply to the cumulative weighted drift (weights selected times the individual feature drift) to determine if the entire selected feature set has drifted beyond warning or critical levels.
- These thresholds, and results that are in those ranges, appear in graphs in different colors so you can see quickly when any feature drifts across a threshold.
-
(Regression models only) If you want to add a custom bin distribution, choose Custom bin distribution and enter a list of histogram bin edges in the text box. For example, [-10, 50, 250, 700] defines three bins: -10 to 50, 50 to 250, and 250 to 700. Any data that falls outside these bins is grouped in the nearest bin. For example, -11 is grouped in the -10 to 50 bin, and 775 is grouped in the 250 to 700 bin.
If you do not add a custom bin distribution, VIANOPS uses its default distribution, baseline decile binning. This default offers the best tradeoff between speed on large amounts of data and enough granularity to uncover meaningful drift between baseline and target distributions, without flooding you with drift alerts.
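The custom bin behavior described above, including the handling of out-of-range values, can be sketched with the standard library. This is an illustration, not VIANOPS code: the function name `bin_index` is hypothetical, and assigning a value that sits exactly on an interior edge to the bin on its right is an assumption.

```python
from bisect import bisect_right

def bin_index(value: float, edges: list) -> int:
    """Index of the bin for value; out-of-range values go to the nearest bin."""
    i = bisect_right(edges, value) - 1
    # Clamp so values below the first edge or above the last edge
    # fall into the first or last bin respectively.
    return min(max(i, 0), len(edges) - 2)

edges = [-10, 50, 250, 700]  # three bins: -10 to 50, 50 to 250, 250 to 700
for value in (-11, 20, 120, 775):
    i = bin_index(value, edges)
    print(f"{value} -> bin {edges[i]} to {edges[i + 1]}")
```

As in the example in the text, -11 lands in the -10 to 50 bin and 775 lands in the 250 to 700 bin.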
-
Click Next.
-
Set up a schedule to calculate drift by choosing to run daily, weekly, or monthly, and then setting a time to run at that chosen scan interval.
-
Confirm that the policy information matches what you want.
-
Click Save Policy.
-
Jump to Activate or deactivate drift policies to continue.
Activate or deactivate drift policies
After you create a drift policy, you must activate it.
-
In the model dashboard Policy List pane, find your new policy, click the three dots at the far right of the row, and choose Activate Policy.
-
To deactivate a policy, use the same controls but choose Deactivate Policy from the pulldown.