Metrics
VIANOPS supports the following metrics:
- Performance metrics - classification models
- Performance metrics - regression models
- Drift metrics
- Data profiling metrics
- Custom metrics
All metrics are supported through the API; a subset of metrics is supported through the UI.
Performance metrics - classification models
VIANOPS supports the following performance metrics for classification models. When accessing them in the UI (e.g., when creating a performance drift policy or visualizing model performance in the Model Dashboard), you can view more details for each metric.
For more on using these with the API, see the API documentation for performance drift policies. When calling the endpoint, specify the API Value exactly as shown.
Metric | Description | In UI | In API | API Value |
---|---|---|---|---|
Accuracy | Fraction of correct predictions. | Yes | Yes | accuracy |
Area under the curve (AUC) | The area under the ROC curve, which quantifies the overall performance of a binary classifier compared to a random classifier | Yes | Yes | rocauc |
Balanced accuracy | Average of recall obtained on each class. | Yes | Yes | balanced_accuracy |
Bookmaker informedness (BM) | Sum of sensitivity and specificity minus 1. | No | Yes | bm |
Diagnostic Odds Ratio (DOR) | Ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease. | No | Yes | dor |
F1 Score (F1) | Harmonic mean of precision and recall. | Yes | Yes | f1 |
False Discovery Rate (FDR) | Fraction of incorrect predictions in the predicted positive instances. | No | Yes | fdr |
Fowlkes-Mallows index (FM) | Geometric mean of precision and recall. | No | Yes | fm |
False Negatives (FN) | Incorrectly predicted negatives. | Yes | Yes | fn |
False Negative Rate (FNR) | Fraction of positives incorrectly identified as negative. | No | Yes | fnr |
False Omission Rate (FOR) | Fraction of incorrect predictions in the predicted negative instances. | No | Yes | for |
False Positives (FP) | Incorrectly predicted positives. | Yes | Yes | fp |
False Positive Rate (FPR) | Fraction of negatives incorrectly identified as positive. | No | Yes | fpr |
Gini coefficient | A measure of inequality or impurity in a set of values, often used in decision trees. | Yes | Yes | modelgini |
Lift | A measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. | Yes | Yes | lift |
Log loss | A performance metric for classification where the model prediction is a probability value between 0 and 1. Measures divergence of model output probability from actual value. Logloss = 0 is a perfect classifier. | Yes | Yes | logloss |
Matthews Correlation Coefficient (MCC) | (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). Ranges from -1 to 1 (1 is a perfect binary classifier, 0 is random, -1 is everything wrong). | No | Yes | mcc |
Markedness (MK) | Sum of positive and negative predictive values minus 1. | No | Yes | mk |
Negative Likelihood Ratio (NLR) | Ratio of false negative rate to true negative rate. | No | Yes | nlr |
Negative Predictive Value (NPV) | Fraction of identified negatives that are correct. | No | Yes | npv |
Positive Likelihood Ratio (PLR) | Ratio of true positive rate to false positive rate. | No | Yes | plr |
Precision | Fraction of identified positives that are correct. | No | Yes | precision |
Probability calibration curve | A plot that compares the predicted probabilities of a model to the actual outcome frequencies, used to understand if a model’s probabilities can be taken at face value. | No | Yes | prob_calib |
Prevalence Threshold (PT) | Point where the positive predictive value equals the negative predictive value. | No | Yes | pt |
Recall | Sensitivity, True Positive Rate. Fraction of positives correctly identified. (Higher Recall minimizes False Negatives) | Yes | Yes | recall |
Rate of Negative Predictions (RNP) | Fraction of all predictions that are negative: (TN + FN) / (TP + FP + TN + FN). | Yes | Yes | rnp |
Receiver Operating Characteristics (ROC) | A plot that illustrates the diagnostic ability (TPR vs. FPR) of a binary classifier as its decision threshold is varied. | No | Yes | auc_score |
Specificity | Fraction of negatives correctly identified. | No | Yes | specificity |
True Negatives (TN) | Correctly predicted negatives. | Yes | Yes | tn |
True Positives (TP) | Correctly predicted positives. | Yes | Yes | tp |
Threat score (TS) | TP / (TP + FN + FP). Also known as the Critical Success Index. | No | Yes | ts |
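For reference, most of these metrics can be reproduced locally to sanity-check the values reported by VIANOPS. The following is a minimal sketch using scikit-learn (not VIANOPS API code); `y_true`, `y_score`, and `y_pred` are placeholder arrays.

```python
# Illustrative only: reproduces a few of the metrics above with scikit-learn.
from sklearn.metrics import (
    accuracy_score, balanced_accuracy_score, confusion_matrix, f1_score,
    log_loss, matthews_corrcoef, precision_score, recall_score, roc_auc_score,
)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # ground-truth labels
y_score = [0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6]   # predicted probabilities
y_pred = [1 if p >= 0.5 else 0 for p in y_score]     # thresholded predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("accuracy          ", accuracy_score(y_true, y_pred))
print("balanced_accuracy ", balanced_accuracy_score(y_true, y_pred))
print("f1                ", f1_score(y_true, y_pred))
print("precision         ", precision_score(y_true, y_pred))
print("recall            ", recall_score(y_true, y_pred))
print("rocauc            ", roc_auc_score(y_true, y_score))
print("logloss           ", log_loss(y_true, y_score))
print("mcc               ", matthews_corrcoef(y_true, y_pred))
print("specificity       ", tn / (tn + fp))
print("ts (threat score) ", tp / (tp + fn + fp))
```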
Additional classification metrics available from the API
Several classification metrics are calculated using averaging techniques and are accessible via the API. (See the API documentation for performance drift policies.)
Metric | Description | API Value |
---|---|---|
F1, Micro-Averaged | Global average F1 computed from total TP/TN/FP/FN counts. Better for imbalanced data. | micro[f1] |
F1, Macro-Averaged | Average of per-class F1 scores without regard to class size. | macro[f1] |
F1, Weighted-Averaged | Average of per-class F1 scores weighted by the support of each class. | weighted[f1] |
Precision, Micro-Averaged | Global average precision computed from total TP/TN/FP/FN counts. Better for imbalanced data. Equal to micro-averaged F1, micro-averaged recall, and accuracy. | micro[precision] |
Precision, Macro-Averaged | Average of per-class precision scores without regard to class size. | macro[precision] |
Precision, Weighted-Averaged | Average of per-class precision scores weighted by the support of each class. | weighted[precision] |
Recall, Micro-Averaged | Global average recall computed from total TP/TN/FP/FN counts. Better for imbalanced data. Equal to micro-averaged F1, micro-averaged precision, and accuracy. | micro[recall] |
Recall, Macro-Averaged | Average of per-class recall scores without regard to class size. | macro[recall] |
Recall, Weighted-Averaged | Average of per-class recall scores weighted by the support of each class. | weighted[recall] |
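These averaging modes correspond to the familiar micro/macro/weighted averages. The sketch below is illustrative only (placeholder multi-class labels, scikit-learn rather than VIANOPS code).

```python
# Illustrative sketch of micro / macro / weighted averaging on a multi-class problem.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 2, 2, 1, 0, 2, 2]   # placeholder multi-class labels
y_pred = [0, 2, 2, 2, 1, 0, 1, 2]

for avg in ("micro", "macro", "weighted"):
    print(avg,
          "f1:", f1_score(y_true, y_pred, average=avg),
          "precision:", precision_score(y_true, y_pred, average=avg),
          "recall:", recall_score(y_true, y_pred, average=avg))
```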
Performance metrics - regression models
VIANOPS supports the following performance metrics for regression models. When accessing them in the UI (e.g., when creating a performance drift policy or visualizing model performance in the Model Dashboard), you can view more details for each metric.
For more on using these with the API, see the API documentation for API - Performance drift. The API Value must be specified exactly as shown.
Metric | Description | In UI | In API | API Value |
---|---|---|---|---|
Mean Absolute Error (MAE) | Average of absolute prediction errors. (Error = predicted value - actual value in all cases) | Yes | Yes | mae |
Mean Absolute Percentage Error (MAPE) | Average of absolute percentage prediction errors abs((pred value - actual)/actual). | Yes | Yes | mape |
Mean Squared Error (MSE) | Average of squared prediction errors. | Yes | Yes | mse |
Negative Mean Squared Error (NMSE) | Negative of MSE. | Yes | Yes | mse |
Negative Root Mean Squared Error (NRMSE) | Negative of RMSE. | Yes | Yes | rmse |
Negative Mean Absolute Error (NMAE) | Negative of MAE. | Yes | Yes | mae |
Negative Mean Absolute Percentage Error (NMAPE) | Negative of MAPE. | Yes | Yes | mape |
R-Squared (R2) | The proportion of the variance in the dependent variable that is predictable from the independent variable(s). | Yes | Yes | r2_score |
Root Mean Squared Error (RMSE) | Square root of MSE. | Yes | Yes | rmse |
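As with the classification metrics, these values can be reproduced locally as a sanity check. The sketch below uses scikit-learn and NumPy with placeholder arrays; it is not VIANOPS API code.

```python
# Illustrative sketch of the regression metrics above.
import numpy as np
from sklearn.metrics import (
    mean_absolute_error, mean_absolute_percentage_error,
    mean_squared_error, r2_score,
)

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # placeholder actuals
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # placeholder predictions

mae = mean_absolute_error(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

print({"mae": mae, "mape": mape, "mse": mse, "rmse": rmse,
       "r2_score": r2_score(y_true, y_pred),
       # The "negative" variants simply flip the sign of the base metric.
       "nmae": -mae, "nmape": -mape, "nmse": -mse, "nrmse": -rmse})
```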
Drift metrics
VIANOPS supports the following drift metrics.
For more on using these with an API, see API - Performance drift.
Metric | Description | In UI | In API |
---|---|---|---|
Jensen-Shannon Divergence for prediction drift | Square root of the J-S divergence, which measures the average divergence of baseline and target from the mean of the two distributions. | Yes | Yes |
Population Stability Index (PSI) for prediction drift | Sum of (baseline frequency - target frequency) x log (baseline/target) across all defined bins. | Yes | Yes |
Jensen-Shannon Divergence for feature drift | Square root of the J-S divergence, which measures the average divergence of baseline and target from the mean of the two distributions. | Yes | Yes |
Population Stability Index (PSI) for feature drift | Sum of (baseline frequency - target frequency) x log (baseline/target) across all defined bins. | Yes | Yes |
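Both drift metrics compare binned frequency distributions from a baseline window and a target window. The sketch below (illustrative only, with placeholder bin counts) shows one way to compute them with NumPy and SciPy; note that SciPy's `jensenshannon` returns the square root of the J-S divergence, matching the definition above.

```python
# Illustrative sketch of the two drift metrics on binned data.
import numpy as np
from scipy.spatial.distance import jensenshannon

baseline = np.array([120, 300, 400, 150, 30], dtype=float)  # placeholder per-bin counts
target = np.array([100, 280, 380, 200, 40], dtype=float)

# Normalize counts to per-bin frequencies.
p = baseline / baseline.sum()
q = target / target.sum()

# Jensen-Shannon: square root of the J-S divergence between the two distributions.
js = jensenshannon(p, q)

# PSI: sum over bins of (baseline freq - target freq) * ln(baseline freq / target freq).
eps = 1e-6  # guard against empty bins
psi = np.sum((p - q) * np.log((p + eps) / (q + eps)))

print({"jensen_shannon": js, "psi": psi})
```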
Data profiling metrics
VIANOPS supports numerical and categorical data profiling metrics.
Numerical data profiling metrics
Metric | Description | In UI | In API |
---|---|---|---|
Count | Total number of observations or records. | Yes | Yes |
Min/Mean/Max | The smallest, average, and largest values respectively. | Yes | Yes |
1%/99% | The 1st and 99th percentiles of the data respectively (values below which a certain percent of observations fall). | Yes | Yes |
5%/95% | The 5th and 95th percentiles of the data respectively. | Yes | Yes |
10%/90% | The 10th and 90th percentiles of the data respectively. | Yes | Yes |
25%/50%/75% | The 25th (1st quartile or Q1), 50th (median or Q2), and 75th (3rd quartile or Q3) percentiles of the data respectively. | Yes | Yes |
Mean-Std/Mean/Mean+Std | The mean value minus one standard deviation, the mean value, and the mean value plus one standard deviation respectively. This provides a sense of the spread of the data around the mean. | Yes | Yes |
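A quick way to see the same statistics locally is pandas' `describe`; the sketch below is illustrative only and uses a placeholder series.

```python
# Illustrative sketch: the numerical profiling statistics above, via pandas.
import pandas as pd

values = pd.Series([1.2, 3.4, 2.2, 5.6, 4.1, 2.9, 3.3, 8.0])  # placeholder numeric feature

profile = values.describe(
    percentiles=[0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]
)
profile["mean-std"] = values.mean() - values.std()
profile["mean+std"] = values.mean() + values.std()
print(profile)
```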
Categorical data profiling metrics
VIANOPS supports the following categorical metrics.
Metric | Description | In UI | In API |
---|---|---|---|
Count | Total count. | Yes | Yes |
Unique values | Number of unique values. | Yes | Yes |
Value counts | Dictionary of {value: value_count}. | Yes | Yes |
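The categorical metrics map directly onto common pandas operations; the sketch below is illustrative only, with a placeholder series.

```python
# Illustrative sketch: the categorical profiling metrics above, via pandas.
import pandas as pd

values = pd.Series(["red", "blue", "red", "green", "red", "blue"])  # placeholder categorical feature

print("count:", values.count())
print("unique values:", values.nunique())
print("value counts:", values.value_counts().to_dict())  # {value: value_count}
```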
Custom metrics
You can define custom metrics based on the standard performance metrics listed above, for both regression and classification models.
- Define the custom metric with either the /v1/model-metrics REST API or the vianops_client.models.riskstore.model-metrics SDK endpoint. (To access REST API docs, see APIs.)
- Define the metric with a Python function or a py_statement.
- Define your custom metric based on any of the VIANOPS standard metrics or datasources:
  - Standard metrics
  - To view datasources you can use to create custom metrics, use the following APIs:
    - REST API — /v1/performance/model-metrics
    - SDK endpoint — vianops_client.models.riskstore.performance.model-metrics
- Custom metrics appear in italics in all the UI elements where standard metrics appear.
- The following is an example of the body for a model-metrics endpoint defining a custom metric with language set to `py_statement`:

```json
[
  {
    "metric_name": "relative_squared_error3",
    "description": "Compare RMSE to MAE to determine the distribution of errors.",
    "language": "py_statement",
    "metric_type": "model_performance",
    "experiment_type": "regression",
    "definition": "rmse / mae",
    "abbreviation": "RSE",
    "alt_names": ["rmm"],
    "full_name": "Relative Squared Error",
    "status": "active",
    "metric_category": "custom",
    "metric_tags": {
      "for_display": true,
      "for_custom_definition": true,
      "is_metric_of_interest": true,
      "is_percent": false,
      "lower_is_better": false,
      "needs_predict_proba": false,
      "needs_class_of_interest": false
    }
  }
]
```
- The following is an example of notebook code to define a custom metric with language set to `python`:

```python
load_api = ModelMetricsV1API()

model = V1ModelMetricsModel(
    metric_name=custom_metrics[2],
    description="topXauc refers to the area under the ROC curve when only the top 'X' predicted probabilities are considered for evaluation.",
    full_name="topXAUC",
    abbreviation="topX AUC",
    language="python",
    metric_type="model_performance",
    experiment_type="binary_classification",
    definition={
        "method": "calculateTopXauc",
        "init_params": {"filtered_df": "drifter_df", "top_x": 10},
        "module": "custom",
        "classname": "",
    },
)

models = V1ModelMetricsModelList(__root__=[])
models.__root__.append(model)
create_res = load_api.create(models)
print(create_res)
```
This example assumes the following:
- `custom_metrics` has been defined previously as a list of at least three items representing the name of each custom metric. In this example, the third item in the list is the name of this custom metric.
- `definition.method` calls `calculateTopXauc`, which is defined in the file `/source/custom.py`.
- `"module": "custom"` points to the file `/source/custom.py`.
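For illustration, a hypothetical `/source/custom.py` implementation of `calculateTopXauc` might look like the sketch below. The function signature and the column names (`prediction_proba`, `ground_truth`) are assumptions, not part of the documented VIANOPS contract.

```python
# /source/custom.py -- hypothetical sketch only; signature and column names are assumptions.
from sklearn.metrics import roc_auc_score

def calculateTopXauc(filtered_df, top_x=10):
    """AUC computed over only the top `top_x` rows ranked by predicted probability."""
    top = filtered_df.sort_values("prediction_proba", ascending=False).head(top_x)
    # Assumes both classes appear among the selected rows; otherwise AUC is undefined.
    return roc_auc_score(top["ground_truth"], top["prediction_proba"])
```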
- Add the following tags to the `model_tags_payload` section of your notebook.
  - `custom_metrics`
    `{"name": "custom_metrics", "value": custom_metrics, "status": "active"}`
    This object assumes you have assigned the variable `custom_metrics` previously in the notebook, as shown in the example below. This example assumes you have created custom metrics named "custom_reg_metric" and "twice_error" with the custom metric endpoint.
  - `class_of_interest`
    `{"name": "class_of_interest", "value": custom_metrics_class, "status": "active"}`
    This object assumes you have assigned the variable `custom_metrics_class` previously in the notebook, as shown in the example below. You only need to provide this tag for binary classification models; the class of interest can be 0 or 1, and if neither applies, assign "None".
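Putting the two tags together, the relevant notebook cell might look like the following sketch. The metric names mirror the "custom_reg_metric" and "twice_error" examples above; the exact shape of `model_tags_payload` is an assumption.

```python
# Hypothetical sketch of the tag variables and model_tags_payload entries described above.
custom_metrics = ["custom_reg_metric", "twice_error"]  # names registered via the custom metric endpoint
custom_metrics_class = 1  # class of interest for a binary classification model (0, 1, or "None")

model_tags_payload = [
    {"name": "custom_metrics", "value": custom_metrics, "status": "active"},
    {"name": "class_of_interest", "value": custom_metrics_class, "status": "active"},
]
```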