Advanced monitoring
In the Monitor your model tutorial, you learn how the monitor_placeholder_api_generic.ipynb notebook is configured to create the sample taxi fare model. That tutorial walks through the out-of-the-box configuration and settings defined in the notebook to create the sample model.
In this advanced tutorial, you learn how to extend the sample notebook with advanced configurations, specifically how to:
- Set a different target for the feature set
- Change the feature set for the model
- Create a custom inference mapping (optional)
- Modify a policy to detect issues on different columns
- Create a new segment for a policy
- Create a new policy
The helper.py file included with the sample notebook provides numerous client functions to make the notebook workflow easier to process and follow. All SDK classes are imported there to support SDK client functionality.
Set a different target for the feature set
The target for the feature set is one of the input parameters for the placeholder model notebook. It must be a column in the feature set. To set a different target for your feature set, add the name of that column as the value for targetcolumn. Make sure this column name (string format) matches a column in your feature set. When you run this cell, the notebook picks up your custom settings.
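For example, a minimal sketch, assuming your feature set includes a hypothetical column named fare_total:

# Hypothetical column name; the string must exactly match a column in your feature set.
targetcolumn = "fare_total"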
Change the feature set for the model
When you specify a different feature set for the placeholder model, it likely has different columns. You need to modify the features and columns set for the model to match those provided by the feature set. These values are set in the placeholder model notebook variables. Modify the following notebook variables as explained:
- allcolumns — Names of all columns (including the target column) in the feature set, defined as an array (comma-separated and enclosed in quotes).
- continuouscolumns — Names of all columns in the feature set that contain continuous data, defined as an array (comma-separated and enclosed in quotes). If there are no columns with continuous data, leave empty square brackets [].
- categoricalcolumns — Names of all columns in the feature set that contain categorical data, defined as an array (comma-separated and enclosed in quotes). If there are no columns with categorical data, leave empty square brackets [].
- str_categorical_cols — Names of columns in the feature set containing categorical data of string datatype. Use empty brackets [] if there are no categorical columns.
- targetcolumn — Name of the column containing the target for the feature set, i.e., the feature that the model is predicting.
- offset_col — Name of the feature set column containing the offset value (datetime format), used for generating time/date stamps identifying when predictions were made. Offsets the given datetime to today minus n number of days.
- identifier_col — Name of the column containing the identifier feature that uniquely identifies predictions. (When the model sends predictions, the platform uses the identifier column to map the model’s ground truth data.)
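As a minimal sketch, these variables might look like the following for a hypothetical feature set (the column names below are illustrative, not columns from the sample dataset):

# Hypothetical columns; replace with the columns in your own feature set.
allcolumns = ["pickup_zone", "dropoff_zone", "trip_miles", "trip_minutes", "fare_total"]
continuouscolumns = ["trip_miles", "trip_minutes", "fare_total"]
categoricalcolumns = ["pickup_zone", "dropoff_zone"]
str_categorical_cols = ["pickup_zone", "dropoff_zone"]  # categorical columns with string values
targetcolumn = "fare_total"       # the feature the model predicts
offset_col = "pickup_datetime"    # datetime column used to time-stamp predictions
identifier_col = "trip_id"        # uniquely identifies each prediction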
In addition, as an option you can create a custom inference mapping that matches the columns and data. See Create a custom inference mapping (optional) for details.
Make sure all policies and segments configured in the placeholder model notebook also reflect the modified features and columns as needed.
Create a custom inference mapping (optional)
The sample notebook creates an inference mapping payload using the values defined when setting up notebook variables (cells 2 and 3 in the notebook). The parameters for the inference mapping are defined in the helper file method inference_mapping_generator. The helper file then creates a payload using the values specified for categorical_columns, str_categorical_cols, continuous_columns, and offset_col to determine the columns[] schema values: name, dtype, sql_type, feature_type. See the V1InferenceMappingModel documentation.
When extending the placeholder model notebook to use your own data (i.e., local CSV or Parquet files) you can create your own custom schema to match your model’s data. With the custom inference mapping in place, the platform can understand and process inference data sent by the placeholder model deployment. In this topic, you learn how to create a new notebook cell with your custom inference mapping.
Local data filenames must contain only lowercase letters (a-z), numbers (0-9), underscores (_), or periods (.).
To create a new custom inference mapping
- Navigate to the notebook cell, “Load inference mapping schema”. This cell contains the default inference mapping for the placeholder model notebook.
- Press (+) to create a new empty cell.
- Create an instance of the class V1InferenceMappingJob with the values for your model’s schema. You can use the helper method inference_mapping_generator when creating your own schema; a sketch of these objects follows this procedure.
  - Create an instance of the class V1InferenceMappingSchema to specify the df_schema values (datetime_col, target_col, columns[], and identifier_cols[]).
  - Create an instance of the class V1InferenceMappingColumnSchema to specify the actual columns for the schema.
  See the Inference Mapping API documentation for more information about the values needed to create the Inference Mapping object. Specifying the identifier and datetime columns is recommended.
- Make sure to update the columns and values defined in cell 3 of the notebook to match your new inference mapping. Also, make sure to create policies and segments that use your data, or modify the current policies and segments to match the new inference mapping, as needed.
- When done, make sure to save your notebook.
- If you’ve already run the sample placeholder model notebook cells to this point, you can run this cell to create the inference mapping in the platform. (You must be logged in to the platform by running cell 2 of the notebook.) Otherwise, you need to run the notebook from the top to get everything set up correctly.
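For illustration, a minimal sketch of such a cell, assuming hypothetical column names (the class and field names come from the steps above, but verify the exact constructor signatures against the Inference Mapping API documentation):

# Hypothetical sketch; adjust the arguments to match the SDK documentation.
column = V1InferenceMappingColumnSchema(
    name="trip_miles",          # column name in your data
    dtype="float64",            # column datatype
    sql_type="DOUBLE",          # backing SQL type
    feature_type="continuous",  # continuous or categorical
)
df_schema = V1InferenceMappingSchema(
    datetime_col="pickup_datetime",  # when each prediction was made
    target_col="fare_total",         # the column the model predicts
    columns=[column],                # one entry per mapped column
    identifier_cols=["trip_id"],     # uniquely identifies predictions
)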
Modify inference tracking to support custom inference mapping
The sample monitor notebook uses the helper.py method populate_data() to send data to the backend. This method is tied to the sample notebook dataset, “Taxi fare dataset”.
When using a custom inference mapping for a different dataset, you need to write your own code for populate_data() to provide the structure for sending your data to the backend.
Additionally, when using a custom inference mapping you need to write code to support inference tracking for your dataset, so that inferences for your new dataset can be sent to the backend. To do this, you implement cache_upload to upload the data to cache and inference_tracking to send the uploaded data to the VIANOPS backend.
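As a rough skeleton only (the function names come from the text above, but the signatures and bodies here are placeholders, not the actual helper.py API):

# Placeholder skeleton; adapt to your dataset and to the real helper.py interfaces.
def populate_data(df):
    """Shape your dataset into the structure the backend expects."""
    ...

def cache_upload(df):
    """Upload the prepared inference data to the cache."""
    ...

def inference_tracking(cache_reference):
    """Send the uploaded data from the cache to the VIANOPS backend."""
    ...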
Modify a policy to detect issues on different columns
The sample placeholder model notebook is configured to detect drift by analyzing and comparing data provided by features in the sample dataset. Specifically, the “Month-to-month drift policy” analyzes data equally across five features:
"select_features_type": "custom",
"feature_weightage": "equal",
"feature_weights": {
"PULocation":20,
"DOLocation":20,
"est_trip_distance": 20,
"est_travel_time": 20,
"est_extra_cost": 20,
},
Both of the other drift policies, “Week-to-week drift w/segments” and “Day-to-day drift w/segments”, detect drift equally across three features:
"select_features_type": "custom",
"feature_weightage": "equal",
"feature_weights": {
"est_trip_distance": 33.33333,
"est_travel_time": 33.333333,
"est_extra_amount": 33.33333,
},
Using this example, if you wanted a policy to detect drift equally using fewer or more features from the sample dataset, modify the notebook as follows:
- Navigate to the cell containing a drift policy to modify, within the section “Segments and Policies”.
- Modify the feature_weights value to specify the features for drift detection (see the example after these steps).
- Specify the same weight value for each feature so that the weights sum to 100.
- Make sure the features you specify for the policy are included in the dataset, identified as columns in cell 3, and configured in any related segments as needed.
- Save the notebook and run the cells in order, i.e., create the policy, run the preprocessing job, and then run the policy.
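For example, to detect drift equally across just two of the sample features, the weights might look like this (an illustrative sketch, not a configuration shipped with the notebook):

"select_features_type": "custom",
"feature_weightage": "equal",
"feature_weights": {
    "est_trip_distance": 50,
    "est_travel_time": 50,
},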
As an alternative to running the modified policy from the notebook (in the “Run the policies” section), you can run it from the VIANOPS UI. In the Policy List page, select the policy. In the policy information page, select Actions > Run Policy. (See the Explore sample taxi fare model tutorial for more details.)
Note: To run cells that call API endpoints like these, you must be logged in to the platform by running cell 2 of the notebook.
Create a new segment for a policy
The sample placeholder model notebook explains how to create two segments and add them to policies. One segment, “Brooklyn-Manhattan Segment”, filters to look only at data where PULocation is either Downtown Brooklyn/MetroTech, DUMBO/Vinegar Hill, or Brooklyn Heights and DOLocation is either Little Italy/NoLiTa or Lower East Side. The other segment, “Williamsburg-Manhattan Segment”, filters to look only at data where PULocation is either Williamsburg (South Side) or South Williamsburg, and DOLocation is either Little Italy/NoLiTa or Lower East Side. (See the “Create Segment1” and “Create Segment2” cells for configuration.)
A new variable, segment_params, is an array containing both segments. Using segment_params, both segments are then added to the “Week-to-week drift w/segments” and “Day-to-day drift w/segments” policies via the key-value pair "segments": segment_params. Each time the policies run, they look for drift three times across the data defined for a policy (i.e., defined by “feature_weights”): once across all data, once across the data defined by Segment1, and once across the data defined by Segment2.
The other drift policy, “Month-to-month drift policy”, does not use segment filters. Also, segments are not supported for performance policies and so are not included in “MAE performance Policy”.
New segment for this tutorial
In this topic, you learn how to create a new segment and add it to the “Month-to-month drift policy”. This segment filters to analyze only the data where est_travel_time is less than 15.00 (minutes), or where est_trip_distance is under 2.5 (miles) and est_extra_cost is greater than 0 (cents).
We’re going to use the “Create Segment1” cell as a template and modify it for our needs. Our segment combines a simple filter and a grouped_filters[] array to support the complex filter conditions.
When creating the filters, you use a combination of operators and conjunctions as explained in the client SDK documentation.
- Under the section “Create and add segments to the policies”, press “+” to create a new cell after the “Create Segment2” cell.
- Paste the following into the cell:

payload = {
    "name": f"{segment3_name}",
    "description": "SEGMENT DESCRIPTION",
    "filters": [
        {
            "feature_name": "DATASET FEATURE",
            "operator": "OPERATOR",
            "value": [
                "VALUE",
                "VALUE",
                "VALUE",
            ],
            "conjunction": "CONJUNCTION",
        },
        {
            "feature_name": "DATASET FEATURE",
            "operator": "OPERATOR",
            "value": ["VALUE", "VALUE", "VALUE"],
            "conjunction": "CONJUNCTION",
        },
    ],
    "status": "inactive",
    "model_uuid": "MODEL_UUID",
}
- Optionally, change the description to “Trips under 2.5 miles or under 15 minutes”.
- Replace the first “filters” object to filter on the feature est_travel_time using the operator, value, and conjunction as shown:

{
    "feature_name": "est_travel_time",
    "operator": "<",
    "value": [15.00],
    "conjunction": None,
},
- After the first filter, set the conjunction to OR so that it joins with the grouped filters that follow:

"conjunction": "OR",
- Next, create grouped_filters[] to configure a complex nested filter that is joined to the first filter with the OR conjunction. Create this complex filter for the est_trip_distance and est_extra_cost features using the operators, values, and conjunctions as shown:

"grouped_filters": [
    {
        "feature_name": "est_trip_distance",
        "operator": "<",
        "value": ["2.5"],
        "conjunction": "AND",
    },
    {
        "feature_name": "est_extra_cost",
        "operator": ">",
        "value": ["0"],
        "conjunction": None,
    },
]
- Finally, paste the following at the bottom of the cell (a fully assembled version of the payload appears after this procedure):

segment3 = V1SegmentBaseModel(**payload)
segment3_list = V1SegmentBaseModelList(__root__=[])
segment3_list.__root__.append(segment3)
segment3_params = segments_api.create(segment3_list)
print(segment3_params)

The payload contains the configuration for the new segment. We’re passing the defined payload to segment3, which is a new instance of the class V1SegmentBaseModel. Then, segment3_list (a new instance of the class V1SegmentBaseModelList) enables us to get a list of individual parameters for use by the policy. Use segments_api.create(segment3_list) to create the segment using the values of segment3.

- Navigate to cell 3 and add an entry for your new segment to the “Policy-related artifacts” section:

segment3_name = f"Short trips"
- Navigate to cell 4 and add an entry for your new segment to the “feature set name and policy-related artifacts” section:

segment3_name = f"Short trips {count_user}"
Now, we can add the new segment to the policy.
- Navigate to the cell for “Create a base month-to-month policy” and update the segments parameter in the policy payload:

segment_params = [segment3_params[0]]
- When done, make sure to save your notebook.
- Starting at cell 2, run the notebook again to apply the new variables (cells 2 and 3), create the new segment, and run the preprocessing job for that segment.
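Putting the pieces together, the finished segment cell might look like the sketch below. How grouped_filters[] nests inside filters[] is our reading of the steps above (shown here as the second entry in filters[]); verify the exact structure against the client SDK documentation.

payload = {
    "name": f"{segment3_name}",
    "description": "Trips under 2.5 miles or under 15 minutes",
    "filters": [
        # Simple filter: trips under 15 minutes, joined to the group below with OR.
        {
            "feature_name": "est_travel_time",
            "operator": "<",
            "value": [15.00],
            "conjunction": "OR",
        },
        # Grouped filter: trips under 2.5 miles AND with extra cost over 0.
        {
            "grouped_filters": [
                {
                    "feature_name": "est_trip_distance",
                    "operator": "<",
                    "value": ["2.5"],
                    "conjunction": "AND",
                },
                {
                    "feature_name": "est_extra_cost",
                    "operator": ">",
                    "value": ["0"],
                    "conjunction": None,
                },
            ],
        },
    ],
    "status": "inactive",
    "model_uuid": "MODEL_UUID",
}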
Create a new policy
The sample notebook includes four policies: three that detect feature drift, and one that detects issues with model performance. Two of the policies include segmentation, enabling them to run drift detection on a focused set of data as well as across all data defined for the policies. You can create new policies to run as part of this sample notebook.
New policy for this tutorial
In this topic, you learn how to add a new policy that detects distance-based drift on prediction data. Configuration for the policy specifies how it operates, the data it analyzes, the schedule it runs on, and the thresholds that indicate alert conditions. The prediction policy runs the configured metric against predictions within a defined target window, compared against a defined baseline, to determine when prediction drift exceeds thresholds. Our policy looks for drift in predictions generated during the current week as compared with a baseline of predictions generated two weeks ago.
For more information, see documentation for the prediction drift models and supported parameters.
- Navigate to the “Create model performance policy” cell (under the section “Segments and Policies”). This cell contains the configuration for the model performance policy.
- Press (+) to create a new empty cell after the model performance policy cell.
- Paste the following payload “template”. The steps below walk through modifying this payload, and a summary of the completed settings appears after the steps.
payload = [
    {
        "deployment": f"{deployment}",
        "model_name": f"{deployment}",
        "model_version": f"{model_version}",
        "model_stage": f"{model_stage}",
        "name": f"{policy3_name}",
        "description": "DESCRIPTION",
        "type": "drift",
        "policy": {
            "type": "prediction-drift",
            "drift_type": "distance",
            "window_parameters": {
                "target": {
                    "window_type": "WINDOW_TYPE"
                },
                "baseline": {
                    "window_method": "WINDOW_METHOD",
                    "window_type": "WINDOW_TYPE",
                    "last_amount": NUMBER
                }
            },
            "drift_measure": "MEASURE",
            "warning_level": 0.1,
            "critical_level": 0.15,
            "schedule": "0 0 8 ? * *",
            "method": "preprocess",
            "baseline_bins": {
                "total_amount": [NUMBERS]
            },
        }
    },
]
- In the pasted payload template, leave the default settings for:

"deployment": f"{deployment}",
"model_name": f"{deployment}",
"model_version": f"{model_version}",
"model_stage": f"{model_stage}",
- Optionally, set the policy description to “Detect drift in total_amount between current week and prior week.”
- In policy{}, leave type and drift_type set at the template values.
- Under window_parameters, set the target window_type to week.
- For baseline, set window_method to prior, window_type to week, and last_amount to 2.
- Set drift_measure to PSI so the policy uses the Population Stability Index to measure drift.
- Leave the warning level and critical level set at the template values. These defaults ensure warning alerts are signaled when the drift measure exceeds 0.1, and critical alerts are signaled when the drift measure exceeds 0.15.
- Leave the schedule set to "schedule": "0 0 8 ? * *" to ensure the policy runs daily at 8:00am.
Note: The platform uses Coordinated Universal Time (UTC) format. If your local time or cloud node offset is different from UTC time, it is highly recommended that you create timestamps in UTC format.
- Leave method set to preprocess. This specifies the method for saving policy data.
- Set custom baseline bins (i.e., start and end bin edges) to better view and understand prediction drift results. For our purposes, we are setting three bins for total_amount: 0 to 25, 26 to 50, and 51 to the end of the data:

"baseline_bins": {
    "total_amount": [0, 25, 26, 50, 51]
},
- Finally, paste the following at the bottom of the cell:

policy3 = V1PolicyRequestModel(**payload[0])
policy3_data = V1PolicyRequestModelList(__root__=[])
policy3_data.__root__.append(policy3)
policy3_res = policies_api.create(policy3_data)
print(policy3_res)

The payload contains the configuration for the new policy. We’re passing the defined payload to policy3, which is a new instance of the class V1PolicyRequestModel. Then, policy3_data (a new instance of the class V1PolicyRequestModelList) enables us to get a list of individual parameters for use by the policy. Use policies_api.create(policy3_data) to create the policy.

- Navigate to cell 3 and add an entry for your new policy to the “Policy-related artifacts” section:

policy3_name = f"Prediction drift week-over-week"
- Navigate to cell 4 and add an entry for your new policy to the “feature set name and policy-related artifacts” section:

policy3_name = f"Prediction drift week-over-week {count_user}"
- When done, make sure to save your notebook.
- Starting at cell 2, run the notebook again to apply the new variables (cells 2 and 3), create the new policy, run the preprocessing for that policy, and then run the new policy.
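Assembled from the steps above, the completed window, measure, and schedule settings inside policy{} look like this:

"window_parameters": {
    "target": {
        "window_type": "week"
    },
    "baseline": {
        "window_method": "prior",
        "window_type": "week",
        "last_amount": 2
    }
},
"drift_measure": "PSI",
"warning_level": 0.1,
"critical_level": 0.15,
"schedule": "0 0 8 ? * *",
"method": "preprocess",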
When finished, you can find the new policy in the Policy List for your model. Select the policy in the Policy List to see more information, including any generated alerts, detected drift, etc. From the policy information page you can activate and run the policy. See the Explore sample taxi fare model tutorial for more details.