Advanced monitoring

In the Monitor your model tutorial, you learned how the monitor_placeholder_api_generic.ipynb notebook is configured to create the sample taxi fare model. That tutorial walks through the out-of-the-box configuration and settings defined for the notebook to create the sample model.

In this advanced tutorial, you learn how to extend the sample notebook with advanced configurations, specifically how to:

  • Set a different target for the feature set
  • Change the feature set for the model
  • Create a custom inference mapping (optional)
  • Modify inference tracking to support custom inference mapping
  • Modify a policy to detect issues on different columns
  • Create a new segment for a policy
  • Create a new policy

Note: The Python file helper.py included with the sample notebook provides numerous client functions to make the notebook workflow easier to process and follow. All SDK classes are imported there to support SDK client functionality.

Set a different target for the feature set

The target for the feature set is one of the input parameters for the placeholder model notebook. It must be a column in the feature set. To set a different target for your feature set, add the name of that column as the value for targetcolumn. Make sure this column name (string format) matches a column in your feature set. When you run this cell, the notebook picks up your custom settings.
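
For example, a minimal sketch (the column name trip_duration here is hypothetical; substitute a column that exists in your own feature set):

    # Hypothetical example: predict trip duration instead of the default target.
    # "trip_duration" must match a column in your feature set.
    targetcolumn = "trip_duration"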

Change the feature set for the model

Note: Column names in feature sets must contain only lowercase letters (a-z), numbers (0-9), underscores (_), or periods (.).

When you specify a different feature set for the placeholder model, it likely has different columns. You need to modify the features and columns set for the model to match those provided by the feature set. These values are set in the placeholder model notebook variables. Modify the following notebook variables as explained (an illustrative sketch follows the list):

  • allcolumns—Names of all columns (including the target column) in the feature set, defined as an array (comma-separated and enclosed in quotes).
  • continuouscolumns—Names of all columns in the feature set that contain continuous data, defined as an array (comma-separated and enclosed in quotes). If there are no columns with continuous data, leave empty square brackets [].
  • categoricalcolumns—Names of all columns in the feature set that contain categorical data, defined as an array (comma-separated and enclosed in quotes). If there are no columns with categorical data, leave empty square brackets [].
  • str_categorical_cols—Names of columns in the feature set containing categorical data of string datatype. Use empty brackets [] if there are no categorical columns.
  • targetcolumn—Name of the column containing the target for the feature set, i.e., the feature that the model is predicting.
  • offset_col—Name of the feature set column containing the offset value (datetime format), used for generating time/date stamps identifying when predictions were made. Offsets the given datetime to today minus n number of days.
  • identifier_col—Name of the column containing the identifier feature that uniquely identifies predictions. (When the model sends predictions, the platform uses the identifier column to map the model’s ground truth data.)
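
For example, a feature set with the hypothetical columns pickup_zone, dropoff_zone, trip_miles, trip_duration, pickup_datetime, and ride_id might be configured as follows (all names here are illustrative; substitute the columns from your own feature set):

    # Illustrative values only; replace with the columns from your feature set.
    allcolumns = ["pickup_zone", "dropoff_zone", "trip_miles", "trip_duration",
                  "pickup_datetime", "ride_id"]
    continuouscolumns = ["trip_miles", "trip_duration"]
    categoricalcolumns = ["pickup_zone", "dropoff_zone"]
    str_categorical_cols = ["pickup_zone", "dropoff_zone"]
    targetcolumn = "trip_duration"       # the feature the model predicts
    offset_col = "pickup_datetime"       # datetime column used for time/date stamps
    identifier_col = "ride_id"           # uniquely identifies each prediction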

In addition, you can optionally create a custom inference mapping that matches the columns and data. See Create a custom inference mapping (optional) for details.

Make sure all policies and segments configured in the placeholder model notebook also reflect the modified features and columns as needed.

Create a custom inference mapping (optional)

The sample notebook creates an inference mapping payload using the values defined when setting up notebook variables (cells 2 and 3 in the notebook). The parameters for the inference mapping are defined in the helper file method inference_mapping_generator. The helper then creates a payload using the values specified for categorical_columns, str_categorical_cols, continuous_columns, and offset_col to determine the columns[] schema values: name, dtype, sql_type, and feature_type. See the V1InferenceMappingModel documentation.

When extending the placeholder model notebook to use your own data (i.e., local CSV or Parquet files) you can create your own custom schema to match your model’s data. With the custom inference mapping in place, the platform can understand and process inference data sent by the placeholder model deployment. In this topic, you learn how to create a new notebook cell with your custom inference mapping.

Note: If you choose not to create a custom inference mapping, the platform infers the schema from the values of the initial inferences. For this to work, make sure the initial inferences contain unambiguous typing. For example, if a column should be of datatype float, make sure the initial inference for that column is a float value; if the initial value is an integer instead, the platform types the column as datatype integer.
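
For example, a sketch of one initial inference row with unambiguous typing (the column names are illustrative):

    # 3.0 (a float literal) types trip_miles as float; a bare 3 would type it
    # as integer. Quoted values are typed as strings.
    initial_inference = {"trip_miles": 3.0, "pickup_zone": "Midtown"}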

Note: Local data filenames must contain only lowercase letters (a-z), numbers (0-9), underscores (_), or periods (.).

To create a new custom inference mapping

  1. Navigate to the notebook cell, “Load inference mapping schema”. This cell contains the default inference mapping for the placeholder model notebook.
  2. Press (+) to create a new empty cell.
  3. Create an instance of the class V1InferenceMappingJob with the values for your model’s schema. You can use the helper method inference_mapping_generator when creating your own schema (see the sketch after these steps).

    See the Inference Mapping API documentation for more information about the values needed to create the Inference Mapping object. Specifying the identifier and datetime columns is recommended.

  4. Make sure to update the columns and values defined in cell 3 of the notebook to match your new inference mapping. Also, make sure to create policies and segments that use your data, or modify the current policies and segments to match the new inference mapping, as needed.
  5. When done, make sure to save your notebook.
  6. If you’ve already run the sample placeholder model notebook cells to this point, you can run this cell to create the inference mapping in the platform. (You must be logged in to the platform by running cell 2 of the notebook.) Otherwise, run the notebook from the top to get everything set up correctly.
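
For example, a sketch of step 3. The keyword names below mirror the notebook variables described above, but the actual inference_mapping_generator signature is an assumption; check helper.py before using it:

    # Sketch only: consult helper.py for the actual signature of
    # inference_mapping_generator. The keyword names mirror the notebook variables.
    inference_mapping = inference_mapping_generator(
        categorical_columns=categoricalcolumns,
        str_categorical_cols=str_categorical_cols,
        continuous_columns=continuouscolumns,
        offset_col=offset_col,
    )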

Modify inference tracking to support custom inference mapping

The sample monitor notebook uses the helper.py method populate_data() to send data to the backend. This method is tied to the sample notebook dataset “Taxi fare dataset”.

When using a custom inference mapping for a different dataset, you need to write your own code for populate_data() to provide the structure for sending your data to the backend.

Additionally, when using a custom inference mapping you need to write code to support inference tracking for your dataset, so that inferences for your new dataset can be sent to the backend. To do this, you implement cache_upload to upload the data to cache and inference_tracking to send the uploaded data to the VIANOPS backend.
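
A minimal sketch of what these custom implementations might look like (the use of pandas, the signatures, and the wrapper function are assumptions; model your code on the original methods in helper.py):

    import pandas as pd

    # Sketch only: mirror the structure of the original populate_data() in helper.py.
    def populate_data(csv_path: str) -> pd.DataFrame:
        # Load your dataset and keep only the columns declared in cell 3.
        df = pd.read_csv(csv_path)
        return df[allcolumns]

    def track_inferences(df: pd.DataFrame) -> None:
        # cache_upload and inference_tracking are the methods you implement,
        # as described above.
        cached = cache_upload(df)       # upload the data to cache
        inference_tracking(cached)      # send the uploaded data to the VIANOPS backend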

Modify a policy to detect issues on different columns

The sample placeholder model notebook is configured to detect drift by analyzing and comparing data provided by features in the sample dataset. Specifically, the “Month-to-month drift policy” analyzes data equally across five features:

"select_features_type": "custom",
"feature_weightage": "equal",
"feature_weights": {
    "PULocation":20,
    "DOLocation":20,
    "est_trip_distance": 20,
    "est_travel_time": 20,
    "est_extra_cost": 20,
},

The two other drift policies, "Week-to-week drift w/segments" and "Day-to-day drift w/segments", detect drift equally across three features:

"select_features_type": "custom",
"feature_weightage": "equal",
"feature_weights": {
    "est_trip_distance": 33.33333,
    "est_travel_time": 33.333333,
    "est_extra_amount": 33.33333,
},

Using this example, if you wanted a policy to detect drift equally using fewer or more features from the sample dataset, modify the notebook as follows:

  1. Navigate to the cell containing a drift policy to modify, within the section “Segments and Policies”.
  2. Modify the feature_weights value to specify the features for drift detection.
  3. Specify the same weight value for each feature so that the weights sum to 100 (see the example after these steps).
  4. Make sure the features you specify for the policy are included in the dataset, identified as columns in cell 3, and configured in any related segments as needed.
  5. Save the notebook and run the cells in order, i.e., create the policy, run the preprocessing job, and then run the policy.
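
For example (see step 3), to detect drift equally across four features from the sample dataset instead of five, each weight would be 25:

    "select_features_type": "custom",
    "feature_weightage": "equal",
    "feature_weights": {
        "PULocation": 25,
        "DOLocation": 25,
        "est_trip_distance": 25,
        "est_travel_time": 25,
    },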

As an alternative to running the modified policy from the notebook (in the “Run the policies” section), you can run it from the VIANOPS UI. In the Policy List page, select the policy. In the policy information page, select Actions > Run Policy. (See the Explore sample taxi fare model tutorial for more details.)

Note: To run API endpoints like these cells, you must be logged in to the platform by running cell 2 in the notebook.

Create a new segment for a policy

The sample placeholder model notebook explains how to create two segments and add them to policies. One segment, “Brooklyn-Manhattan Segment”, filters to look only at data where PULocation is either Downtown Brooklyn/MetroTech, DUMBO/Vinegar Hill, or Brooklyn Heights and DOLocation is either Little Italy/NoLiTa or Lower East Side. The other segment, “Williamsburg-Manhattan Segment”, filters to look only at data where PULocation is either Williamsburg (South Side) or South Williamsburg, and DOLocation is either Little Italy/NoLiTa or Lower East Side. (See the “Create Segment1” and “Create Segment2” cells for configuration.)

A new variable, segment_params, is an array containing both segments. Using segment_params, both segments are then added to the “Week-to-week drift w/segments” and “Day-to-day drift w/segments” policies via the key-value pair, "segments": segment_params. Each time the policies run, they look for drift three times across the data defined for a policy (i.e., defined by “feature_weights”): once across all data, once across the data defined by Segment1, and once across the data defined by Segment2.

Note: The other drift policy, "Month-to-month drift policy", does not use segment filters. Also, segments are not supported for performance policies and so are not included in "MAE performance policy".

New segment for this tutorial

In this topic, you learn how to create a new segment and add it to the "Month-to-month drift policy". This segment filters the analysis to data where est_travel_time is less than 15.00 (minutes), or where est_trip_distance is under 2.5 (miles) and est_extra_cost is greater than 0 (cents).

We’re going to use the "Create Segment1" cell as a template and modify it for our needs. Our segment combines a simple filter and a grouped_filters[] array to support the complex filter conditions.

When creating the filters, you use a combination of operators and conjunctions as explained in the client SDK documentation.

  1. Under the section “Create and add segments to the policies”, press “+” to create a new cell after the “Create Segment2” cell.

  2. Paste the following into the cell:

         payload = {
             "name": f"{segment3_name}",
             "description": "SEGMENT DESCRIPTION",
             "filters": [
                 {
                     "feature_name": "DATASET FEATURE",
                     "operator": "OPERATOR",
                     "value": [
                         "VALUE",
                         "VALUE",
                         "VALUE",
                     ],
                     "conjunction": "CONJUNCTION",
                 },
                 {
                     "feature_name": "DATASET FEATURE",
                     "operator": "OPERATOR",
                     "value": ["VALUE", "VALUE", "VALUE"],
                     "conjunction": "CONJUNCTION",
                 },
             ],
             "status": "inactive",
             "model_uuid": "MODEL_UUID",
         }
    
  3. Optionally, change description to “Trips under 2.5 miles or under 15 minutes”.

  4. Replace the first “filters” object to filter on the feature est_travel_time using the operator, value, and conjunction as shown:

         {
             "feature_name": "est_travel_time",
             "operator": "<",
             "value": [
                 15.00
             ],
             "conjunction": "null"
         },
    
  5. In the first filter, change the conjunction value to "OR" so the filter joins with the grouped filters you create in the next step:

         "conjunction": "OR"
    
  6. Next, create a grouped_filters[] array to configure a complex nested filter that is joined to the first filter with the OR conjunction. Create this complex filter for the est_trip_distance and est_extra_cost features using the operators, values, and conjunctions as shown:

     "grouped_filters": [
         {
             "feature_name": "est_trip_distance",
             "operator": "<"
             "value": ["2.5"],
             "conjunction": "AND",
         },
         {
             "feature_name": "est_extra_cost",
             "operator": ">",
             "value": ["0"],
             "conjunction": null
         },
     ]
    
  7. Finally, paste the following at the bottom of the cell:

     segment3 = V1SegmentBaseModel(**payload)
     segment3_list = V1SegmentBaseModelList(__root__=[])
     segment3_list.__root__.append(segment3)
     segment3_params = segments_api.create(segment3_list)
     print(segment3_params)
    

    The payload contains the configuration for the new segment. (A fully assembled sketch of the payload appears after these steps.)

    We’re passing the defined payload to segment3, which is a new instance of class V1SegmentBaseModel.

    Then, segment3_list (a new instance of class V1SegmentBaseModelList) wraps the segment in a list for the API. Use segments_api.create(segment3_list) to create the segment using the values of segment3; the returned segment3_params is used by the policy in step 10.

  8. Navigate to cell 3 and add an entry for your new segment to the "Policy-related artifacts" section:

    segment3_name = f"Short trips"

  9. Navigate to cell 4 and add an entry for your new segment to the "feature set name and policy-related artifacts" section:

    segment3_name = f"Short trips {count_user}"

    Now, we can add the new segment to the policy.

  10. Navigate to the cell for “Create a base month-to-month policy” and update the segments parameter in the policy payload:

    segment_params = [segment3_params[0]]

  11. When done, make sure to save your notebook.

  12. Starting at cell 2, run the notebook again to apply the new variables (cells 2 and 3), create the new segment, and run the preprocessing job for that segment.
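
For reference, the assembled payload from steps 2 through 6 might look like the following sketch. The placement of grouped_filters[] relative to the first filter is an assumption; compare with the "Create Segment1" cell and the client SDK documentation:

    payload = {
        "name": f"{segment3_name}",
        "description": "Trips under 2.5 miles or under 15 minutes",
        "filters": [
            {
                "feature_name": "est_travel_time",
                "operator": "<",
                "value": [15.00],
                "conjunction": "OR",
            },
            {
                # Nested group: est_trip_distance < 2.5 AND est_extra_cost > 0
                "grouped_filters": [
                    {
                        "feature_name": "est_trip_distance",
                        "operator": "<",
                        "value": [2.5],
                        "conjunction": "AND",
                    },
                    {
                        "feature_name": "est_extra_cost",
                        "operator": ">",
                        "value": [0],
                        "conjunction": None,
                    },
                ],
            },
        ],
        "status": "inactive",
        "model_uuid": "MODEL_UUID",  # replace with your model's UUID
    }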

Create a new policy

The sample notebook includes four policies: three that detect feature drift, and one that detects issues with model performance. Two of the policies include segmentation, enabling them to run drift detection on a focused set of data as well as across all data defined for the policies. You can create new policies to run as part of this sample notebook.

Note: The other drift policy, "Month-to-month drift policy", does not include segments. Additionally, segments are not supported for performance policies and so are not included in "MAE performance policy".


New policy for this tutorial

In this topic, you learn how to add a new policy that detects distance-based drift on prediction data. Configuration for the policy specifies how it operates, the data it analyzes, the schedule it runs on, and the thresholds that indicate alert conditions. The prediction drift policy runs the configured measure against predictions within a defined target window and compares them against a defined baseline to determine when prediction drift exceeds thresholds. Our policy looks for drift in predictions generated during the current week as compared with a baseline of predictions generated two weeks ago.

For more information, see documentation for the prediction drift models and supported parameters.

  1. Navigate to the “Create model performance policy” cell (under the section “Segments and Policies”). This cell contains the configuration for the model performance policy.
  2. Press (+) to create a new empty cell after the model performance policy cell.
  3. Paste the following payload "template" into the new cell. The steps that follow walk through modifying it; a fully assembled sketch of the policy{} portion appears after the steps.
         payload = [
             {
                 "deployment": f"{deployment}",
                 "model_name": f"{deployment}",
                 "model_version": f"{model_version}",
                 "model_stage": f"{model_stage}",
                 "name": f"{policy3_name}",
                 "description": "DESCRIPTION",
                 "type": "drift",
                 "policy": {
                     "type": "prediction-drift",
                     "drift_type": "distance",
                     "window_parameters": {
                         "target": {
                             "window_type": "WINDOW_TYPE"
                         },
                         "baseline": {
                             "window_method": "WINDOW_METHOD",
                             "window_type": "WINDOW_TYPE",
                             "last_amount": NUMBER
                         }
                     },
                     "drift_measure": "MEASURE",
                     "warning_level": 0.1,
                     "critical_level": 0.15,
                     "schedule": "0 0 8 ? * *",
                     "method": "preprocess",
                     "baseline_bins": {
                         "total_amount": [NUMBERS]
                     },
                 }
             },
         ]
  4. In the pasted payload template, leave the default settings for:

         "deployment": f"{deployment}",
         "model_name": f"{deployment}",
         "model_version": f"{model_version}",
         "model_stage": f"{model_stage}",
    
  5. Optionally, set the policy description to "Detect drift in total_amount between current week and prior week."

  6. In policy{}, leave type and drift_type set to the template values.

  7. Under window_parameters, set the target window_type to week.

  8. For baseline, set window_method to prior, window_type to week, and last_amount to 2.

  9. Set drift_measure to PSI so the policy uses the Population Stability Index to measure drift.

  10. Leave warning_level and critical_level set to the template values. These defaults signal warning alerts when the drift measure exceeds 0.1 and critical alerts when it exceeds 0.15.

  11. Leave the schedule set to "schedule": "0 0 8 ? * *" to ensure the policy runs daily at 8am (UTC).

    Note: The platform uses Coordinated Universal Time (UTC) format. If your local time or cloud node offset is different from UTC time, it is highly recommended that you create timestamps in UTC format.
  12. Leave method set to preprocess. This specifies the method for saving policy data.

  13. Set custom baseline bins (i.e., start and end bin edges) to better view and understand prediction drift results. For our purposes, we set three bins for total_amount: 0 to 25, 26 to 50, and 51 to the end of the data:

        "baseline_bins":  {  
        "total_amount": [0, 25, 26, 50, 51]  
        },
    
  14. Finally, paste the following at the bottom of the cell:

         policy3 = V1PolicyRequestModel(**payload[0])
         policy3_data = V1PolicyRequestModelList(__root__=[])
         policy3_data.__root__.append(policy3)
         policy3_res = policies_api.create(policy3_data)
         print(policy3_res)
    

    “payload” contains the configuration for the new policy.

    We’re passing the defined payload to policy3, which is a new instance of class V1PolicyRequestModel. Then, policy3_data (a new instance of class V1PolicyRequestModelList) enables us to get a list of individual parameters for use by the policy. Use policies_api.create(policy3_data) to create the policy.

  15. Navigate to cell 3 and add an entry for your new policy to the "Policy-related artifacts" section:

    policy3_name = f"Prediction drift week-over-week"

  16. Navigate to cell 4 and add an entry for your new policy to the "feature set name and policy-related artifacts" section:

    policy3_name = f"Prediction drift week-over-week {count_user}"

  17. When done, make sure to save your notebook.

  18. Starting at cell 2, run the notebook again to apply the new variables (cells 2 and 3), create the new policy, run the preprocessing for that policy, and then run the new policy.
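
For reference, after steps 4 through 13 the policy{} portion of the payload should look like the following:

    "policy": {
        "type": "prediction-drift",
        "drift_type": "distance",
        "window_parameters": {
            "target": {
                "window_type": "week"
            },
            "baseline": {
                "window_method": "prior",
                "window_type": "week",
                "last_amount": 2
            }
        },
        "drift_measure": "PSI",
        "warning_level": 0.1,
        "critical_level": 0.15,
        "schedule": "0 0 8 ? * *",
        "method": "preprocess",
        "baseline_bins": {
            "total_amount": [0, 25, 26, 50, 51]
        },
    },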

Note: To run API endpoints like this cell, you must be logged in to the platform by running the second cell in the notebook.

When finished, you can find the new policy in the Policy List for your model. Select the policy in the Policy List to see more information, including any generated alerts, detected drift, etc. From the policy information page you can activate and run the policy. See the Explore sample taxi fare model tutorial for more details.
