Inference Mapping (APIs)

The inference mapping (or data schema) for a model specifies how inferences/predictions from the model are stored in the platform. It includes information about where to store inferences, the columns in the inference data and their datatypes, and the primary key and datetime columns. Once created, an inference mapping cannot be modified. You can use the endpoint /v1/inference-mapping/search to locate previously created inference mappings.

When creating a model, the inference mapping can be defined explicitly for that model. If not created explicitly, the platform infers the inference mapping based on initial inferences generated for that model. If possible, the best practice is to create a custom inference mapping with identifier and datetime columns.

Note: If you choose not to create a custom inference mapping, the platform infers the schema from the values in the initial inferences. For this to work, make sure the initial inferences are unambiguously typed. For example, if a column should be of datatype float, make sure the initial inference for that column contains a float value; if the initial value is an integer instead, the platform types the column as datatype integer.
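
The typing pitfall described in the note can be caught before the first inferences are sent. The following is a minimal sketch, not platform code: the helper name `find_ambiguous_columns`, the record layout, and the `expected` dictionary are all hypothetical, and the check only mirrors the rule stated above (an integer value in a column meant to be float would be typed as integer).

```python
def find_ambiguous_columns(record, expected_dtypes):
    """Return names of columns whose initial value would be inferred as the wrong type.

    record: dict of column name -> value from the first inference (hypothetical shape).
    expected_dtypes: dict of column name -> "float" or "int" (hypothetical input).
    """
    problems = []
    for name, expected in expected_dtypes.items():
        value = record.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        is_float = isinstance(value, float)
        if expected == "float" and not is_float:
            problems.append(name)
        elif expected == "int" and not is_int:
            problems.append(name)
    return problems


# feature_a is meant to be a float but the first value is the integer 3.
initial_inference = {"feature_a": 3, "feature_b": 0.25, "feature_id": 101}
expected = {"feature_a": "float", "feature_b": "float", "feature_id": "int"}
ambiguous = find_ambiguous_columns(initial_inference, expected)
print(ambiguous)
```

Writing the value as `3.0` instead of `3` would clear the warning for `feature_a` in this sketch.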

After the inference mapping is created, the endpoint /v1/inference-tracking can load the model’s inferences/predictions into the platform (inference mapping table).

Inference mapping object

{
    "index": "2",
    "deployment": "deployment_xyy",
    "model_name": "model_xyz",
    "model_version": "1",
    "model_stage": "primary",
    "connection": "clickhouse://default:@clickhouse/default",
    "df_schema": {
        "target_col": "target-feature-name",
        "identifier_cols": [
            "feature_id"
        ],
        "columns": [
            {
                "feature_id": 1,
                "name": "feature_a",
                "dtype": "float64",
                "sql_type": "Float32",
                "feature_type": "categorical",
                "segmentation": true,
                "drift": true,
                "hotspot": true,
                "rca": [
                    "feature_b"
                ],
                "round": null
            },
            {
                "feature_id": 2,
                "name": "feature_b",
                "dtype": "float64",
                "sql_type": "Float32",
                "feature_type": "categorical",
                "segmentation": true,
                "drift": true,
                "hotspot": true,
                "rca": [
                    "feature_a"
                ],
                "round": null
            },
            {
                "feature_id": 20,
                "name": "feature_id",
                "dtype": "int64",
                "sql_type": "Int64",
                "feature_type": "unknown",
                "segmentation": true,
                "drift": true,
                "hotspot": false,
                "rca": [
                    "feature_a",
                    "feature_b"
                ],
                "round": null
            },
            {
                "feature_id": 5,
                "name": "feature_datetime",
                "dtype": "datetime64[ns]",
                "sql_type": "",
                "feature_type": "unknown",
                "segmentation": true,
                "drift": false,
                "hotspot": true,
                "rca": [],
                "round": null
            }
        ],
        "datetime_col": "feature_datetime",
        "predict_proba_col": null
    },
    "identifier_cols_schema": null,
    "inference_table": "1678468466777_dit",
    "ground_truth_table": "1678468466888_gt",
    "joined_table": "1678468466999_joined",
    "key_table": "1678468466000_key",
    "create_ddl": null,
    "postprocessors": []
}

| Property | Type | Description |
| --- | --- | --- |
| index | integer | Assigned automatically by the platform. |
| deployment | string | Name of the deployment. |
| model_name | string | Name of the model. |
| model_version | string | Version of the model, e.g., 1 or 2. |
| model_stage | string | Stage of the model, e.g., primary. |
| connection | string | Data connection parameters. |
| df_schema | object | Identifies the schema for the model: datetime column, target column, prediction probability column (for classification models), identifier columns, and all columns (all features). See the df_schema table below. |
| identifier_cols_schema | any | One or more identifier columns that uniquely identify each prediction row. When the model sends predictions, the platform uses the identifier columns to map the model’s ground truth data. It is recommended that you specify `identifier_cols` if you intend to use ground truth functionality (which is required for getting model performance data). By default, if you leave `identifier_cols[]` empty, the platform generates a UUID that can be used to map ground truth. Use the [/v1/inference-tracking](apis.html#rest-api-documentation) GET endpoint to access the UUID. |
| inference_table | string | Database table containing inference data. |
| ground_truth_table | string | Database table containing ground truth data. |
| joined_table | string | Database table containing inference data and ground truth data. |
| key_table | string | Database table containing all identifier columns and the datetime column. |
| create_ddl | string | If needed, the DDL query to create the tables. |
| postprocessors | array | Internal use. One or more postprocessors (and related values) to include in the inference mapping, to transform output data before it is sent to the backend database. The available postprocessors include: `PickHighestProbability`, `PickInferenceMapper`, `PickProbablityThreshold`. |
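
Because `df_schema` refers to columns by name in several places, it can help to cross-check those names before submitting a mapping. The sketch below is illustrative only: `check_df_schema` is a hypothetical helper, and it assumes (based on the statement that `columns` covers all columns, including the target) that the target, datetime, and identifier columns should each appear in `columns`. The platform's own validation rules may differ.

```python
def check_df_schema(df_schema):
    """Return a list of problems found in a df_schema dict (empty if none)."""
    # Names of all columns declared in the schema.
    column_names = {col["name"] for col in df_schema.get("columns", [])}
    problems = []

    # Assumption: target and datetime columns, when set, must appear in columns.
    for key in ("target_col", "datetime_col"):
        name = df_schema.get(key)
        if name is not None and name not in column_names:
            problems.append(f"{key} '{name}' is not listed in columns")

    # identifier_cols may be empty or missing; the platform then generates a UUID.
    for name in df_schema.get("identifier_cols") or []:
        if name not in column_names:
            problems.append(f"identifier column '{name}' is not listed in columns")
    return problems


schema = {
    "target_col": "feature_b",
    "identifier_cols": ["feature_id"],
    "columns": [
        {"name": "feature_a", "dtype": "float64"},
        {"name": "feature_b", "dtype": "float64"},
        {"name": "feature_id", "dtype": "int64"},
    ],
    "datetime_col": "feature_datetime",  # deliberately missing from columns
    "predict_proba_col": None,
}
problems = check_df_schema(schema)
print(problems)
```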

df_schema object

"df_schema": {
    "target_col": "feature_b",
    "identifier_cols": ["feature_id"],
    "columns": [],
    "datetime_col": "feature_datetime",
    "predict_proba_col": null
}

| Property | Type | Description |
| --- | --- | --- |
| target_col | string | Column providing the target feature, that is, the name of the feature you want to predict. |
| identifier_cols | array | List of identifier columns that uniquely identify each prediction row. When the model sends predictions, the platform uses the identifier columns to map the model’s ground truth data. It is recommended that you specify identifier columns if you intend to use ground truth functionality. (By default, if you do not specify identifier columns, the platform generates a UUID that can be used to map ground truth. Use the /v1/inference-tracking endpoint to access the UUID.) |
| columns | array | Information about all columns in the feature set, including the target column. See the columns[] table below. |
| datetime_col | string | Column providing the date/timestamp for the features. If not provided, the platform generates a `vianai_ts` datetime column. |
| predict_proba_col | string | (Classification models only.) Column used for calculating model performance metrics, for example ROC/AUC score, Gini coefficient, log loss, and lift. For binary classification, the column holds floating point numbers between 0 and 1, inclusive; for multi-class classification, it holds an array of floats in [0,1] whose length equals the number of classes. |
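
The value rules for the prediction probability column can be expressed as a small check. This is a sketch under the rules stated above, not the platform's validator; the function name `is_valid_predict_proba` and the `num_classes` parameter are hypothetical.

```python
def is_valid_predict_proba(value, num_classes=2):
    """Check one prediction-probability value against the rules described above."""

    def in_unit_interval(x):
        # Must be a float between 0 and 1, inclusive (NaN fails the comparison).
        return isinstance(x, float) and 0.0 <= x <= 1.0

    if num_classes == 2:
        # Binary classification: a single float in [0, 1].
        return in_unit_interval(value)

    # Multi-class classification: one float in [0, 1] per class.
    return (
        isinstance(value, (list, tuple))
        and len(value) == num_classes
        and all(in_unit_interval(x) for x in value)
    )


print(is_valid_predict_proba(0.7))
print(is_valid_predict_proba([0.2, 0.3, 0.5], num_classes=3))
print(is_valid_predict_proba([0.2, 0.8], num_classes=3))
```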

columns[]

When an inference mapping is created, the endpoint takes the parameters described in the columns[] table below. Platform backend processes also create additional properties for internal use.

| Property | Type | Description |
| --- | --- | --- |
| feature_id | integer | Platform-generated id for this feature. |
| name | string | Name of the feature (including target and ID) in the inference dataset. Make sure this matches a column name from the dataset. |
| dtype | string | Data type of the feature. Supported values are `int`, `int64`, `float`, `float64`, `object`, `string`, `datetime`, `datetime64[ns]`, `bool`. |
| sql_type | string | SQL type of the feature. (Currently not used.) The value depends on `dtype`; see the dtype/sql_type table below. For example, if `dtype` for the column is `int`, then set `sql_type` to `Int32`. |
| feature_type | string | Type of feature. Supported values are `categorical` or `continuous` (the default is `continuous` if not defined). |
| segmentation | boolean | If `true`, this feature can be used to define conditional segments. Set to `true` by default. |
| drift | boolean | If `true`, the platform computes drift metrics on this feature. Set to `true` by default. |
| hotspot | boolean | If `true`, the feature can be used for [hotspot analysis](api-policies.html#hotsot-analysis) as part of root cause analysis. Set by the platform based on feature configuration, e.g., feature_type is `categorical` and segmentation and drift are set to `true`. By default, set to `true`. |
| rca | array | List of the hotspot features available for hotspot analysis. (Ensures only relevant features are available for hotspot analysis.) Set by the platform based on feature configuration. |
| round | integer | For continuous float columns, identifies the number of decimals for rounding. |
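
The defaults in the table above can be captured in a small builder for `columns[]` entries. This is a hedged sketch: `make_column` is a hypothetical helper, and which properties the endpoint actually accepts from a caller is an assumption. Properties the table describes as platform-set (`feature_id`, `hotspot`, `rca`) are left out.

```python
SUPPORTED_DTYPES = {
    "int", "int64", "float", "float64", "object",
    "string", "datetime", "datetime64[ns]", "bool",
}


def make_column(name, dtype, feature_type="continuous",
                segmentation=True, drift=True, round_digits=None):
    """Build one columns[] entry using the documented defaults."""
    if dtype not in SUPPORTED_DTYPES:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    if feature_type not in ("categorical", "continuous"):
        raise ValueError(f"unsupported feature_type: {feature_type!r}")
    return {
        "name": name,
        "dtype": dtype,
        "feature_type": feature_type,
        "segmentation": segmentation,
        "drift": drift,
        "round": round_digits,  # number of decimals for continuous float columns
    }


col = make_column("feature_a", "float64", round_digits=2)
print(col)
```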

The sql_type value is determined by dtype, as shown in the following table:

| dtype | sql_type |
| --- | --- |
| int | Int32 |
| int64 | Int64 |
| float | Float32 |
| float64 | Float32 |
| object | String |
| string | String |
| datetime | DateTime |
| datetime64[ns] | DateTime |
| bool | UInt8 |
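
The same mapping can be kept as a lookup table in client code. The dictionary below copies the table above verbatim; the names `DTYPE_TO_SQL_TYPE` and `sql_type_for` are hypothetical and not part of the platform API.

```python
# dtype -> sql_type, copied from the table above.
DTYPE_TO_SQL_TYPE = {
    "int": "Int32",
    "int64": "Int64",
    "float": "Float32",
    "float64": "Float32",
    "object": "String",
    "string": "String",
    "datetime": "DateTime",
    "datetime64[ns]": "DateTime",
    "bool": "UInt8",
}


def sql_type_for(dtype):
    """Return the sql_type for a supported dtype, or raise ValueError."""
    try:
        return DTYPE_TO_SQL_TYPE[dtype]
    except KeyError:
        raise ValueError(f"unsupported dtype: {dtype!r}") from None


print(sql_type_for("float64"))
print(sql_type_for("bool"))
```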