Data profiling (APIs)

Get data profiling metrics

Seeing and understanding changes in data patterns is critical to effective model monitoring. Data profiling metrics provide insight into data degradation that could lead to drops in model performance, and they can support root cause analysis for drift policies.

The API provides endpoints for accessing data profiling metrics. By default, metrics are calculated as part of running policies, and the resulting metrics are stored in the Risk Store database. The Monitor placeholder notebook (accessible from the platform’s edahub service) shows an example of running a drift detection job, during which data profiling runs automatically.

REST API

Use v1/dataprofiling/submit to run the data profiling job for your model. The endpoint returns a V1JobModel.

Note: Currently, this endpoint supports running a data profiling job as a sub-job of a drift detection job; it is typically not called directly.

Example payload:

{
    "deployment": "deployment_abc",
    "model_name": "deployment_abc",
    "model_version": "1",
    "model_stage": "primary",
    "policy": {
        "window_parameters": {
            "target": {
                "window_type": "day",
                "process_date": "2023-04-19"
            },
            "baseline": {
                "window_method": "prior",
                "window_type": "day",
                "process_date": "2023-04-19"
            }
        }
    },
    "status": "inactive",
    "segments": [
        {
            "model_uuid": "123-abc-456",
            "name": "segment-1",
            "description": "Segment to filter some data",
            "filters": [
                {
                    "feature_name": "feature-a",
                    "value": ["string1", "string2"],
                    "operator": "=",
                    "conjunction": "None",
                    "grouped_filters": "None"
                }
            ],
            "id": 2,
            "status": "active",
            "created_ts": "1683132722980.102",
            "modified_ts": "1683132722980.102",
            "created_by": "user1",
            "modified_by": "user2"
        }
    ]
}
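
The payload above could be posted from Python roughly as follows. This is a minimal sketch using only the standard library, not the platform's documented client: the host name, the token, the bearer-token authentication scheme, and the build_submit_request helper are all assumptions made for illustration. Only the v1/dataprofiling/submit path and the payload fields come from this page. The payload is abbreviated (segments omitted) to keep the example short.

```python
import json
import urllib.request

# Hypothetical values: replace with your platform URL and credentials.
BASE_URL = "https://vianops.example.com"
API_TOKEN = "<your-token>"


def build_submit_request(payload, base_url=BASE_URL, token=API_TOKEN):
    """Build (but do not send) a POST request for v1/dataprofiling/submit."""
    url = f"{base_url.rstrip('/')}/v1/dataprofiling/submit"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; use the scheme your deployment requires.
            "Authorization": f"Bearer {token}",
        },
    )


# Abbreviated version of the example payload.
payload = {
    "deployment": "deployment_abc",
    "model_name": "deployment_abc",
    "model_version": "1",
    "model_stage": "primary",
    "policy": {
        "window_parameters": {
            "target": {"window_type": "day", "process_date": "2023-04-19"},
            "baseline": {
                "window_method": "prior",
                "window_type": "day",
                "process_date": "2023-04-19",
            },
        }
    },
}

request = build_submit_request(payload)
# To send the request, pass it to urllib.request.urlopen(request).
# Per this page, the response describes the submitted job (V1JobModel).
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and serialized body before anything reaches the server.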

Python SDK

Support for data profiling is provided by vianops_client.models.riskstore.data_profiling.V1DataProfilingModel.

Import and initiate client
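
A minimal sketch of the import step, assuming the SDK is installed and exposes the module path given above. The load_data_profiling_model helper is hypothetical and exists only so the example degrades gracefully when the SDK is absent. Client initialization depends on your environment and is not shown.

```python
import importlib

# Module path as stated on this page.
MODULE_PATH = "vianops_client.models.riskstore.data_profiling"


def load_data_profiling_model():
    """Return the V1DataProfilingModel class, or None if it cannot be imported."""
    try:
        module = importlib.import_module(MODULE_PATH)
    except ImportError:
        # The SDK is not installed in this environment.
        return None
    return getattr(module, "V1DataProfilingModel", None)


V1DataProfilingModel = load_data_profiling_model()
if V1DataProfilingModel is None:
    print("vianops_client is not installed; install the SDK to use V1DataProfilingModel.")
```

When the SDK is available, this is equivalent to a plain "from vianops_client.models.riskstore.data_profiling import V1DataProfilingModel".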

V1DataProfilingModel