log

Last updated 1 year ago

Arize method that logs the inferences contained in a pandas DataFrame to Arize via a POST request.

log(
    dataframe: pd.DataFrame,
    schema: Schema,
    environment: Environments,
    model_id: str,
    model_type: ModelTypes,
    metrics_validation: Optional[List[Metrics]] = None,
    model_version: Optional[str] = None,
    batch_id: Optional[str] = None,
    sync: Optional[bool] = False,
    validate: Optional[bool] = True,
    path: Optional[str] = None,
    surrogate_explainability: Optional[bool] = False,
    timeout: Optional[float] = None,
    verbose: Optional[bool] = False,
)

Client.log() returns a requests.models.Response object. Check its HTTP status code to confirm that the records were delivered successfully.

# response is the requests.models.Response returned by arize_client.log(...)
if response.status_code == 200:
    print("✅ You have successfully logged the production dataset to Arize")
else:
    print(
        f"Logging failed with response code {response.status_code}, {response.text}"
    )
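If you call log() several times, you may want to reuse the status check above. The sketch below wraps it in a small helper. `check_log_response` is a hypothetical name and is not part of the Arize SDK. The helper relies only on the `status_code` and `text` attributes that a `requests.models.Response` exposes, so the usage lines can pass lightweight stand-in objects.

```python
from types import SimpleNamespace


def check_log_response(response) -> bool:
    """Return True if a log response reports HTTP 200, printing a message either way.

    Hypothetical helper: works with any object that has `status_code` and
    `text` attributes, such as the requests.models.Response returned by log().
    """
    if response.status_code == 200:
        print("Successfully logged the dataset to Arize")
        return True
    print(f"Logging failed with response code {response.status_code}, {response.text}")
    return False


# Usage with stand-in objects in place of real responses
ok = check_log_response(SimpleNamespace(status_code=200, text=""))
failed = check_log_response(SimpleNamespace(status_code=400, text="bad request"))
```

Like the original check, this treats only status 200 as success.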

This API first serializes the data to the local file system from Python and then performs a fast client-to-server upload of the resulting file. As a result, it needs enough local file-system storage to hold the file being uploaded.
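The serialize-then-upload pattern can be illustrated with the standard library. This is a sketch of the general pattern only, not Arize's implementation. The function `serialize_then_upload` and the `upload` callable are hypothetical; `upload` stands in for the client-to-server transfer. JSON Lines is used here for readability, whereas the Arize client stores the serialized data in binary. The optional `path` argument loosely mirrors the `path` parameter of log().

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Optional


def serialize_then_upload(
    records: Iterable[dict],
    upload: Callable[[str], int],
    path: Optional[str] = None,
) -> int:
    """Write records to a file on disk, then hand the file path to `upload`.

    Hypothetical sketch of a serialize-then-upload flow. If `path` is None, a
    temporary file is created and deleted afterwards; a caller-supplied path
    is left in place.
    """
    cleanup = path is None
    if cleanup:
        fd, path = tempfile.mkstemp(suffix=".jsonl")
        os.close(fd)
    try:
        # Step 1: serialize every record to the local file system
        with open(path, "w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")
        # Step 2: upload the serialized file (stand-in for the HTTP POST)
        return upload(path)
    finally:
        if cleanup and os.path.exists(path):
            os.remove(path)


def fake_upload(file_path: str) -> int:
    # Stand-in uploader: reports how many bytes would be sent
    return os.path.getsize(file_path)


sent_bytes = serialize_then_upload([{"id": 1}, {"id": 2}], fake_upload)
```

The whole file exists on disk before the upload starts. This is why the approach needs local storage proportional to the size of the data.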

Parameter (Data Type): Description

dataframe (pd.DataFrame): (Required) The dataframe containing your model data.

model_id (str): (Required) A unique name that identifies your model in the platform.

model_version (str): (Required*) Identifies a subset of predictions and actuals for a given model_id, so you can compare and track changes. *Required when logging predictions; optional when logging actuals or SHAP values.

model_type (ModelTypes): (Required) Declares your model's type as represented in the platform and determines which performance metrics are validated.

environment (Environments): (Required) The environment (Production, Training, or Validation) your dataframe belongs to.

schema (Schema): (Required) A Schema instance that specifies the dataframe column names for each kind of model data.

batch_id (str): (Optional*) Distinguishes different batches of data under the same model_id and model_version. *Applicable to, and required for, the Validation environment only.

metrics_validation (List[Metrics]): (Optional) A list of desired metric groups; defaults to None. When populated and validate=True, the schema is checked for the columns that the desired metrics require.

validate (bool): (Optional) When set to True, validation runs on the model schema and dataframe before the data is sent. Defaults to True (recommended).

path (str): (Optional) A temporary directory or file in which the serialized binary data is stored before it is sent to Arize.

sync (bool): (Optional) When set to True, the log call blocks until the platform has ingested the data and then returns the status of the log. Defaults to False.

surrogate_explainability (bool): (Optional) Computes feature importance values using the surrogate explainability method. This requires the arize module to be installed with the [MimicExplainer] option. If feature importance values are already specified through the shap_values_column_names attribute of the Schema, this step does not run. Defaults to False.
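Two rows in the table above carry conditional requirements: model_version for predictions, and batch_id for the Validation environment. You can check these before calling log(). The sketch below is hypothetical: `check_conditional_args` is not part of the Arize SDK. Environments are passed as plain strings so the example runs without the SDK installed. The SDK's own validation (validate=True) is separate from this check.

```python
from typing import List, Optional


def check_conditional_args(
    environment: str,
    model_version: Optional[str],
    batch_id: Optional[str],
    logging_predictions: bool,
) -> List[str]:
    """Return a list of problems with the conditionally required log() arguments.

    Hypothetical helper mirroring the rules in the parameter table:
    - model_version is required when logging predictions;
    - batch_id is required for the Validation environment.
    """
    problems = []
    if logging_predictions and not model_version:
        problems.append("model_version is required when logging predictions")
    if environment.lower() == "validation" and not batch_id:
        problems.append("batch_id is required for the Validation environment")
    return problems


# A Validation upload of predictions with no version or batch id breaks both rules
issues = check_conditional_args("Validation", None, None, logging_predictions=True)
```

An empty list means both conditional rules are satisfied. The other required parameters are not checked here.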

Arize expects the DataFrame's index to be sorted and to begin at 0. If any operation before logging might change the index (for example, filtering rows), reset the index as follows:

dataframe = dataframe.reset_index(drop=True)
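To see why the reset matters, the sketch below filters a small dataframe, which leaves gaps in the index, and then restores a 0-based index. `has_default_index` is a hypothetical helper. It checks for the consecutive 0, 1, ..., n-1 index that reset_index(drop=True) produces, which is slightly stricter than "sorted and beginning at 0".

```python
import pandas as pd


def has_default_index(df: pd.DataFrame) -> bool:
    """Hypothetical check: True if the index is exactly 0, 1, ..., len(df) - 1."""
    return df.index.equals(pd.RangeIndex(len(df)))


df = pd.DataFrame({"prediction": [0.2, 0.9, 0.7, 0.4]})

# Filtering keeps the original row labels, so the index becomes [1, 2]
filtered = df[df["prediction"] > 0.5]
before = has_default_index(filtered)

# drop=True discards the old labels instead of adding them as a column
filtered = filtered.reset_index(drop=True)
after = has_default_index(filtered)
```

Without drop=True, reset_index would insert the old labels as a new "index" column. That extra column would then be sent along with your model data.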

Code Example

from arize.utils.types import Environments, Metrics, ModelTypes

# Assumes arize_client is an initialized Arize Client and df and schema are defined
response = arize_client.log(
    dataframe=df,
    schema=schema,
    environment=Environments.Production,
    model_id="example_model",
    model_type=ModelTypes.BINARY_CLASSIFICATION,
    metrics_validation=[Metrics.CLASSIFICATION, Metrics.REGRESSION, Metrics.AUC_LOG_LOSS],
    model_version="1.0",
    validate=True,
)
