Sample a Dataset for an Experiment

Running an experiment on a dataset sometimes calls for a random or stratified sample rather than the full dataset. Arize supports this by letting teams download the dataset as a dataframe, which can then be sampled before the experiment is run.

# Get the dataset as a DataFrame
dataset_df = arize_client.get_dataset(space_id=SPACE_ID, dataset_name=dataset_name)

# Apply any pandas sampling method to the DataFrame
sampled_df = dataset_df.sample(n=100)  # Sample 100 rows at random
# sampled_df = dataset_df.sample(frac=0.1)  # Sample 10% of rows at random
# stratified_sampled_df = dataset_df.groupby('class_label').sample(frac=0.1)  # Sample 10% of each class
# systematic_sampled_df = dataset_df.iloc[::10, :]  # Select every 10th row

# Run the experiment on sampled_df (taskfn and evaluators are defined elsewhere)
arize_client.run_experiment(SPACE_ID, dataset_name, sampled_df, taskfn, evaluators)

An experiment is matched only with the data it was run against. You can run experiments on different samples of the same dataset, and the platform takes care of tracking and visualization.
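
If two experiments should see different rows of the same dataset, one option is to shuffle the dataframe once with a fixed seed and slice it into non-overlapping samples. The sketch below is a minimal illustration using pandas only; the toy dataframe and its `id` and `value` columns are assumptions standing in for whatever `get_dataset` returns, and no Arize calls are made.

```python
import pandas as pd

# Toy stand-in for the downloaded dataset; "id" and "value" are hypothetical columns.
dataset_df = pd.DataFrame({"id": range(1000), "value": [i % 7 for i in range(1000)]})

# Shuffle once with a fixed seed so the same split can be re-created later.
shuffled = dataset_df.sample(frac=1.0, random_state=42)

# Slice the shuffled frame into two non-overlapping samples of 100 rows each.
sample_a = shuffled.iloc[:100]
sample_b = shuffled.iloc[100:200]

# Confirm the two samples share no rows.
overlap = set(sample_a["id"]) & set(sample_b["id"])
print(len(sample_a), len(sample_b), len(overlap))  # prints: 100 100 0
```

Each sample can then be passed to its own experiment run in place of `sampled_df`. Fixing `random_state` means the same rows can be drawn again if a run needs to be repeated.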

Any sampling method that can be applied to a dataframe, however complex, can be used.
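
As one illustration of a more involved method, the sketch below does stratified sampling while guaranteeing that rare classes still contribute at least one row. The helper `stratified_sample`, its parameters, and the toy data are hypothetical and written for this example; `class_label` is the column name used in the snippet above.

```python
import pandas as pd

def stratified_sample(df, label_col, frac, min_per_class=1, seed=0):
    """Sample `frac` of each class, keeping at least `min_per_class` rows per class."""
    parts = []
    for _, group in df.groupby(label_col):
        # Target size for this class, never below the minimum or above the class size.
        n = max(min_per_class, round(len(group) * frac))
        n = min(n, len(group))
        parts.append(group.sample(n=n, random_state=seed))
    return pd.concat(parts)

# Toy imbalanced dataset: 90 rows of "a", 10 of "b", 2 of "c".
toy_df = pd.DataFrame({
    "class_label": ["a"] * 90 + ["b"] * 10 + ["c"] * 2,
    "text": [f"row {i}" for i in range(102)],
})

stratified_df = stratified_sample(toy_df, "class_label", frac=0.1)
counts = stratified_df["class_label"].value_counts().to_dict()
print(counts["a"], counts["b"], counts["c"])  # prints: 9 1 1
```

A plain 10% draw per class would round class `c` down to zero rows; the minimum keeps every class represented in the experiment.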

Copyright © 2023 Arize AI, Inc