Table Ingestion Tuning
Copyright © 2023 Arize AI, Inc
Data is ingested from tables by periodically querying your table or view. A few parameters control how much data is ingested and how often. To see the defaults for these parameters, or to change them, click Query Parameters on Job Options.
You will see the following three parameters, each displayed with its current value:
The first parameter controls how often, in minutes, we query your table. The interval is measured from the last time your table was queried, which you can see by clicking the Job ID to open a chronological list of queries to your table.
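As a rough illustration of how an interval measured from the last query plays out, the sketch below computes when the next query would be due. The function and argument names (`next_query_time`, `refresh_interval_minutes`) are hypothetical and chosen for this example only; they are not part of any Arize API.

```python
from datetime import datetime, timedelta


def next_query_time(last_query_time: datetime, refresh_interval_minutes: int) -> datetime:
    """Return when the next query is due, measured from the last query.

    Hypothetical helper: the names here are illustrative assumptions.
    """
    return last_query_time + timedelta(minutes=refresh_interval_minutes)


# A table last queried at 12:00 with a 60-minute interval is due again at 13:00.
last_query = datetime(2023, 1, 1, 12, 0)
print(next_query_time(last_query, 60))  # 2023-01-01 13:00:00
```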
The second parameter controls the size of the query window, in hours. A query window is a time interval of your data, where time is taken from the change_timestamp column you supplied when first configuring the job. The beginning of the query window is always the largest change_timestamp we have encountered while querying your table. The end of the query window is that many hours after the beginning or, if the parameter is left at its default of 0, unbounded up to the current time.
This is useful if you need to limit the amount of data scanned per query. If your table is large, we recommend partitioning your data by the change_timestamp column; this parameter then gives you a way to limit the number of partitions scanned per query if cost is a concern.
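The window rules described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the actual ingestion logic: `query_window`, `max_seen_change_timestamp`, and `window_hours` are hypothetical names, and the sketch does not cap a bounded window at the current time because the text does not say whether that happens.

```python
from datetime import datetime, timedelta
from typing import Tuple


def query_window(
    max_seen_change_timestamp: datetime,
    window_hours: int,
    now: datetime,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the next query window.

    Hypothetical helper: the start is the largest change_timestamp seen
    so far; a window size of 0 means the window runs up to the current time.
    """
    start = max_seen_change_timestamp
    if window_hours == 0:
        end = now  # default: unbounded up to the current time
    else:
        end = start + timedelta(hours=window_hours)
    return start, end


latest_seen = datetime(2023, 1, 1, 0, 0)
current_time = datetime(2023, 1, 3, 0, 0)

# Default of 0: the window extends to the current time.
print(query_window(latest_seen, 0, current_time))
# A 6-hour window: only six hours of data (and their partitions) are scanned.
print(query_window(latest_seen, 6, current_time))
```

With a table partitioned by change_timestamp, a smaller window corresponds to fewer partitions touched per query, which is the cost lever the paragraph above describes.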
The third parameter controls the maximum number of rows to ingest per query. Note that if you specify a query window size that covers an interval containing fewer rows than the row limit, a query will return fewer rows than the limit.
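To make the interaction between the query window and the row limit concrete, the sketch below simulates one query over an in-memory list of change timestamps. Everything here is an assumption for illustration: `select_rows` is a hypothetical helper, and the choices to treat the window start as exclusive and to take the earliest rows first are not stated in the text.

```python
from datetime import datetime, timedelta
from typing import List


def select_rows(
    change_timestamps: List[datetime],
    start: datetime,
    end: datetime,
    row_limit: int,
) -> List[datetime]:
    """Return at most row_limit timestamps that fall inside the window.

    Assumptions (not from the source): the start is exclusive, the end is
    inclusive, and rows are taken in ascending timestamp order.
    """
    in_window = sorted(ts for ts in change_timestamps if start < ts <= end)
    return in_window[:row_limit]


base = datetime(2023, 1, 1, 0, 0)
# Five rows, one per hour, at 01:00 through 05:00.
rows = [base + timedelta(hours=h) for h in range(1, 6)]
window_end = base + timedelta(hours=3)

# The 3-hour window holds only 3 rows, so a limit of 10 still yields 3.
print(len(select_rows(rows, base, window_end, row_limit=10)))  # 3
# A limit of 2 truncates the same window to 2 rows.
print(len(select_rows(rows, base, window_end, row_limit=2)))  # 2
```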