Auto
- class Auto(target_metric: Metric, horizon: int, metric_aggregation: Literal['median', 'mean', 'std', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'] = 'mean', backtest_params: dict | None = None, experiment_folder: str | None = None, pool: Pool | List[BasePipeline] = Pool.default, runner: AbstractRunner | None = None, storage: BaseStorage | None = None, metrics: List[Metric] | None = None)
Bases: AutoBase
Automatic pipeline selection from a predefined or custom pipeline pool.
Initialize the Auto class.
- Parameters:
target_metric (Metric) – Metric to optimize.
horizon (int) – Horizon to forecast for.
metric_aggregation (Literal['median', 'mean', 'std', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics. By default, mean aggregation is used.
backtest_params (dict | None) – Custom parameters for backtest instead of default backtest parameters.
experiment_folder (str | None) – Name for saving experiment results; it determines the name of the Optuna study. By default, it isn't set.
pool (Pool | List[BasePipeline]) – Pool of pipelines to choose from. By default, the default pool from Pool (Pool.default) is used.
runner (AbstractRunner | None) – Runner to use for distributed training. By default, LocalRunner is used.
storage (BaseStorage | None) – Optuna storage to use. By default, SQLite storage is used.
metrics (List[Metric] | None) – List of metrics to compute. By default, Sign, SMAPE, MAE, MSE and MedAE metrics are used.
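The metric_aggregation options reduce the per-segment values of a metric to a single score. The following is a minimal stdlib sketch of what each option computes. It assumes numpy-style linear interpolation for the percentiles and a population standard deviation (ddof=0) for 'std'. aggregate_metric and _percentile are hypothetical helpers, not ETNA functions, and the segment names are made up.

```python
import math
import statistics
from typing import Dict, List


def _percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between closest ranks (numpy's default rule)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = math.floor(position)
    upper = math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def aggregate_metric(per_segment: Dict[str, float], method: str = "mean") -> float:
    """Collapse per-segment metric values into one number (hypothetical helper)."""
    values = list(per_segment.values())
    if method == "mean":
        return statistics.fmean(values)
    if method == "median":
        return statistics.median(values)
    if method == "std":
        # Assumption: population standard deviation (ddof=0), as numpy.std does by default.
        return statistics.pstdev(values)
    if method.startswith("percentile_"):
        # "percentile_25" -> 25.0
        return _percentile(values, float(method.split("_", 1)[1]))
    raise ValueError(f"Unknown aggregation: {method}")


per_segment_smape = {"segment_a": 10.0, "segment_b": 20.0, "segment_c": 30.0, "segment_d": 40.0}
for method in ["mean", "median", "std", "percentile_5", "percentile_25", "percentile_75", "percentile_95"]:
    print(method, round(aggregate_metric(per_segment_smape, method), 3))
```

Because 'std' measures spread rather than typical error, optimizing it favors pipelines that behave consistently across segments. It does not necessarily favor the most accurate pipeline.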
Methods
fit(ts[, timeout, n_trials, initializer, ...]) – Start automatic pipeline selection.
objective(ts, target_metric, ...[, ...]) – Optuna objective wrapper for the pool stage.
summary() – Get Auto trials summary.
top_k([k]) – Get top k pipelines with the best metric value.
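top_k ranks trials by the target metric. The following sketch shows that ranking over summary-like rows and makes several assumptions. Trial states are plain strings. The metric column is named "SMAPE_mean". Only completed trials are eligible. top_k_trials is a hypothetical helper, and the real method operates on ETNA's own trial summary.

```python
from typing import Dict, List


def top_k_trials(
    rows: List[Dict], metric_column: str, k: int = 5, greater_is_better: bool = False
) -> List[Dict]:
    """Return up to k completed trials, best first (hypothetical helper)."""
    completed = [row for row in rows if row["state"] == "COMPLETE"]
    # Error metrics such as SMAPE or MAE are "lower is better", so sort ascending by default.
    return sorted(completed, key=lambda row: row[metric_column], reverse=greater_is_better)[:k]


rows = [
    {"hash": "a1", "SMAPE_mean": 14.2, "state": "COMPLETE"},
    {"hash": "b2", "SMAPE_mean": 9.8, "state": "COMPLETE"},
    {"hash": "c3", "SMAPE_mean": 11.5, "state": "COMPLETE"},
    {"hash": "d4", "SMAPE_mean": 7.1, "state": "FAIL"},
]
best = top_k_trials(rows, "SMAPE_mean", k=2)
print([row["hash"] for row in best])
```

The failed trial is excluded even though its recorded value looks best. This is why the sketch filters on state before sorting.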
- fit(ts: TSDataset, timeout: int | None = None, n_trials: int | None = None, initializer: _Initializer | None = None, callback: _Callback | None = None, **kwargs) → BasePipeline
Start automatic pipeline selection.
There are two stages:
Pool stage: trying every pipeline in the pool.
Tuning stage: tuning the tune_size best pipelines from the previous stage using Tune.
The tuning stage starts only if the limits on n_trials and timeout aren't exceeded. Tuning goes from the best pipeline to the worst, and the trial limits (n_trials, timeout) are divided evenly between the pipelines. If there is no limit on the number of trials, only the first pipeline will be tuned, until the user stops the process.
- Parameters:
ts (TSDataset) – TSDataset to fit on.
timeout (int | None) – Timeout for Optuna. N.B. This is the timeout for each worker. By default, it isn't set.
n_trials (int | None) – Number of trials for Optuna. N.B. This is the number of trials for each worker. By default, it isn't set.
initializer (_Initializer | None) – Object that is called before each pipeline backtest; it can be used to initialize loggers.
callback (_Callback | None) – Object that is called after each pipeline backtest; it can be used to log extra metrics.
**kwargs – Parameter tune_size (default: 0) determines how many of the best pipelines are tuned during the tuning stage. Other parameters are passed into Optuna's optuna.study.Study.optimize().
- Return type:
BasePipeline
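The budget rule for the tuning stage can be made concrete. The following is a hedged sketch under explicit assumptions. The pool stage spends one trial per pool pipeline. Whatever remains of n_trials and timeout is then split evenly, with integer division for trials, between the tune_size best pipelines. split_tuning_budget is a hypothetical helper, not part of ETNA, and the real bookkeeping may differ.

```python
from typing import List, Optional, Tuple

Budget = Tuple[Optional[int], Optional[float]]


def split_tuning_budget(
    pool_size: int,
    tune_size: int,
    n_trials: Optional[int],
    timeout: Optional[float],
    elapsed: float,
) -> List[Budget]:
    """Per-pipeline (n_trials, timeout) limits for the tuning stage (hypothetical helper).

    Assumes one trial per pool pipeline and an even split of what is left.
    """
    if tune_size <= 0:
        return []  # tune_size defaults to 0, i.e. no tuning stage
    remaining_trials = None if n_trials is None else n_trials - pool_size
    remaining_time = None if timeout is None else timeout - elapsed
    # The tuning stage is skipped if the pool stage already used up a limit.
    if remaining_trials is not None and remaining_trials <= 0:
        return []
    if remaining_time is not None and remaining_time <= 0:
        return []
    per_trials = None if remaining_trials is None else remaining_trials // tune_size
    per_time = None if remaining_time is None else remaining_time / tune_size
    # With no limits at all every entry is (None, None): the first (best) pipeline
    # would then be tuned until the user stops the process.
    return [(per_trials, per_time)] * tune_size


print(split_tuning_budget(pool_size=5, tune_size=3, n_trials=35, timeout=None, elapsed=0.0))
```

The list is ordered from the best pipeline to the worst, matching the order in which tuning proceeds.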
- static objective(ts: TSDataset, target_metric: Metric, metric_aggregation: Literal['median', 'mean', 'std', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95'], metrics: List[Metric], backtest_params: dict, initializer: _Initializer | None = None, callback: _Callback | None = None) → Callable[[Trial], float]
Optuna objective wrapper for the pool stage.
- Parameters:
ts (TSDataset) – TSDataset to fit on.
target_metric (Metric) – Metric to optimize.
metric_aggregation (Literal['median', 'mean', 'std', 'percentile_5', 'percentile_25', 'percentile_75', 'percentile_95']) – Aggregation method for per-segment metrics.
metrics (List[Metric]) – List of metrics to compute.
backtest_params (dict) – Custom parameters for backtest instead of default backtest parameters.
initializer (_Initializer | None) – Object that is called before each pipeline backtest; it can be used to initialize loggers.
callback (_Callback | None) – Object that is called after each pipeline backtest; it can be used to log extra metrics.
- Returns:
objective – function that runs the specified trial and returns its evaluated score
- Return type:
Callable[[Trial], float]
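objective is a factory: it captures its arguments and returns a function of a single trial. The following sketch shows that closure pattern with the hooks in the documented order: initializer before the backtest, callback after it. Several names are hypothetical. StubTrial stands in for optuna.trial.Trial. Plain callables replace the _Initializer and _Callback objects. evaluate replaces the real backtest and metric aggregation. Reading the pipeline from trial.user_attrs is also an assumption.

```python
from typing import Callable, Dict, List, Optional


class StubTrial:
    """Hypothetical stand-in for optuna.trial.Trial with only the fields used here."""

    def __init__(self, number: int, pipeline_hash: str) -> None:
        self.number = number
        self.user_attrs: Dict[str, str] = {"hash": pipeline_hash}


def make_objective(
    evaluate: Callable[[str], float],
    initializer: Optional[Callable[[str], None]] = None,
    callback: Optional[Callable[[str, float], None]] = None,
) -> Callable[[StubTrial], float]:
    """Build a closure with the shape Callable[[Trial], float]."""

    def objective(trial: StubTrial) -> float:
        # Assumption: the trial carries an identifier of the pool pipeline to evaluate.
        pipeline_hash = trial.user_attrs["hash"]
        if initializer is not None:
            initializer(pipeline_hash)  # runs before the backtest, e.g. to set up loggers
        score = evaluate(pipeline_hash)  # stands in for backtest + metric aggregation
        if callback is not None:
            callback(pipeline_hash, score)  # runs after the backtest, e.g. to log metrics
        return score

    return objective


events: List[str] = []
objective = make_objective(
    evaluate=lambda h: {"naive": 12.0, "prophet": 8.5}[h],
    initializer=lambda h: events.append(f"init {h}"),
    callback=lambda h, s: events.append(f"done {h} {s}"),
)
print(objective(StubTrial(0, "prophet")), events)
```

A closure is used because Optuna calls the objective with only the trial. Everything else (data, metrics, hooks) has to be bound in advance.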
- summary() → DataFrame
Get Auto trials summary.
The returned dataframe has the following columns:
hash: hash of the pipeline;
pipeline: pipeline object;
metrics: columns with metrics’ values;
state: state of the trial;
study: name of the study in which trial was made.
- Returns:
study_dataframe – dataframe with detailed info on each performed trial
- Return type:
DataFrame
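Each summary row pairs a pipeline with a hash. The following stdlib sketch shows one way to build such rows. It assumes that pipeline configs are plain dicts, that the hash is the MD5 of sorted-key JSON, and that the column names shown are representative. ETNA's actual hashing and metric column names may differ, and config_hash and build_summary are hypothetical helpers.

```python
import hashlib
import json
from typing import Dict, List


def config_hash(config: Dict) -> str:
    """Deterministic hash of a pipeline config; sort_keys makes it key-order independent."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def build_summary(trials: List[Dict], study_name: str) -> List[Dict]:
    """Flatten trial records into rows with hash, pipeline, metric, state and study columns."""
    rows = []
    for trial in trials:
        row = {"hash": config_hash(trial["config"]), "pipeline": trial["config"]}
        row.update(trial["metrics"])  # one column per metric value
        row["state"] = trial["state"]
        row["study"] = study_name
        rows.append(row)
    return rows


trials = [
    {"config": {"model": "NaiveModel", "horizon": 7}, "metrics": {"SMAPE_mean": 12.0}, "state": "COMPLETE"},
    {"config": {"horizon": 7, "model": "NaiveModel"}, "metrics": {"SMAPE_mean": 12.0}, "state": "COMPLETE"},
]
summary = build_summary(trials, study_name="pool")
print(list(summary[0]))
```

Both trials describe the same pipeline with keys in a different order, so they get the same hash. A stable hash like this makes duplicates easy to spot. With pandas installed, pandas.DataFrame(summary) turns the rows into a dataframe.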