StackingEnsemble

class StackingEnsemble(pipelines: List[BasePipeline], final_model: RegressorMixin | None = None, n_folds: int = 3, features_to_use: None | Literal['all'] | List[str] = None, n_jobs: int = 1, joblib_params: Dict[str, Any] | None = None)

Bases: EnsembleMixin, SaveEnsembleMixin, BasePipeline

StackingEnsemble is a pipeline that forecasts the future by using a metamodel to combine the forecasts of the base models.

Examples

>>> from etna.datasets import generate_ar_df
>>> from etna.datasets import TSDataset
>>> from etna.ensembles import StackingEnsemble
>>> from etna.models import NaiveModel
>>> from etna.models import MovingAverageModel
>>> from etna.pipeline import Pipeline
>>> import pandas as pd
>>> pd.options.display.float_format = '{:,.2f}'.format
>>> df = generate_ar_df(periods=100, start_time="2021-06-01", ar_coef=[0.8], n_segments=3)
>>> df_ts_format = TSDataset.to_dataset(df)
>>> ts = TSDataset(df_ts_format, "D")
>>> ma_pipeline = Pipeline(model=MovingAverageModel(window=5), transforms=[], horizon=7)
>>> naive_pipeline = Pipeline(model=NaiveModel(lag=10), transforms=[], horizon=7)
>>> ensemble = StackingEnsemble(pipelines=[ma_pipeline, naive_pipeline])
>>> _ = ensemble.fit(ts=ts)
>>> forecast = ensemble.forecast()
>>> forecast[:,:,"target"]
segment    segment_0 segment_1 segment_2
feature       target    target    target
timestamp
2021-09-09      0.70      1.47      0.20
2021-09-10      0.62      1.53      0.26
2021-09-11      0.50      1.78      0.36
2021-09-12      0.37      1.88      0.21
2021-09-13      0.46      1.87      0.25
2021-09-14      0.44      1.49      0.21
2021-09-15      0.36      1.56      0.30

Init StackingEnsemble.

Parameters:
  • pipelines (List[BasePipeline]) – List of pipelines that should be used in ensemble.

  • final_model (RegressorMixin | None) – Regression model with fit/predict interface which will be used to combine the base estimators.

  • n_folds (int) – Number of folds to use in the backtest. The backtest is not used for model evaluation: its out-of-sample forecasts of the base pipelines serve as training data for final_model.

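  • features_to_use (None | Literal['all'] | List[str]) – Features, other than the forecasts of the base models, to pass to final_model. None uses no additional features, 'all' uses all available features, and a list selects features by name.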

  • n_jobs (int) – Number of jobs to run in parallel.

  • joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel.

Raises:

ValueError – If fewer than two pipelines are given or the pipelines have different horizons.
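The stacking idea described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not etna's implementation: the names naive_forecast, moving_average_forecast, validate_horizons, out_of_fold_features, fit_linear_meta_model and stacking_forecast are invented here, the base "pipelines" are plain functions over a list of floats, and the final model is assumed to be ordinary least squares fitted by solving the normal equations. It mirrors the documented behaviour: a backtest over n_folds produces out-of-sample base forecasts that become training features for the final model, and fewer than two pipelines or mismatched horizons raise ValueError.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical base "pipeline": maps (history, horizon) to point forecasts.
Forecaster = Callable[[Sequence[float], int], List[float]]


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def moving_average_forecast(history: Sequence[float], horizon: int, window: int = 3) -> List[float]:
    """Repeat the mean of the last `window` values (simplified moving average)."""
    tail = history[-window:]
    return [sum(tail) / len(tail)] * horizon


def validate_horizons(horizons: Sequence[int]) -> int:
    """Mirror the documented ValueError conditions; return the shared horizon."""
    if len(horizons) < 2:
        raise ValueError("At least two pipelines are required.")
    if len(set(horizons)) != 1:
        raise ValueError("All pipelines must have the same horizon.")
    return horizons[0]


def out_of_fold_features(series: Sequence[float], forecasters: Sequence[Forecaster],
                         horizon: int, n_folds: int) -> Tuple[List[List[float]], List[float]]:
    """Backtest every forecaster on the last n_folds windows of the series.

    Each row holds the base forecasts for one timestamp and the target is the
    actual value there, so the meta-model learns from out-of-sample forecasts.
    """
    rows: List[List[float]] = []
    targets: List[float] = []
    for fold in range(n_folds):
        test_start = len(series) - (n_folds - fold) * horizon
        if test_start < 1:
            raise ValueError("Series is too short for the requested number of folds.")
        history = series[:test_start]
        fold_preds = [f(history, horizon) for f in forecasters]
        for step in range(horizon):
            rows.append([preds[step] for preds in fold_preds])
            targets.append(series[test_start + step])
    return rows, targets


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Singular system: base forecasts are collinear.")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_meta_model(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Least squares with intercept; returns [intercept, w_1, ..., w_k]."""
    design = [[1.0, *row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def stacking_forecast(series: Sequence[float], forecasters: Sequence[Forecaster],
                      horizons: Sequence[int], n_folds: int = 3) -> List[float]:
    horizon = validate_horizons(horizons)
    rows, targets = out_of_fold_features(series, forecasters, horizon, n_folds)
    coefs = fit_linear_meta_model(rows, targets)
    # Base forecasters are re-applied to the full history for the final forecast.
    base = [f(series, horizon) for f in forecasters]
    return [coefs[0] + sum(w * preds[step] for w, preds in zip(coefs[1:], base))
            for step in range(horizon)]
```

In this sketch the least-squares model plays the role of final_model; features_to_use and parallelism (n_jobs) are not modelled.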

Methods

backtest(ts, metrics[, n_folds, mode, ...]) – Run backtest with the pipeline.

fit(ts) – Fit the ensemble.

forecast([ts, prediction_interval, ...]) – Make a forecast of the next points of a dataset.

load(path[, ts]) – Load an object.

params_to_tune() – Get hyperparameter grid to tune.

predict(ts[, start_timestamp, ...]) – Make in-sample predictions on dataset in a given range.

save(path) – Save the object.

set_params(**params) – Return new object instance with modified parameters.

to_dict() – Collect all information about etna object in dict.

Attributes

This class stores its __init__ parameters as attributes.

backtest(ts: TSDataset, metrics: List[Metric], n_folds: int | List[FoldMask] = 5, mode: str | None = None, aggregate_metrics: bool = False, n_jobs: int = 1, refit: bool | int = True, stride: int | None = None, joblib_params: Dict[str, Any] | None = None, forecast_params: Dict[str, Any] | None = None) → Tuple[DataFrame, DataFrame, DataFrame]

Run backtest with the pipeline.

If refit is not True and some component of the pipeline doesn’t support forecasting with a gap, that component will raise an exception.

Parameters:
  • ts (TSDataset) – Dataset to fit models in backtest

  • metrics (List[Metric]) – List of metrics to compute for each fold

  • n_folds (int | List[FoldMask]) – Number of folds or the list of fold masks

  • mode (str | None) – Train generation policy: ‘expand’ or ‘constant’. Works only if n_folds is an integer. Defaults to ‘expand’.

  • aggregate_metrics (bool) – If True, aggregate metrics over folds; otherwise return raw per-fold metrics.

  • n_jobs (int) – Number of jobs to run in parallel

  • refit (bool | int) –

    Determines how often pipeline should be retrained during iteration over folds.

    • If True: pipeline is retrained on each fold.

    • If False: pipeline is trained only on the first fold.

    • If an int value: pipeline is retrained every value folds, starting from the first.

  • stride (int | None) – Number of points between folds. Works only if n_folds is an integer. Defaults to horizon.

  • joblib_params (Dict[str, Any] | None) – Additional parameters for joblib.Parallel

  • forecast_params (Dict[str, Any] | None) – Additional parameters for forecast()

Returns:

metrics_df, forecast_df, fold_info_df – Metrics dataframe, forecast dataframe and dataframe with information about folds

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]

Raises:
  • ValueError – If mode is set when n_folds is a List[FoldMask].

  • ValueError – If stride is set when n_folds is a List[FoldMask].
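How mode, stride and refit shape the folds can be sketched as follows. This is a simplified model under stated assumptions, not etna's code: fold_windows, refit_folds and check_backtest_args are hypothetical helpers, folds are placed so the last test window ends at the last point with earlier windows stride points apart, ‘constant’ is assumed to keep the train length of the first fold, and any list stands in for List[FoldMask]. Exact boundary handling in etna may differ.

```python
from typing import List, Optional, Tuple, Union


def check_backtest_args(n_folds: Union[int, list], mode: Optional[str], stride: Optional[int]) -> None:
    """Mirror the documented ValueErrors: mode and stride only apply to an integer n_folds."""
    if isinstance(n_folds, list):  # stands in for List[FoldMask]
        if mode is not None:
            raise ValueError("mode is not used when n_folds is a list of fold masks.")
        if stride is not None:
            raise ValueError("stride is not used when n_folds is a list of fold masks.")


def fold_windows(n_points: int, horizon: int, n_folds: int,
                 mode: Optional[str] = None, stride: Optional[int] = None) -> List[Tuple[int, int, int]]:
    """Return (train_start, test_start, test_end) index triples, end-exclusive."""
    mode = "expand" if mode is None else mode
    stride = horizon if stride is None else stride
    if mode not in ("expand", "constant"):
        raise ValueError(f"Unknown mode: {mode}")
    # The last test window ends at the last point; earlier ones are `stride` apart.
    first_test_start = n_points - horizon - (n_folds - 1) * stride
    if first_test_start < 1:
        raise ValueError("Not enough points for the requested folds.")
    windows = []
    for fold in range(n_folds):
        test_start = first_test_start + fold * stride
        # 'expand' grows the train part; 'constant' slides it and keeps its length.
        train_start = 0 if mode == "expand" else test_start - first_test_start
        windows.append((train_start, test_start, test_start + horizon))
    return windows


def refit_folds(n_folds: int, refit: Union[bool, int] = True) -> List[int]:
    """Indices of the folds on which the pipeline is (re)trained."""
    if refit is True:
        step = 1
    elif refit is False:
        return [0] if n_folds > 0 else []
    elif isinstance(refit, int) and refit >= 1:
        step = refit
    else:
        raise ValueError("refit must be a bool or a positive int.")
    return [fold for fold in range(n_folds) if fold % step == 0]
```

For example, refit_folds(5, 2) gives [0, 2, 4]: the pipeline is trained on the first fold and retrained every second fold after it, while the remaining folds reuse the most recent fit.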

fit(ts: TSDataset) → StackingEnsemble

Fit the ensemble.

Parameters:

ts (TSDataset) – TSDataset to fit ensemble.

Returns:

Fitted ensemble.

Return type:

self

forecast(ts: TSDataset | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), n_folds: int = 3, return_components: bool = False) → TSDataset

Make a forecast of the next points of a dataset.

The result of forecasting starts from the last point of ts, not including it.

Parameters:
  • ts (TSDataset | None) – Dataset to forecast. If not given, the dataset passed to fit() is used.

  • prediction_interval (bool) – If True, returns prediction intervals for the forecast.

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval.

  • n_folds (int) – Number of folds to use in the backtest for prediction interval estimation.

  • return_components (bool) – If True, additionally returns forecast components.

Returns:

Dataset with predictions

Raises:

NotImplementedError – If return_components is True: adding target components is not currently implemented.

Return type:

TSDataset
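Two parts of this method lend themselves to a short sketch: where the forecast starts, and how quantiles turn into interval bounds. forecast_dates and interval_bounds below are hypothetical helpers. The interval part assumes, for illustration only, that bounds come from the spread of backtest residuals under a normal approximation; this is a named stand-in technique, not a statement of how etna computes intervals.

```python
from datetime import date, timedelta
from statistics import NormalDist, stdev
from typing import Dict, List, Sequence


def forecast_dates(last_observed: date, horizon: int) -> List[date]:
    """Daily forecast timestamps: they start after the last point, not including it."""
    return [last_observed + timedelta(days=step) for step in range(1, horizon + 1)]


def interval_bounds(point_forecast: Sequence[float], residuals: Sequence[float],
                    quantiles: Sequence[float] = (0.025, 0.975)) -> Dict[float, List[float]]:
    """Shift each point forecast by a normal quantile scaled by the residual spread."""
    sigma = stdev(residuals)  # sample standard deviation of (hypothetical) backtest errors
    normal = NormalDist()     # standard normal distribution
    return {q: [p + normal.inv_cdf(q) * sigma for p in point_forecast] for q in quantiles}


# 100 daily points starting 2021-06-01 end on 2021-09-08, so a 7-step forecast
# covers 2021-09-09 .. 2021-09-15, matching the class example.
last = date(2021, 6, 1) + timedelta(days=99)
dates = forecast_dates(last, 7)
```

With the default quantiles the interval covers 0.975 - 0.025 = 0.95 of the assumed distribution, which is where the "95% prediction interval" comes from.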

classmethod load(path: Path, ts: TSDataset | None = None) → Self

Load an object.

Parameters:
  • path (Path) – Path to load object from.

  • ts (TSDataset | None) – TSDataset to set into loaded pipeline.

Returns:

Loaded object.

Return type:

Self

params_to_tune() → Dict[str, BaseDistribution]

Get hyperparameter grid to tune.

Not implemented for this class.

Returns:

Grid with hyperparameters.

Return type:

Dict[str, BaseDistribution]

predict(ts: TSDataset, start_timestamp: Timestamp | None = None, end_timestamp: Timestamp | None = None, prediction_interval: bool = False, quantiles: Sequence[float] = (0.025, 0.975), return_components: bool = False) → TSDataset

Make in-sample predictions on dataset in a given range.

Currently, when segments start at different timestamps, correct behavior is guaranteed only for start_timestamp >= the beginning of all segments.

Parameters:
  • ts (TSDataset) – Dataset to make predictions on.

  • start_timestamp (Timestamp | None) – First timestamp of the prediction range to return. It should be >= the first timestamp in ts, and each segment is expected to begin at or before start_timestamp. If not set, the first timestamp at which every segment has started is taken.

  • end_timestamp (Timestamp | None) – Last timestamp of the prediction range to return. It is expected to be less than or equal to the last timestamp in ts. If not set, the last timestamp of ts is taken.

  • prediction_interval (bool) – If True, returns prediction intervals for the forecast.

  • quantiles (Sequence[float]) – Levels of the prediction distribution. By default, the 2.5% and 97.5% quantiles are taken to form a 95% prediction interval.

  • return_components (bool) – If True, additionally returns forecast components.

Returns:

Dataset with predictions in [start_timestamp, end_timestamp] range.

Raises:
  • ValueError – If end_timestamp is less than start_timestamp.

  • ValueError – If start_timestamp goes before the point where each segment started.

  • ValueError – If end_timestamp goes after the last timestamp.

  • NotImplementedError – If return_components is True: adding target components is not currently implemented.

Return type:

TSDataset
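The default range and the three documented ValueError conditions can be sketched as a small validation routine. resolve_predict_range is a hypothetical helper, and plain datetime.date values stand in for pandas Timestamps; segment_starts maps each segment name to its first timestamp.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def resolve_predict_range(segment_starts: Dict[str, date], last_timestamp: date,
                          start_timestamp: Optional[date] = None,
                          end_timestamp: Optional[date] = None) -> Tuple[date, date]:
    # First point at which every segment has data.
    common_start = max(segment_starts.values())
    start = common_start if start_timestamp is None else start_timestamp
    end = last_timestamp if end_timestamp is None else end_timestamp
    if end < start:
        raise ValueError("end_timestamp is less than start_timestamp.")
    if start < common_start:
        raise ValueError("start_timestamp goes before the point where each segment started.")
    if end > last_timestamp:
        raise ValueError("end_timestamp goes after the last timestamp.")
    return start, end
```

With segments starting on 2021-01-01 and 2021-01-05, the default start is 2021-01-05, because that is the first date covered by both segments.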

save(path: Path)

Save the object.

Parameters:

path (Path) – Path to save object to.

set_params(**params: dict) → Self

Return new object instance with modified parameters.

The method also allows changing parameters of nested objects within the current object. For example, it is possible to change parameters of a model in a Pipeline.

Nested parameters are expected to be in a <component_1>.<...>.<parameter> form, where components are separated by a dot.

Parameters:

**params (dict) – Estimator parameters

Returns:

New instance with changed parameters

Return type:

Self

Examples

>>> from etna.pipeline import Pipeline
>>> from etna.models import NaiveModel
>>> from etna.transforms import AddConstTransform
>>> model = NaiveModel(lag=1)
>>> transforms = [AddConstTransform(in_column="target", value=1)]
>>> pipeline = Pipeline(model, transforms=transforms, horizon=3)
>>> pipeline.set_params(**{"model.lag": 3, "transforms.0.value": 2})
Pipeline(model = NaiveModel(lag = 3, ), transforms = [AddConstTransform(in_column = 'target', value = 2, inplace = True, out_column = None, )], horizon = 3, )
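The dotted-key convention can be illustrated on a plain nested dictionary. with_nested_params is a hypothetical helper that works on dicts and lists rather than etna objects: each dot-separated component selects a dictionary key, numeric components index into lists, and the input is deep-copied so that, as with set_params, a new object is returned and the original is left unchanged.

```python
import copy
from typing import Any, Dict


def with_nested_params(config: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `config` with dotted-path parameters replaced."""
    new_config = copy.deepcopy(config)
    for dotted_key, value in params.items():
        *path, last = dotted_key.split(".")
        node: Any = new_config
        for component in path:
            # Numeric components index lists (e.g. "transforms.0"), others are dict keys.
            node = node[int(component)] if isinstance(node, list) else node[component]
        if isinstance(node, list):
            node[int(last)] = value
        else:
            if last not in node:
                raise KeyError(f"Unknown parameter: {dotted_key}")
            node[last] = value
    return new_config


config = {"model": {"lag": 1}, "transforms": [{"in_column": "target", "value": 1}], "horizon": 3}
updated = with_nested_params(config, {"model.lag": 3, "transforms.0.value": 2})
```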

to_dict()

Collect all information about etna object in dict.