Library Reference¶

forecast¶

Functions to run forecast

anticipy.forecast.get_residuals(params, model, a_x, a_y, a_date, a_weights=None, df_actuals=None, **kwargs)¶

Given a time series, a model function and a set of parameters, get the residuals

Parameters

params (numpy array of floats) – parameters for model function
model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series
a_weights (numpy array of floats) – weights for each individual sample
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models

Returns

array with residuals, same length as a_x, a_y

Return type

numpy array of floats
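
As a rough illustration of the computation described above, the sketch below evaluates a model function and subtracts its output from the input series. It is not anticipy's implementation: model_linear_sketch and residuals_sketch are hypothetical names, plain Python lists stand in for numpy arrays, and multiplying each residual by its weight is an assumption about how a_weights is applied.

```python
def model_linear_sketch(a_x, a_date, params):
    """Hypothetical model function with usage model(a_x, a_date, params): y = A*x + B."""
    a, b = params
    return [a * x + b for x in a_x]


def residuals_sketch(params, model, a_x, a_y, a_date, a_weights=None):
    """Return a_y - model(a_x, a_date, params), one value per sample."""
    y_predicted = model(a_x, a_date, params)
    residuals = [y - y_hat for y, y_hat in zip(a_y, y_predicted)]
    if a_weights is not None:
        # Assumption: each weight scales the residual of its sample
        residuals = [r * w for r, w in zip(residuals, a_weights)]
    return residuals


a_x = [0.0, 1.0, 2.0, 3.0]
a_y = [1.0, 3.5, 5.0, 6.5]
res = residuals_sketch([2.0, 1.0], model_linear_sketch, a_x, a_y, a_date=None)
res_weighted = residuals_sketch(
    [2.0, 1.0], model_linear_sketch, a_x, a_y, a_date=None,
    a_weights=[1.0, 0.0, 1.0, 2.0])
```

The result has the same length as a_x and a_y, as stated in the Returns section.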

anticipy.forecast.optimize_least_squares(model, a_x, a_y, a_date, a_weights=None, df_actuals=None, use_cache=True)¶

Given a time series and a model function, find the set of parameters that minimises residuals

Parameters

model (function) – model function, to be fitted against the actuals
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series
a_weights (numpy array of floats) – weights for each individual sample
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
use_cache (bool) – If true, save some model variables to cache when fitting

Returns

table(success, params, cost, optimality,
iterations, status, jac_evals, message):

- success (bool): True if successful fit
- params (list): Parameters of fitted model
- cost (float): Value of cost function
- optimality(float)
- iterations (int) : Number of function evaluations
- status (int) : Status code
- jac_evals(int) : Number of Jacobian evaluations
- message (str) : Output message

Return type

pandas.DataFrame

anticipy.forecast.normalize_df(df_y, col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source')¶

Converts an input dataframe for run_forecast() into a normalized format suitable for fit_model()

Parameters

df_y (pandas.DataFrame) – unformatted input dataframe, for use by run_forecast()
col_name_y (basestring) – name for column with time series values
col_name_weight (basestring) – name for column with time series weights
col_name_x (basestring) – name for column with time series indices
col_name_date (basestring) – name for column with time series dates
col_name_source (basestring) – name for column with time series source identifiers

Returns

formatted input dataframe, for use by run_forecast()

Return type

pandas.DataFrame

anticipy.forecast.fit_model(model, df_y, freq='W', source='test', df_actuals=None, use_cache=True)¶

Given a time series and a model, optimize model parameters and return a table with the fit results

Parameters

model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
df_y (pandas.DataFrame) –

Dataframe with the following columns:

- y:

- date: (optional)

- weight: (optional)

- x: (optional)
source (basestring) – source identifier for this time series
freq (basestring) – ‘W’ or ‘D’ . Used only for metadata
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
use_cache (bool) – If true, save some model variables to cache when fitting

Returns

table (source, model_name, y_weights , freq, is_fit, aic_c, params)

Return type

pandas.DataFrame

This function calls optimize_least_squares() to perform the optimization loop. It performs some cleaning up of input and output parameters.

anticipy.forecast.extrapolate_model(model, params, date_start_actuals, date_end_actuals, freq='W', extrapolate_years=2.0, x_start_actuals=0.0, df_actuals=None)¶

Given a model and a set of parameters, generate model output for a date range plus a number of additional years.

Parameters

model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
params (numpy array of floats) – parameters for model function
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample
date_end_actuals (str, datetime, int or float) – date or numeric index for last actuals sample
freq (basestring) – Time unit between samples. Supported units are ‘W’ for weekly samples, or ‘D’ for daily samples. (untested) Any date unit or time unit accepted by numpy should also work, see https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.datetime.html#arrays-dtypes-dateunits
extrapolate_years (float) – Number of years (or fraction of year) covered by the generated time series, after the end of the actuals
x_start_actuals (float) – numeric index for the first actuals sample
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models

Returns

dataframe with a time series extrapolated from the model function

Return type

pandas.DataFrame, with an ‘y’ column of floats

anticipy.forecast.get_list_model(l_model_trend, l_model_season, season_add_mult='both')¶

Generate a list of composite models from lists of trend and seasonality models

Parameters

l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
season_add_mult (basestring) – ‘mult’, ‘add’ or ‘both’, for multiplicative/additive composition (or both types)

Returns

Return type

list of ForecastModel

All combinations of possible composite models are included
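
The combination logic can be pictured with the sketch below, which enumerates every trend/seasonality pair for the requested composition types. Model names are plain strings here purely for illustration; the real function works on ForecastModel instances, and list_composite_names is a hypothetical helper, not part of anticipy.

```python
from itertools import product


def list_composite_names(l_trend, l_season, season_add_mult='both'):
    """Enumerate composite model names for every trend/seasonality pair."""
    # Map the season_add_mult option to the composition operators to apply
    ops = {'add': ['+'], 'mult': ['*'], 'both': ['+', '*']}[season_add_mult]
    return ['({} {} {})'.format(trend, op, season)
            for trend, season in product(l_trend, l_season)
            for op in ops]


names_both = list_composite_names(['linear', 'constant'], ['wday', 'month'])
names_add = list_composite_names(['linear', 'constant'], ['wday', 'month'],
                                 season_add_mult='add')
```

With two trend models and two seasonality models, 'both' yields eight composites and 'add' yields four.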

anticipy.forecast.get_df_actuals_clean(df_actuals, source, source_long)¶

Convert an actuals dataframe to a clean format

Parameters

df_actuals (pandas.DataFrame) – dataframe in normalized format, with columns y and optionally x, date, weight
source (basestring) – source identifier for this time series
source_long (basestring) – long-format source identifier for this time series

Returns

clean actuals dataframe

Return type

pandas.DataFrame

anticipy.forecast.run_forecast(df_y, l_model_trend=None, l_model_season=None, date_start_actuals=None, source_id='src', col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source', extrapolate_years=0, season_add_mult='add', include_all_fits=False, simplify_output=True, find_outliers=False, l_season_yearly=None, l_season_weekly=None, verbose=None, l_model_naive=None, l_model_calendar=None, n_cum=None, pi_q1=5, pi_q2=20, pi_widening_freq='Y', use_cache=True)¶

Generate forecast for one or more input time series

Parameters

df_y (pandas.DataFrame) –

input dataframe with the following columns:

- Mandatory: a value column, with the time series values

- Optional: weight column, source ID column, index column, date column
l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample to be used for forecast. Previous samples are ignored
source_id (basestring) – source identifier for time series, if source column is missing
col_name_y (basestring) – name for column with time series values
col_name_weight (basestring) – name for column with time series weights
col_name_x (basestring) – name for column with time series indices
col_name_date (basestring) – name for column with time series dates
col_name_source (basestring) – name for column with time series source identifiers
extrapolate_years (float) – Number of years (or fraction of year) covered by the forecast, after the end of the actuals
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
find_outliers (bool) – If True, find outliers in input data, ignore outlier samples in forecast
include_all_fits (bool) – If True, also include non-optimal models in output
simplify_output (bool) – If False, return dict with forecast and metadata. Otherwise, return only forecast.
l_season_yearly (list of ForecastModel) – yearly seasonality models to consider in automatic seasonality detection
l_season_weekly (list of ForecastModel) – weekly seasonality models to consider in automatic seasonality detection
verbose (bool) – If True, enable verbose logging
l_model_naive (list of ForecastModel) – list of naive models to consider for forecast. Naive models are not fitted with regression, they are based on the last actuals samples
l_model_calendar (list of ForecastModel) – list of calendar models to consider for forecast, to handle holidays and calendar-based events
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use pi_widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
pi_widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).
use_cache (bool) – If true, save some model variables to cache when fitting

Returns

With simplify_output=False, returns a dictionary with 4 dataframes:
- forecast: output time series with prediction interval
- data: output time series. If include_all_fits, includes all fitting
models
- metadata: forecast metadata table
- optimize_info: debugging metadata from scipy.optimize

With simplify_output=True, returns the ‘forecast’ dataframe,
as described above

Return type

pandas.DataFrame or dict of pandas.DataFrames

anticipy.forecast.aggregate_forecast_dict_results(l_dict_result)¶

Aggregates a list of dictionaries with forecast outputs into a single dictionary

Parameters: l_dict_result (list of dictionaries) – list of output dictionaries from run_forecast_single
Returns: aggregated dictionary
Return type: dict

anticipy.forecast.run_forecast_single(df_y, l_model_trend=None, l_model_season=None, date_start_actuals=None, source_id='src', extrapolate_years=0, season_add_mult='add', include_all_fits=False, simplify_output=True, find_outliers=False, l_season_yearly=None, l_season_weekly=None, l_model_naive=None, l_model_calendar=None, n_cum=1, pi_q1=5, pi_q2=20, pi_widening_freq=None, use_cache=True)¶

Generate forecast for one input time series

Parameters

df_y (pandas.DataFrame) –

input dataframe with the following columns:

- y: time series values

- x: time series indices

- weight: time series weights (optional)

- date: time series dates (optional)
l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample to be used for forecast. Previous samples are ignored
source_id (basestring) – source identifier for time series
extrapolate_years (float) – Number of years (or fraction of year) covered by the forecast, after the end of the actuals
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
include_all_fits (bool) – If True, also include non-optimal models in output
simplify_output (bool) – If False, return dict with forecast and metadata. Otherwise, return only forecast.
find_outliers (bool) – If True, find outliers in input data, ignore outlier samples in forecast
l_season_yearly (list of ForecastModel) – yearly seasonality models to consider in automatic seasonality detection
l_season_weekly (list of ForecastModel) – weekly seasonality models to consider in automatic seasonality detection
l_model_naive (list of ForecastModel) – list of naive models to consider for forecast. Naive models are not fitted with regression, they are based on the last actuals samples
l_model_calendar (list of ForecastModel) – list of calendar models to consider for forecast, to handle holidays and calendar-based events
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use pi_widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
pi_widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).
use_cache (bool) – If true, save some model variables to cache when fitting

Returns

With simplify_output=False, returns a dictionary with 4 dataframes:
- forecast: output time series with prediction interval
- data: output time series. If include_all_fits, includes all fitting
models
- metadata: forecast metadata table
- optimize_info: debugging metadata from scipy.optimize

With simplify_output=True, returns the ‘forecast’ dataframe,
as described above

Return type

pandas.DataFrame or dict of pandas.DataFrames

anticipy.forecast.run_l_forecast(l_fcast_input, col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source', extrapolate_years=0, season_add_mult='add', include_all_fits=False, find_outliers=False, use_cache=True)¶

Generate forecasts for a list of ForecastInput objects, each including a time series, model functions, and other configuration parameters.

Parameters

l_fcast_input (list of ForecastInput) – List of forecast input configurations. Each element includes a time series, candidate forecast models for trend and seasonality, and other configuration parameters. For each input configuration, a forecast time series will be generated.
return_all_models (bool) – If True, result includes non-fitting models, with null AIC and an empty forecast df. Otherwise, result includes only fitting models, and for time series where no fitting model is available, a ‘no-best-model’ entry with null AIC and an empty forecast df is added.
return_all_fits (bool) – If True, result includes all models for each input time series. Otherwise, only the best model is included.
extrapolate_years (float) – Number of years (or fraction of year) covered by the forecast, after the end of the actuals
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
fill_gaps_y_values (bool) – If True, gaps in time series will be filled with NaN values
freq (str) – ‘W’ or ‘D’ . Sampling frequency of the output forecast: weekly or daily.
use_cache (bool) – If true, save some model variables to cache when fitting

Returns

dict(data,metadata)
data: dataframe(date, source, model, y)
metadata: dataframe(‘source’, ‘model’, ‘res_weights’, ‘freq’,
‘is_fit’, ‘cost’, ‘aic_c’, ‘params’, ‘status’)

Return type

dict

class anticipy.forecast.ForecastInput(source_id, df_y, l_model_trend=None, l_model_season=None, weights_y_values=1.0, date_start_actuals=None)¶: Class that encapsulates input variables for forecast.run_forecast()

anticipy.forecast.get_pi(df_forecast, n_sims=100, n_cum=None, pi_q1=5, pi_q2=20, widening_freq='Y')¶

Generate prediction intervals for a table with multiple forecasts, using bootstrapped residuals.

Parameters

df_forecast (pandas.DataFrame) – forecasted time series
n_sims (int) – Number of bootstrapped samples for prediction interval
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).

Returns

Forecast time series table with added columns:
- q5: 5% percentile of prediction interval
- q20: 20% percentile of prediction interval
- q80: 80% percentile of prediction interval
- q95: 95% percentile of prediction interval

Return type

pandas.DataFrame

Based on https://otexts.org/fpp2/prediction-intervals.html
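
The general idea of a bootstrapped prediction interval can be sketched as follows: resample past residuals, add them to the point forecast, and read percentiles off the simulated values. This is a simplified illustration under stated assumptions, not anticipy's algorithm: residuals are sampled independently for each forecast step, interval widening is ignored, plain lists replace dataframes, and percentile and bootstrap_pi_sketch are hypothetical helpers.

```python
import random


def percentile(values, q):
    """Simple index-based percentile of a list, with q in [0, 100]."""
    ordered = sorted(values)
    idx = round(q / 100 * (len(ordered) - 1))
    return ordered[idx]


def bootstrap_pi_sketch(forecast, residuals, n_sims=100, pi_q1=5, pi_q2=20,
                        seed=0):
    """Return one dict per forecast sample with the point forecast and 4 percentiles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for y_hat in forecast:
        # Simulate n_sims outcomes by adding resampled residuals to the forecast
        sims = [y_hat + rng.choice(residuals) for _ in range(n_sims)]
        row = {'y': y_hat}
        for q in (pi_q1, pi_q2, 100 - pi_q2, 100 - pi_q1):
            row['q{}'.format(q)] = percentile(sims, q)
        rows.append(row)
    return rows


rows = bootstrap_pi_sketch(forecast=[10.0, 11.0, 12.0],
                           residuals=[-1.5, -0.5, 0.0, 0.4, 1.2])
```

With the default pi_q1=5 and pi_q2=20, each row carries the outer (5%-95%) and inner (20%-80%) interval limits.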

forecast_models¶

Defines the ForecastModel class, which encapsulates model functions used in forecast model fitting, as well as their number of parameters and initialisation parameters.

class anticipy.forecast_models.ForecastModel(name, n_params, f_model, f_init_params=None, f_bounds=None, l_f_validate_input=None, l_cache_vars=None, dict_f_cache=None)¶

Class that encapsulates model functions for use in forecasting, as well as their number of parameters and functions for parameter initialisation.

A ForecastModel instance is initialized with a model name, a number of model parameters, and a model function. Class instances are callable - when called as a function, their internal model function is used. The main purpose of ForecastModel objects is to generate predicted values for a time series, given a set of parameters. These values can be compared to the original series to get an array of residuals:

y_predicted = model(a_x, a_date, params)
residuals = (a_y - y_predicted)

This is used in an optimization loop to obtain the optimal parameters for the model.

The reason for using this class instead of raw model functions is that ForecastModel supports function composition:

model_sum = fcast_model1 + fcast_model2
# fcast_model 1 and 2 are ForecastModel instances, and so is model_sum
a_y1 = fcast_model1(
    a_x, a_date, params1) + fcast_model2(a_x, a_date, params2)
params = np.concatenate([params1, params2])
a_y2 = model_sum(a_x, a_date, params)
np.allclose(a_y1, a_y2)  # True

Forecast models can be added or multiplied, with the + and * operators. Multiple levels of composition are supported:

model = (model1 + model2) * model3

Model composition is used to aggregate trend and seasonality model components, among other uses.
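
To make the composition mechanics concrete, here is a minimal sketch of a callable class that supports + and * by splitting a concatenated parameter array between its two components. It is a hypothetical stand-in (ComposableModel), not the ForecastModel source; it uses plain lists and leaves out the is_mult argument, initialisation, bounds and caching.

```python
class ComposableModel:
    """Hypothetical stand-in for a composable model with a fixed parameter count."""

    def __init__(self, name, n_params, f_model):
        self.name = name
        self.n_params = n_params
        self.f_model = f_model

    def __call__(self, a_x, a_date, params):
        return self.f_model(a_x, a_date, params)

    def _compose(self, other, op, symbol):
        n1 = self.n_params

        def f_composite(a_x, a_date, params):
            # First n1 parameters go to the left model, the rest to the right
            y1 = self(a_x, a_date, params[:n1])
            y2 = other(a_x, a_date, params[n1:])
            return [op(v1, v2) for v1, v2 in zip(y1, y2)]

        name = '({}{}{})'.format(self.name, symbol, other.name)
        return ComposableModel(name, n1 + other.n_params, f_composite)

    def __add__(self, other):
        return self._compose(other, lambda a, b: a + b, '+')

    def __mul__(self, other):
        return self._compose(other, lambda a, b: a * b, '*')


linear = ComposableModel(
    'linear', 2, lambda a_x, a_date, p: [p[0] * x + p[1] for x in a_x])
constant = ComposableModel(
    'constant', 1, lambda a_x, a_date, p: [p[0] for _ in a_x])

model = (linear + constant) * constant
y = model([0.0, 1.0], None, [2.0, 1.0, 3.0, 0.5])
```

Here the four parameters are consumed left to right: two by the linear component, one by the added constant, and one by the multiplying constant.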

Model functions have the following signature:

f(a_x, a_date, params, is_mult)
a_x : array of floats
a_date: array of dates, same length as a_x. Only required for date-aware models, e.g. for weekly seasonality.
params: array of floats - model parameters - the optimisation loop updates this to fit our actual values. Each model function uses a fixed number of parameters.
is_mult: boolean. True if the model is being used with multiplicative composition. Required because some model functions (e.g. steps) have different behaviour when added to other models than when multiplying them.
returns an array of floats - with same length as a_x - output of the model defined by this object’s modelling function f_model and the current set of parameters

By default, model parameters are initialized as random values between 0 and 1. It is possible to define a parameter initialization function that picks initial values based on the original time series. This is passed during ForecastModel creation with the argument f_init_params. Parameter initialization is compatible with model composition: the initialization function of each component will be used for that component’s parameters.

Parameter initialisation functions have the following signature:

f_init_params(a_x, a_y, is_mult)
a_x: array of floats - same length as time series
a_y: array of floats - time series values
returns an array of floats - with length equal to this object’s n_params value
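
For instance, an initialisation function for a two-parameter linear model could estimate slope and intercept from the end points of the series. The sketch below follows the signature above; f_init_params_linear_sketch is a hypothetical name, the end-point heuristic is an assumption chosen for brevity, and a list is returned where the library would use a numpy array.

```python
def f_init_params_linear_sketch(a_x, a_y, is_mult=False):
    """Hypothetical initialiser for y = A*x + B, using the first and last samples."""
    if len(a_x) < 2 or a_x[-1] == a_x[0]:
        # Not enough information for a slope: start from a flat line
        return [0.0, a_y[0]]
    slope = (a_y[-1] - a_y[0]) / (a_x[-1] - a_x[0])
    intercept = a_y[0] - slope * a_x[0]
    return [slope, intercept]


init = f_init_params_linear_sketch([0.0, 1.0, 2.0], [1.0, 2.9, 5.0])
init_flat = f_init_params_linear_sketch([0.0], [4.0])
```

The returned list has one entry per model parameter, matching n_params=2 for a linear model.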

By default, model parameters have no boundaries. However, it is possible to define a boundary function for a model, that sets boundaries for each model parameter, based on the input time series. This is passed during ForecastModel creation with the argument f_bounds. Boundary definition is compatible with model composition: the boundary function of each component will be used for that component’s parameters.

Boundary functions have the following signature:

f_bounds(a_x, a_y, a_date)
a_x: array of floats - same length as time series
a_y: array of floats - time series values
a_date: array of dates, same length as a_x. Only required for date-aware models, e.g. for weekly seasonality.
returns a tuple of 2 arrays of floats. The first defines minimum parameter boundaries, and the second the maximum parameter boundaries.
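
As an example, a boundary function that keeps the slope of a linear model non-negative might look like the sketch below. The name f_bounds_nondec_sketch is hypothetical, and using infinite values to mark an unbounded parameter is an assumption borrowed from common optimizer conventions rather than something this page specifies.

```python
import math


def f_bounds_nondec_sketch(a_x, a_y, a_date=None):
    """Hypothetical bounds for y = A*x + B that force the slope A to be >= 0."""
    min_bounds = [0.0, -math.inf]      # A >= 0, B unbounded below
    max_bounds = [math.inf, math.inf]  # no upper limit on A or B
    return min_bounds, max_bounds


lower, upper = f_bounds_nondec_sketch([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Both returned sequences have one entry per model parameter, in the same order as params.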

As an option, we can assign a list of input validation functions to a model. These functions analyse the inputs that will be used for fitting a model, returning True if valid, and False otherwise. The forecast logic will skip a model from fitting if any of the validation functions for that model returns False.

Input validation functions have the following signature:

f_validate_input(a_x, a_y, a_date)
See the description of model functions above for more details on these parameters.
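
A validation function can be as simple as a minimum-sample check. The sketch below shows one such function and how a list of them could decide whether a model is fitted; validate_min_samples_sketch and should_fit are hypothetical names, and the extra n_min argument is an assumption added for illustration.

```python
def validate_min_samples_sketch(a_x, a_y, a_date, n_min=3):
    """Return True if the series has at least n_min non-null values."""
    # y == y is False for NaN, so NaN and None both count as null
    n_valid = sum(1 for y in a_y if y is not None and y == y)
    return n_valid >= n_min


def should_fit(l_f_validate_input, a_x, a_y, a_date):
    """A model is skipped if any of its validation functions returns False."""
    return all(f(a_x, a_y, a_date) for f in l_f_validate_input)


ok = should_fit([validate_min_samples_sketch],
                [0, 1, 2], [1.0, 2.0, 3.0], None)
too_short = should_fit([validate_min_samples_sketch],
                       [0, 1, 2], [1.0, float('nan'), 2.0], None)
```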

Our input time series should meet the following constraints:

Minimum required samples depends on number of model parameters
May include null values
May include multiple values per sample
A date array is only required if the model is date-aware

Class Usage:

model_x = ForecastModel(name, n_params, f_model, f_init_params,
                        l_f_validate_input=l_f_validate_input)
# Get model name
model_name = model_x.name
# Get number of model parameters
n_params = model_x.n_params
# Get parameter initialisation function
f_init_params = model_x.f_init_params
# Get initial parameters
init_params = f_init_params(t_values, y_values)
# Get model fitting function
f_model = model_x.f_model
# Get model output
y = f_model(a_x, a_date, parameters)

The following pre-generated models are available as attributes of this module:

Forecast models¶
name	params	formula	notes
model_null	0	y=0	Does nothing. Used to disable components (e.g. seasonality)
model_constant	1	y=A	Constant model
model_linear	2	y=Ax + B	Linear model
model_linear_nondec	2	y=Ax + B	Non decreasing linear model. With boundaries to ensure model slope >=0
model_quasilinear	3	y=A*(x^B) + C	Quasilinear model
model_exp	2	y=A * B^x	Exponential model
model_decay	4	Y = A * e^(B*(x-C)) + D	Exponential decay model
model_step	2	y=0 if x<A, y=B if x>=A	Step model
model_two_steps	4	see model_step	2 step models. Parameter initialization is aware of # of steps.
model_sigmoid_step	3	y = A + (B - A) / (1 + np.exp(- D * (x - C)))	Sigmoid step model
model_sigmoid	3	y = A + (B - A) / (1 + np.exp(- D * (x - C)))	Sigmoid model
model_season_wday	7	see desc.	Weekday seasonality model. Assigns a constant value to each weekday
model_season_wday	6	see desc.	6-param weekday seasonality model. As above, with one constant set to 0.
model_season_wday_2	2	see desc.	Weekend seasonality model. Assigns a constant to each of weekday/weekend
model_season_month	12	see desc.	Month seasonality model. Assigns a constant value to each month
model_season_fourier_yearly	10	see desc	Fourier yearly seasonality model

anticipy.forecast_models.get_model_outliers(df, window=3)¶

Identify outlier samples in a time series

Parameters

df (pandas.DataFrame) – Input time series
window (int) – The x-axis window to aggregate multiple steps/spikes

Returns

tuple (mask_step, mask_spike)
mask_step: True if sample contains a step
mask_spike: True if sample contains a spike

Return type

tuple of 2 numpy arrays of booleans

TODO: require minimum number of samples to find an outlier

anticipy.forecast_models.get_model_dummy(name, dummy, **kwargs)¶

Generate a model based on a dummy variable.

Parameters

name (basestring) – Name of the model
dummy (function, or list-like of numerics or datetime-likes) –

Can be a function or a list-like. If a function, it must be of the form f_dummy(a_x, a_date), and return a numpy array of floats with the same length as a_x and values that are either 0 or 1. If a list-like of numerics, it will be converted to a f_dummy function as described above, which will have values of 1 when a_x has one of the values in the list, and 0 otherwise. If a list-like of date-likes, it will be converted to a f_dummy function as described above, which will have values of 1 when a_date has one of the values in the list, and 0 otherwise.
kwargs –

Returns

A model that returns A when dummy is 1, and 0 (or 1 if is_mult==True) otherwise.

Return type

ForecastModel
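
The behaviour described above can be sketched in two steps: turn a list of x values into an f_dummy(a_x, a_date) function, then wrap that function in a one-parameter model. Both helpers below (f_dummy_from_values, dummy_model_sketch) are hypothetical and use plain lists; they illustrate the described semantics and are not the library code.

```python
def f_dummy_from_values(l_values):
    """Build f_dummy(a_x, a_date) returning 1.0 where x is in l_values, else 0.0."""
    s_values = set(l_values)

    def f_dummy(a_x, a_date):
        return [1.0 if x in s_values else 0.0 for x in a_x]

    return f_dummy


def dummy_model_sketch(f_dummy):
    """Wrap f_dummy in a one-parameter model function."""
    def f_model(a_x, a_date, params, is_mult=False):
        # Neutral element: 0 for additive composition, 1 for multiplicative
        neutral = 1.0 if is_mult else 0.0
        return [params[0] if d == 1.0 else neutral
                for d in f_dummy(a_x, a_date)]

    return f_model


f_model = dummy_model_sketch(f_dummy_from_values([2, 5]))
y_add = f_model([1, 2, 3, 5], None, [10.0])
y_mult = f_model([1, 2, 3, 5], None, [10.0], is_mult=True)
```

Returning the neutral element outside the flagged samples is what lets the same dummy model be either added to or multiplied with other models.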

class anticipy.forecast_models.CalendarBankHolUK(name=None, rules=None)¶

class anticipy.forecast_models.CalendarChristmasUK(name=None, rules=None)¶

class anticipy.forecast_models.CalendarBankHolIta(name=None, rules=None)¶

class anticipy.forecast_models.CalendarChristmasIta(name=None, rules=None)¶

anticipy.forecast_models.get_model_from_calendars(l_calendar, name=None)¶

Create a ForecastModel based on a list of pandas Calendars.

Parameters: l_calendar (list of pandas.tseries.holiday.AbstractHolidayCalendar) – list of calendars
Returns: model based on the input calendar
Return type: ForecastModel

In pandas, Holidays and calendars provide a simple way to define holiday rules, to be used in any analysis that requires a predefined set of holidays. This function converts a Calendar object into a ForecastModel that assigns a parameter to each calendar rule.

As an example, a Calendar with 1 rule defining Christmas dates generates a model with a single parameter, which determines the amount added/multiplied to samples falling on Christmas. A calendar with 2 rules for Christmas and New Year will have two parameters - the first one applying to samples in Christmas, and the second one applying to samples in New Year.

Usage:

from pandas.tseries.holiday import USFederalHolidayCalendar
model_calendar = get_model_from_calendars([USFederalHolidayCalendar()])

anticipy.forecast_models.get_model_from_datelist(name=None, *args)¶

Create a ForecastModel based on one or more lists of dates.

Parameters

name (str) – Model name
args – Each element in args is a list of dates.

Returns

model based on the input lists of dates

Return type

ForecastModel

Usage:

model_datelist1 = get_model_from_datelist('datelist1',
                                          [date1, date2, date3])
model_datelists23 = get_model_from_datelist('datelists23',
                                            [date1, date2], [date3, date4])

In the example above, model_datelist1 will have one parameter, which determines the amount added/multiplied to samples with dates matching either date1, date2 or date3. model_datelists23 will have two parameters - the first one applying to samples in date1 and date2, and the second one applying to samples in date3 and date4.

anticipy.forecast_models.fix_params_fmodel(forecast_model, l_params_fixed)¶

Given a forecast model and a list of floats, modify the model so that some of its parameters become fixed

Parameters

forecast_model (ForecastModel) – Input model
l_params_fixed (list) – List of floats with same length as number of parameters in model. For each element, a non-null value means that the parameter in that position is fixed to that value. A null value means that the parameter in that position is not fixed.

Returns

A forecast model with a number of parameters equal to the number of null values in l_params_fixed, with f_model modified so that some of its parameters gain fixed values equal to the non-null values in l_params_fixed

Return type

ForecastModel
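
One way to picture parameter fixing is a wrapper that rebuilds the full parameter list before calling the original model function. The sketch below is an illustration only: fix_params_sketch is a hypothetical helper, it operates on a bare model function rather than a ForecastModel, and it uses None as the null marker, which is an assumption.

```python
def model_linear_sketch(a_x, a_date, params):
    """Hypothetical model function: y = A*x + B."""
    return [params[0] * x + params[1] for x in a_x]


def fix_params_sketch(f_model, l_params_fixed):
    """Return (wrapped model function, number of parameters left free)."""
    free_idx = [i for i, p in enumerate(l_params_fixed) if p is None]

    def f_model_fixed(a_x, a_date, params):
        # Start from the fixed values and fill the free slots, in order
        full = list(l_params_fixed)
        for i, value in zip(free_idx, params):
            full[i] = value
        return f_model(a_x, a_date, full)

    return f_model_fixed, len(free_idx)


# Fix the intercept B to 1.0 and leave the slope A free
f_fixed, n_free = fix_params_sketch(model_linear_sketch, [None, 1.0])
y = f_fixed([0.0, 1.0, 2.0], None, [2.0])
```

The wrapped function takes one parameter instead of two, matching the rule that the new parameter count equals the number of null entries.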

anticipy.forecast_models.simplify_model(f_model, a_x=None, a_y=None, a_date=None)¶

Check a model’s bounds, and update model to make parameters fixed if their min and max bounds are equal

Parameters

f_model (ForecastModel) – Input model
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series

Returns

Model with simplified parameters based on bounds

Return type

ForecastModel
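
The core check is small: a parameter whose minimum and maximum bounds coincide has only one admissible value, so it can be treated as fixed. The sketch below derives a list in the l_params_fixed format described for fix_params_fmodel() from a pair of bound sequences; params_fixed_from_bounds is a hypothetical helper and None is assumed as the null marker.

```python
def params_fixed_from_bounds(min_bounds, max_bounds):
    """Return the bound value where min == max, and None where the parameter stays free."""
    return [lo if lo == hi else None
            for lo, hi in zip(min_bounds, max_bounds)]


l_params_fixed = params_fixed_from_bounds([0.0, 1.0, -5.0], [10.0, 1.0, 5.0])
# l_params_fixed is [None, 1.0, None]: only the second parameter is pinned
```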

anticipy.forecast_models.get_l_model_auto_season(a_date, min_periods=1.5, season_add_mult='add', l_season_yearly=None, l_season_weekly=None)¶

Generates a list of candidate seasonality models for a series of timestamps

Parameters

a_date (numpy array of timestamps) – date array of a time series
min_periods (float) – Minimum number of periods required to apply seasonality
season_add_mult – ‘add’ or ‘mult’

Returns

list of candidate seasonality models

Return type

list of ForecastModel

model_utils¶

Utility functions for model generation

anticipy.model_utils.array_transpose(a)¶

Transpose a 1-D numpy array

Parameters: a (numpy.Array) – An array with shape (n,)
Returns: The original array, with shape (n,1)
Return type: numpy.Array

anticipy.model_utils.model_requires_scaling(model)¶

Given an anticipy.forecast_models.ForecastModel, return True if the model requires scaling a_x

Parameters: model (function) – A get_model_<modeltype> function from anticipy.model.periodic_models or anticipy.model.aperiodic_models
Returns: True if function is logistic or sigmoidal
Return type: bool

anticipy.model_utils.apply_a_x_scaling(a_x, model=None, scaling_factor=100.0)¶

Modify a_x for forecast_models that require it

Parameters

a_x (numpy array) – x axis of time series
model (function or None) – an anticipy.forecast_models.ForecastModel
scaling_factor (float) – Value used for scaling t_values for logistic models

Returns

a_x with scaling applied, if required

Return type

numpy array

anticipy.model_utils.get_normalized_x_from_date(s_date)¶: Get column of days since Monday of first date

anticipy.model_utils.get_s_x_extrapolate(date_start_actuals, date_end_actuals, model=None, freq=None, extrapolate_years=2.5, scaling_factor=100.0, x_start_actuals=0.0)¶

Return a_x series with DateTimeIndex, covering the date range for the actuals, plus a forecast period.

Parameters

date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample
date_end_actuals (str, datetime, int or float) – date or numeric index for last actuals sample
extrapolate_years (float) –
model (function) –
freq (basestring) – Time unit between samples. Supported units are ‘W’ for weekly samples, or ‘D’ for daily samples. (untested) Any date unit or time unit accepted by numpy should also work, see https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.datetime.html#arrays-dtypes-dateunits
shifted_origin (int) – Offset to apply to a_x
scaling_factor (float) – Value used for scaling a_x for certain model functions
x_start_actuals (int) – numeric index for the first actuals sample

Returns

Series of floats with DateTimeIndex. To be used as (a_date, a_x) input for a model function.

Return type

pandas.Series

The returned series covers the actuals time domain plus a forecast period lasting extrapolate_years, in years. The number of additional samples for the forecast period is time_resolution * extrapolate_years, rounded down

anticipy.model_utils.get_aic_c(fit_error, n, n_params)¶

This function implements the corrected Akaike Information Criterion (AICc), taking as input a given fit error and the data/model degrees of freedom. We assume that the residuals of the candidate model are distributed according to independent identical normal distributions with zero mean. Hence, we can define the AICc as

\[AICc = AIC + \frac{2k(k+1)}{n-k-1} = 2k + n \log\left(\frac{E}{n}\right) + \frac{2k(k+1)}{n-k-1},\]

where \(k\) and \(n\) denote the model and data degrees of freedom respectively, and \(E\) denotes the residual error of the fit.

Parameters

fit_error (float) – Residual error of the fit
n (int) – Data degrees of freedom
n_params (int) – Model degrees of freedom

Returns

Corrected Akaike Information Criterion (AICc)

Return type

float

Note:

See the Wikipedia article on the AIC.
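
The formula above translates directly into a few lines of Python. The function below is a sketch written from the formula on this page, with a hypothetical name (aic_c_sketch); it assumes fit_error > 0 and n > n_params + 1, and does not reproduce any input checks the library may perform.

```python
import math


def aic_c_sketch(fit_error, n, n_params):
    """AICc = 2k + n*log(E/n) + 2k(k+1)/(n-k-1), with k = n_params and E = fit_error."""
    k = n_params
    aic = 2 * k + n * math.log(fit_error / n)
    correction = 2 * k * (k + 1) / (n - k - 1)
    return aic + correction


# Worked example: E = 5, n = 10, k = 2
# 2k = 4, n*log(E/n) = 10*log(0.5) ≈ -6.9315, correction = 12/7 ≈ 1.7143
value = aic_c_sketch(fit_error=5.0, n=10, n_params=2)  # ≈ -1.2172
```

For the same fit error and sample count, a model with more parameters gets a larger (worse) AICc, which is how the criterion penalises complexity.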

anticipy.model_utils.is_multiplicative(df, freq='M')¶

For an input time series, check if model composition should be multiplicative.

Return True if multiplicative is best - otherwise, use additive composition.

We assume multiplicative composition is best if variance correlates heavily (>0.8) with mean. We aggregate data on a monthly basis by default for this analysis; the aggregation frequency is set by the freq argument.

The following exceptions apply:

If any time series value is <=0, use additive
If date information is unavailable (only x column), use additive
If less than 2 periods worth of data are available, use additive
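
The decision rule can be sketched as follows: compute the mean and variance of each aggregation period, then test whether the two are strongly correlated. This is an illustration under assumptions, not the library code: the input is already grouped into one list of values per period, the date-availability check is omitted, "fewer than 2 periods" is read as fewer than two groups, and pearson and is_multiplicative_sketch are hypothetical helpers.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def is_multiplicative_sketch(groups, threshold=0.8):
    """groups: one list of values per aggregation period (e.g. per month)."""
    if len(groups) < 2 or any(len(g) < 2 for g in groups):
        return False  # not enough data: use additive
    if any(v <= 0 for g in groups for v in g):
        return False  # non-positive values: use additive
    means = [sum(g) / len(g) for g in groups]
    variances = [sum((v - m) ** 2 for v in g) / (len(g) - 1)
                 for g, m in zip(groups, means)]
    return pearson(means, variances) > threshold


# Spread grows with the level: variance tracks the mean
scaled = is_multiplicative_sketch([[9, 10, 11], [18, 20, 22], [36, 40, 44]])
# Spread is constant across levels
constant_spread = is_multiplicative_sketch(
    [[9, 10, 11], [19, 20, 21], [39, 40, 41]])
```

In the first series the within-period spread scales with the level, so the variance tracks the mean and multiplicative composition is selected; in the second the spread is constant and additive composition is kept.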

forecast_plot¶

Functions to plot forecast outputs

anticipy.forecast_plot.plot_forecast(df_fcast, output='html', path=None, width=None, height=None, title=None, dpi=70, show_legend=True, auto_open=False, include_interval=False, pi_q1=5, pi_q2=20)¶

Generates matplotlib or plotly plot and saves it respectively as png or html

Parameters

df_fcast (pandas.DataFrame) –

Forecast Dataframe with the following columns:

- date (timestamp)

- model (str) : ID for the forecast model

- y (float) : Value of the time series in that sample

- is_actuals (bool) : True for actuals samples, False for forecast
output (basestring) – Indicates the output type (html=Default, png or jupyter)
path (basestring) – File path for output
width (int) – Image width, in pixels
height (int) – Image height, in pixels
title (basestring) – Plot title
dpi (int) – Image dpi
show_legend (bool) – Indicates whether legends will be displayed
auto_open (bool) – Indicates whether the output will be displayed automatically
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)

Returns

Success or failure code.

Return type

int