Library Reference
forecast
Functions to run forecast
- anticipy.forecast.get_residuals(params, model, a_x, a_y, a_date, a_weights=None, df_actuals=None, **kwargs)
Given a time series, a model function and a set of parameters, get the residuals
- Parameters
params (numpy array of floats) – parameters for model function
model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series
a_weights (numpy array of floats) – weights for each individual sample
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
- Returns
array with residuals, same length as a_x, a_y
- Return type
numpy array of floats
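As a rough illustration of the residual computation described above, the following plain-Python sketch avoids the library entirely. The names `model_linear` and `get_residuals_sketch` are hypothetical, and multiplying residuals by the weights is an assumption; the library may apply weights differently.

```python
def model_linear(a_x, a_date, params):
    """Linear model y = A*x + B; a_date is unused because the model is not date-aware."""
    a, b = params
    return [a * x + b for x in a_x]


def get_residuals_sketch(params, model, a_x, a_y, a_date, a_weights=None):
    """Return one residual per sample, optionally scaled by a per-sample weight."""
    y_predicted = model(a_x, a_date, params)
    residuals = [y - y_hat for y, y_hat in zip(a_y, y_predicted)]
    if a_weights is not None:
        # Assumption: each weight multiplies its residual
        residuals = [r * w for r, w in zip(residuals, a_weights)]
    return residuals


a_x = [0.0, 1.0, 2.0, 3.0]
a_y = [1.0, 3.5, 5.0, 7.0]
residuals = get_residuals_sketch([2.0, 1.0], model_linear, a_x, a_y, a_date=None)
weighted = get_residuals_sketch([2.0, 1.0], model_linear, a_x, a_y, None,
                                a_weights=[1.0, 2.0, 1.0, 1.0])
```

The result has the same length as a_x and a_y, matching the return description above.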
- anticipy.forecast.optimize_least_squares(model, a_x, a_y, a_date, a_weights=None, df_actuals=None, use_cache=True)
Given a time series and a model function, find the set of parameters that minimises residuals
- Parameters
model (function) – model function, to be fitted against the actuals
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series
a_weights (numpy array of floats) – weights for each individual sample
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
use_cache (bool) – If true, save some model variables to cache when fitting
- Returns
table (success, params, cost, optimality, iterations, status, jac_evals, message):
- success (bool): True if successful fit
- params (list): Parameters of fitted model
- cost (float): Value of cost function
- optimality (float)
- iterations (int): Number of function evaluations
- status (int): Status code
- jac_evals (int): Number of Jacobian evaluations
- message (str): Output message
- Return type
pandas.DataFrame
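To make "the set of parameters that minimises residuals" concrete, here is a minimal sketch for the special case of a linear model, where the least-squares solution has a closed form. This is not how the library fits models (it runs an iterative optimizer over arbitrary model functions); the function names are hypothetical, and defining the cost as half the sum of squared residuals is an assumption borrowed from the scipy.optimize.least_squares convention.

```python
def fit_linear_least_squares(a_x, a_y):
    """Return (A, B) minimising sum((y - (A*x + B))**2), using the closed-form solution."""
    n = len(a_x)
    mean_x = sum(a_x) / n
    mean_y = sum(a_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(a_x, a_y))
    sxx = sum((x - mean_x) ** 2 for x in a_x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cost(params, a_x, a_y):
    """Half the sum of squared residuals (assumed cost convention)."""
    a, b = params
    return 0.5 * sum((y - (a * x + b)) ** 2 for x, y in zip(a_x, a_y))


a_x = [0.0, 1.0, 2.0, 3.0, 4.0]
a_y = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_linear_least_squares(a_x, a_y)
cost_fit = cost((slope, intercept), a_x, a_y)
```

Any other choice of parameters gives a cost at least as large as `cost_fit`, which is the property the optimizer searches for in the general case.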
- anticipy.forecast.normalize_df(df_y, col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source')
Converts an input dataframe for run_forecast() into a normalized format suitable for fit_model()
- Parameters
df_y (pandas.DataFrame) – unformatted input dataframe, for use by run_forecast()
col_name_y (basestring) – name for column with time series values
col_name_weight (basestring) – name for column with time series weights
col_name_x (basestring) – name for column with time series indices
col_name_date (basestring) – name for column with time series dates
col_name_source (basestring) – name for column with time series source identifiers
- Returns
formatted input dataframe, for use by run_forecast()
- Return type
pandas.DataFrame
- anticipy.forecast.fit_model(model, df_y, freq='W', source='test', df_actuals=None, use_cache=True)
Given a time series and a model, optimize the model parameters and return a table with the fit results
- Parameters
model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
df_y (pandas.DataFrame) –
Dataframe with the following columns:
- y:
- date: (optional)
- weight: (optional)
- x: (optional)
source (basestring) – source identifier for this time series
freq (basestring) – ‘W’ or ‘D’. Used only for metadata
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
use_cache (bool) – If true, save some model variables to cache when fitting
- Returns
table (source, model_name, y_weights , freq, is_fit, aic_c, params)
- Return type
pandas.DataFrame
This function calls optimize_least_squares() to perform the optimization loop. It performs some cleaning up of input and output parameters.
- anticipy.forecast.extrapolate_model(model, params, date_start_actuals, date_end_actuals, freq='W', extrapolate_years=2.0, x_start_actuals=0.0, df_actuals=None)
Given a model and a set of parameters, generate model output for a date range plus a number of additional years.
- Parameters
model (function or ForecastModel instance) – model function. Usage: model(a_x, a_date, params)
params (numpy array of floats) – parameters for model function
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample
date_end_actuals (str, datetime, int or float) – date or numeric index for last actuals sample
freq (basestring) – Time unit between samples. Supported units are ‘W’ for weekly samples, or ‘D’ for daily samples. Any other date unit or time unit accepted by numpy should also work, but this is untested; see https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.datetime.html#arrays-dtypes-dateunits
extrapolate_years (float) – Number of years (or fraction of year) covered by the generated time series, after the end of the actuals
x_start_actuals –
df_actuals (pandas DataFrame) – The original dataframe with actuals data. Not required for regression but used by naive models
- Returns
dataframe with a time series extrapolated from the model function
- Return type
pandas.DataFrame, with an ‘y’ column of floats
- anticipy.forecast.get_list_model(l_model_trend, l_model_season, season_add_mult='both')
Generate a list of composite models from lists of trend and seasonality models
- Parameters
l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
season_add_mult (basestring) – ‘mult’, ‘add’ or ‘both’, for multiplicative/additive composition (or both types)
- Returns
- Return type
list of ForecastModel
All combinations of possible composite models are included
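The combination logic can be pictured with a small sketch that uses plain strings in place of ForecastModel instances. The function name is hypothetical, and the real function may filter or order the combinations differently (for example, to avoid redundant composites).

```python
from itertools import product


def get_list_model_sketch(l_model_trend, l_model_season, season_add_mult="both"):
    """List every trend/seasonality pairing, composed additively, multiplicatively, or both."""
    operators = {"add": ["+"], "mult": ["*"], "both": ["+", "*"]}
    if season_add_mult not in operators:
        raise ValueError("season_add_mult must be 'add', 'mult' or 'both'")
    return [
        f"({trend} {op} {season})"
        for trend, season in product(l_model_trend, l_model_season)
        for op in operators[season_add_mult]
    ]


l_trend = ["linear", "constant"]
l_season = ["season_wday", "season_month"]
l_both = get_list_model_sketch(l_trend, l_season)
l_add = get_list_model_sketch(l_trend, l_season, season_add_mult="add")
```

With two trend models and two seasonality models, 'both' yields eight composites and 'add' yields four.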
- anticipy.forecast.get_df_actuals_clean(df_actuals, source, source_long)
Convert an actuals dataframe to a clean format
- Parameters
df_actuals (pandas.DataFrame) – dataframe in normalized format, with columns y and optionally x, date, weight
source (basestring) – source identifier for this time series
source_long (basestring) – long-format source identifier for this time series
- Returns
clean actuals dataframe
- Return type
pandas.DataFrame
- anticipy.forecast.run_forecast(df_y, l_model_trend=None, l_model_season=None, date_start_actuals=None, source_id='src', col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source', extrapolate_years=0, season_add_mult='add', include_all_fits=False, simplify_output=True, find_outliers=False, l_season_yearly=None, l_season_weekly=None, verbose=None, l_model_naive=None, l_model_calendar=None, n_cum=None, pi_q1=5, pi_q2=20, pi_widening_freq='Y', use_cache=True)
Generate forecast for one or more input time series
- Parameters
df_y (pandas.DataFrame) –
input dataframe with the following columns:
- Mandatory: a value column, with the time series values
- Optional: weight column, source ID column, index column, date column
l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample to be used for forecast. Previous samples are ignored
source_id (basestring) – source identifier for time series, if source column is missing
col_name_y (basestring) – name for column with time series values
col_name_weight (basestring) – name for column with time series weights
col_name_x (basestring) – name for column with time series indices
col_name_date (basestring) – name for column with time series dates
col_name_source (basestring) – name for column with time series source identifiers
extrapolate_years (float) – Number of years (or fraction of year) covered by the forecast, after the end of the actuals
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
find_outliers (bool) – If True, find outliers in input data, ignore outlier samples in forecast
include_all_fits (bool) – If True, also include non-optimal models in output
simplify_output (bool) – If False, return dict with forecast and metadata. Otherwise, return only forecast.
l_season_yearly (list of ForecastModel) – yearly seasonality models to consider in automatic seasonality detection
l_season_weekly (list of ForecastModel) – weekly seasonality models to consider in automatic seasonality detection
verbose (bool) – If True, enable verbose logging
l_model_naive (list of ForecastModel) – list of naive models to consider for forecast. Naive models are not fitted with regression, they are based on the last actuals samples
l_model_calendar (list of ForecastModel) – list of calendar models to consider for forecast, to handle holidays and calendar-based events
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use pi_widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
pi_widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).
use_cache (bool) – If true, save some model variables to cache when fitting
- Returns
With simplify_output=False, returns a dictionary with 4 dataframes:
- forecast: output time series with prediction interval
- data: output time series. If include_all_fits, includes all fitting models
- metadata: forecast metadata table
- optimize_info: debugging metadata from scipy.optimize
With simplify_output=True, returns the ‘forecast’ dataframe, as described above
- Return type
pandas.DataFrame or dict of pandas.DataFrames
- anticipy.forecast.aggregate_forecast_dict_results(l_dict_result)
Aggregates a list of dictionaries with forecast outputs into a single dictionary
- Parameters
l_dict_result (list of dictionaries) – list of output dictionaries from run_forecast_single
- Returns
aggregated dictionary
- Return type
dict
- anticipy.forecast.run_forecast_single(df_y, l_model_trend=None, l_model_season=None, date_start_actuals=None, source_id='src', extrapolate_years=0, season_add_mult='add', include_all_fits=False, simplify_output=True, find_outliers=False, l_season_yearly=None, l_season_weekly=None, l_model_naive=None, l_model_calendar=None, n_cum=1, pi_q1=5, pi_q2=20, pi_widening_freq=None, use_cache=True)
Generate forecast for one input time series
- Parameters
df_y (pandas.DataFrame) –
input dataframe with the following columns:
- y: time series values
- x: time series indices
- weight: time series weights (optional)
- date: time series dates (optional)
l_model_trend (list of ForecastModel) – list of trend models
l_model_season (list of ForecastModel) – list of seasonality models
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample to be used for forecast. Previous samples are ignored
source_id (basestring) – source identifier for time series
extrapolate_years (float) –
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
include_all_fits (bool) – If True, also include non-optimal models in output
simplify_output (bool) – If False, return dict with forecast and metadata. Otherwise, return only forecast.
find_outliers (bool) – If True, find outliers in input data, ignore outlier samples in forecast
l_season_yearly (list of ForecastModel) – yearly seasonality models to consider in automatic seasonality detection
l_season_weekly (list of ForecastModel) – weekly seasonality models to consider in automatic seasonality detection
l_model_naive (list of ForecastModel) – list of naive models to consider for forecast. Naive models are not fitted with regression, they are based on the last actuals samples
l_model_calendar (list of ForecastModel) – list of calendar models to consider for forecast, to handle holidays and calendar-based events
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use pi_widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
pi_widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).
use_cache (bool) – If true, save some model variables to cache when fitting
- Returns
With simplify_output=False, returns a dictionary with 4 dataframes:
- forecast: output time series with prediction interval
- data: output time series. If include_all_fits, includes all fitting models
- metadata: forecast metadata table
- optimize_info: debugging metadata from scipy.optimize
With simplify_output=True, returns the ‘forecast’ dataframe, as described above
- Return type
pandas.DataFrame or dict of pandas.DataFrames
- anticipy.forecast.run_l_forecast(l_fcast_input, col_name_y='y', col_name_weight='weight', col_name_x='x', col_name_date='date', col_name_source='source', extrapolate_years=0, season_add_mult='add', include_all_fits=False, find_outliers=False, use_cache=True)
Generate forecasts for a list of ForecastInput objects, each including a time series, model functions, and other configuration parameters.
- Parameters
l_fcast_input (list of ForecastInput) – List of forecast input configurations. Each element includes a time series, candidate forecast models for trend and seasonality, and other configuration parameters. For each input configuration, a forecast time series will be generated.
return_all_models (bool) – If True, result includes non-fitting models, with null AIC and an empty forecast df. Otherwise, result includes only fitting models, and for time series where no fitting model is available, a ‘no-best-model’ entry with null AIC and an empty forecast df is added.
return_all_fits (bool) – If True, result includes all models for each input time series. Otherwise, only the best model is included.
extrapolate_years (float) –
season_add_mult (str) – ‘add’, ‘mult’, or ‘both’. Whether forecast seasonality will be additive, multiplicative, or the best fit of the two.
fill_gaps_y_values (bool) – If True, gaps in time series will be filled with NaN values
freq (str) – ‘W’ or ‘D’ . Sampling frequency of the output forecast: weekly or daily.
use_cache (bool) – If true, save some model variables to cache when fitting
- Returns
dict(data, metadata)
- data: dataframe(date, source, model, y)
- metadata: dataframe(‘source’, ‘model’, ‘res_weights’, ‘freq’, ‘is_fit’, ‘cost’, ‘aic_c’, ‘params’, ‘status’)
- Return type
dict
- class anticipy.forecast.ForecastInput(source_id, df_y, l_model_trend=None, l_model_season=None, weights_y_values=1.0, date_start_actuals=None)
Class that encapsulates input variables for forecast.run_forecast()
- anticipy.forecast.get_pi(df_forecast, n_sims=100, n_cum=None, pi_q1=5, pi_q2=20, widening_freq='Y')
Generate prediction intervals for a table with multiple forecasts, using bootstrapped residuals.
- Parameters
df_forecast (pandas.DataFrame) – forecasted time series
n_sims (int) – Number of bootstrapped samples for prediction interval
n_cum (int) – Used for widening prediction interval. Interval widens every n_cum samples. This parameter is deprecated - use widening_freq instead
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
widening_freq (str) – Specifies the frequency with which the prediction interval widens: Y (yearly), M (monthly), W (weekly), D (daily).
- Returns
Forecast time series table with added columns:
- q5: 5% percentile of prediction interval
- q20: 20% percentile of prediction interval
- q80: 80% percentile of prediction interval
- q95: 95% percentile of prediction interval
- Return type
pandas.DataFrame
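The idea of a bootstrapped prediction interval can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not the library's implementation: it resamples a fixed list of residuals with replacement, uses a nearest-rank percentile, and does not model the widening of the interval over time. The function names and the `seed` argument are hypothetical.

```python
import random


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[idx]


def get_pi_sketch(l_forecast, l_residuals, n_sims=100, pi_q1=5, pi_q2=20, seed=0):
    """Add bootstrapped percentile columns to each forecast sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    l_quantiles = [pi_q1, pi_q2, 100 - pi_q2, 100 - pi_q1]
    rows = []
    for y_hat in l_forecast:
        # Simulate n_sims outcomes by adding residuals resampled with replacement
        sims = sorted(y_hat + rng.choice(l_residuals) for _ in range(n_sims))
        row = {"y": y_hat}
        for q in l_quantiles:
            row[f"q{q}"] = percentile(sims, q)
        rows.append(row)
    return rows


residuals = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
table = get_pi_sketch([10.0, 11.0, 12.0], residuals)
```

Each row carries the forecast value plus four percentile columns named after pi_q1 and pi_q2, so the defaults produce q5, q20, q80 and q95.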
forecast_models
Defines the ForecastModel class, which encapsulates model functions used in forecast model fitting, as well as their number of parameters and initialisation parameters.
- class anticipy.forecast_models.ForecastModel(name, n_params, f_model, f_init_params=None, f_bounds=None, l_f_validate_input=None, l_cache_vars=None, dict_f_cache=None)
Class that encapsulates model functions for use in forecasting, as well as their number of parameters and functions for parameter initialisation.
A ForecastModel instance is initialized with a model name, a number of model parameters, and a model function. Class instances are callable - when called as a function, their internal model function is used. The main purpose of ForecastModel objects is to generate predicted values for a time series, given a set of parameters. These values can be compared to the original series to get an array of residuals:
y_predicted = model(a_x, a_date, params)
residuals = (a_y - y_predicted)
This is used in an optimization loop to obtain the optimal parameters for the model.
The reason for using this class instead of raw model functions is that ForecastModel supports function composition:
# fcast_model1 and fcast_model2 are ForecastModel instances, and so is model_sum
model_sum = fcast_model1 + fcast_model2
a_y1 = fcast_model1(a_x, a_date, params1) + fcast_model2(a_x, a_date, params2)
params = np.concatenate([params1, params2])
a_y2 = model_sum(a_x, a_date, params)
a_y1 == a_y2  # True
Forecast models can be added or multiplied, with the + and * operators. Multiple levels of composition are supported:
model = (model1 + model2) * model3
Model composition is used to aggregate trend and seasonality model components, among other uses.
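The composition mechanism can be illustrated with a toy class that overloads + and * and concatenates parameter vectors. `ComposableModel` is a hypothetical stand-in written for this sketch; it ignores is_mult, initialisation, bounds and caching, all of which the real ForecastModel handles.

```python
import operator


class ComposableModel:
    """Toy stand-in for ForecastModel: a named model function plus its parameter count."""

    def __init__(self, name, n_params, f_model):
        self.name = name
        self.n_params = n_params
        self.f_model = f_model

    def __call__(self, a_x, a_date, params):
        return self.f_model(a_x, a_date, params)

    def _compose(self, other, symbol, op):
        n_left = self.n_params

        def f_composite(a_x, a_date, params):
            # The first n_left parameters belong to the left model, the rest to the right
            left = self(a_x, a_date, params[:n_left])
            right = other(a_x, a_date, params[n_left:])
            return [op(v_left, v_right) for v_left, v_right in zip(left, right)]

        name = f"({self.name}{symbol}{other.name})"
        return ComposableModel(name, n_left + other.n_params, f_composite)

    def __add__(self, other):
        return self._compose(other, "+", operator.add)

    def __mul__(self, other):
        return self._compose(other, "*", operator.mul)


model_linear = ComposableModel("linear", 2, lambda a_x, a_date, p: [p[0] * x + p[1] for x in a_x])
model_constant = ComposableModel("constant", 1, lambda a_x, a_date, p: [p[0] for _ in a_x])

model = (model_linear + model_constant) * model_constant
y = model([0.0, 1.0, 2.0], None, [2.0, 1.0, 3.0, 0.5])
```

Here the four parameters are split as [2.0, 1.0] for the linear part, [3.0] for the added constant and [0.5] for the multiplying constant, so y evaluates to ((2x + 1) + 3) * 0.5.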
Model functions have the following signature:
f(a_x, a_date, params, is_mult)
a_x : array of floats
a_date: array of dates, same length as a_x. Only required for date-aware models, e.g. for weekly seasonality.
params: array of floats - model parameters - the optimisation loop updates this to fit our actual values. Each model function uses a fixed number of parameters.
is_mult: boolean. True if the model is being used with multiplicative composition. Required because some model functions (e.g. steps) have different behaviour when added to other models than when multiplying them.
returns an array of floats - with same length as a_x - output of the model defined by this object’s modelling function f_model and the current set of parameters
By default, model parameters are initialized as random values between 0 and 1. It is possible to define a parameter initialization function that picks initial values based on the original time series. This is passed during ForecastModel creation with the argument f_init_params. Parameter initialization is compatible with model composition: the initialization function of each component will be used for that component’s parameters.
Parameter initialisation functions have the following signature:
f_init_params(a_x, a_y, is_mult)
a_x: array of floats - same length as time series
a_y: array of floats - time series values
returns an array of floats - with length equal to this object’s n_params value
By default, model parameters have no boundaries. However, it is possible to define a boundary function for a model, that sets boundaries for each model parameter, based on the input time series. This is passed during ForecastModel creation with the argument f_bounds. Boundary definition is compatible with model composition: the boundary function of each component will be used for that component’s parameters.
Boundary functions have the following signature:
f_bounds(a_x, a_y, a_date)
a_x: array of floats - same length as time series
a_y: array of floats - time series values
a_date: array of dates, same length as a_x. Only required for date-aware models, e.g. for weekly seasonality.
returns a tuple of 2 arrays of floats. The first defines minimum parameter boundaries, and the second the maximum parameter boundaries.
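As a sketch of what an initialisation function and a boundary function might look like, the following example targets a non-decreasing linear model y = A*x + B. Both functions are hypothetical illustrations that follow the signatures given above; they use plain lists rather than numpy arrays, and the end-point heuristic for the initial slope is an arbitrary choice made for this example.

```python
import math


def f_init_params_linear(a_x, a_y, is_mult=False):
    """Initial guess [A, B]: slope between the end points, intercept from the first sample."""
    slope = (a_y[-1] - a_y[0]) / (a_x[-1] - a_x[0])
    return [slope, a_y[0] - slope * a_x[0]]


def f_bounds_linear_nondec(a_x, a_y, a_date=None):
    """Bounds for [A, B]: slope A >= 0, intercept B unbounded."""
    lower = [0.0, -math.inf]
    upper = [math.inf, math.inf]
    return lower, upper


a_x = [0.0, 1.0, 2.0]
a_y = [1.0, 3.0, 5.0]
init_params = f_init_params_linear(a_x, a_y)
lower, upper = f_bounds_linear_nondec(a_x, a_y)
```

The initialisation function returns one value per model parameter, and the boundary function returns a (minimum, maximum) pair of lists of that same length.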
As an option, we can assign a list of input validation functions to a model. These functions analyse the inputs that will be used for fitting a model, returning True if valid, and False otherwise. The forecast logic will skip a model from fitting if any of the validation functions for that model returns False.
Input validation functions have the following signature:
f_validate_input(a_x, a_y, a_date)
See the description of model functions above for more details on these parameters.
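A minimal sketch of input validation, assuming two hypothetical validation functions and a helper that applies the "skip the model if any check fails" rule described above. The threshold of three samples is arbitrary and chosen only for illustration.

```python
def validate_min_samples(a_x, a_y, a_date=None):
    """Valid only if at least 3 non-null samples are available (illustrative threshold)."""
    return sum(y is not None for y in a_y) >= 3


def validate_has_dates(a_x, a_y, a_date=None):
    """Valid only if a date array was supplied, as a date-aware model would require."""
    return a_date is not None


def is_model_valid(l_f_validate_input, a_x, a_y, a_date=None):
    """A model is fitted only if every one of its validation functions returns True."""
    return all(f(a_x, a_y, a_date) for f in l_f_validate_input)


a_x = [0.0, 1.0, 2.0, 3.0]
a_y = [1.0, None, 2.0, 4.0]
ok_samples = is_model_valid([validate_min_samples], a_x, a_y)
ok_dates = is_model_valid([validate_min_samples, validate_has_dates], a_x, a_y)
```

In this example the series passes the sample-count check but fails the date check, so a date-aware model would be skipped.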
Our input time series should meet the following constraints:
Minimum required samples depends on number of model parameters
May include null values
May include multiple values per sample
A date array is only required if the model is date-aware
Class Usage:
# l_f_validate_input is passed by keyword: the fifth positional argument is f_bounds
model_x = ForecastModel(name, n_params, f_model, f_init_params, l_f_validate_input=l_f_validate_input)
# Get model name
model_name = model_x.name
# Get number of model parameters
n_params = model_x.n_params
# Get parameter initialisation function
f_init_params = model_x.f_init_params
# Get initial parameters
init_params = f_init_params(t_values, y_values)
# Get model fitting function
f_model = model_x.f_model
# Get model output
y = f_model(a_x, a_date, parameters)
The following pre-generated models are available as attributes of this module:

| name | params | formula | notes |
|---|---|---|---|
| model_null | 0 | y=0 | Does nothing. Used to disable components (e.g. seasonality) |
| model_constant | 1 | y=A | Constant model |
| model_linear | 2 | y=Ax + B | Linear model |
| model_linear_nondec | 2 | y=Ax + B | Non decreasing linear model. With boundaries to ensure model slope >=0 |
| model_quasilinear | 3 | y=A*(x^B) + C | Quasilinear model |
| model_exp | 2 | y=A * B^x | Exponential model |
| model_decay | 4 | y = A * e^(B*(x-C)) + D | Exponential decay model |
| model_step | 2 | y=0 if x<A, y=B if x>=A | Step model |
| model_two_steps | 4 | see model_step | 2 step models. Parameter initialization is aware of # of steps. |
| model_sigmoid_step | 3 | y = A + (B - A) / (1 + np.exp(- D * (x - C))) | Sigmoid step model |
| model_sigmoid | 3 | y = A + (B - A) / (1 + np.exp(- D * (x - C))) | Sigmoid model |
| model_season_wday | 7 | see desc. | Weekday seasonality model. Assigns a constant value to each weekday |
| model_season_wday | 6 | see desc. | 6-param weekday seasonality model. As above, with one constant set to 0. |
| model_season_wday_2 | 2 | see desc. | Weekend seasonality model. Assigns a constant to each of weekday/weekend |
| model_season_month | 12 | see desc. | Month seasonality model. Assigns a constant value to each month |
| model_season_fourier_yearly | 10 | see desc | Fourier yearly seasonality model |
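A few of the formulas listed above can be written out as plain-Python model functions. These are illustrative re-implementations of the formulas only, using lists instead of numpy arrays; the `f_` names are hypothetical and the functions are not the library's own code.

```python
def f_linear(a_x, a_date, params):
    """y = A*x + B"""
    a, b = params
    return [a * x + b for x in a_x]


def f_quasilinear(a_x, a_date, params):
    """y = A*(x^B) + C"""
    a, b, c = params
    return [a * x ** b + c for x in a_x]


def f_exp(a_x, a_date, params):
    """y = A * B^x"""
    a, b = params
    return [a * b ** x for x in a_x]


def f_step(a_x, a_date, params):
    """y = 0 if x < A, y = B if x >= A"""
    a, b = params
    return [b if x >= a else 0.0 for x in a_x]


a_x = [0.0, 1.0, 2.0, 3.0]
y_linear = f_linear(a_x, None, [2.0, 1.0])
y_quasilinear = f_quasilinear(a_x, None, [2.0, 2.0, 1.0])
y_exp = f_exp(a_x, None, [3.0, 2.0])
y_step = f_step(a_x, None, [2.0, 5.0])
```

Each function takes exactly as many parameters as the params column states for the corresponding model.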
- anticipy.forecast_models.get_model_outliers(df, window=3)
Identify outlier samples in a time series
- Parameters
df (pandas.DataFrame) – Input time series
window (int) – The x-axis window to aggregate multiple steps/spikes
- Returns
tuple (mask_step, mask_spike)
- mask_step: True if sample contains a step
- mask_spike: True if sample contains a spike
- Return type
tuple of 2 numpy arrays of booleans
TODO: require minimum number of samples to find an outlier
- anticipy.forecast_models.get_model_dummy(name, dummy, **kwargs)
Generate a model based on a dummy variable.
- Parameters
name (basestring) – Name of the model
dummy (function, or list-like of numerics or datetime-likes) –
Can be a function or a list-like. If a function, it must be of the form f_dummy(a_x, a_date), and return a numpy array of floats with the same length as a_x and values that are either 0 or 1. If a list-like of numerics, it will be converted to a f_dummy function as described above, which will have values of 1 when a_x has one of the values in the list, and 0 otherwise. If a list-like of date-likes, it will be converted to a f_dummy function as described above, which will have values of 1 when a_date has one of the values in the list, and 0 otherwise.
kwargs –
- Returns
A model that returns A when dummy is 1, and 0 (or 1 if is_mult==True) otherwise.
- Return type
- class anticipy.forecast_models.CalendarBankHolUK(name=None, rules=None)
- class anticipy.forecast_models.CalendarChristmasUK(name=None, rules=None)
- class anticipy.forecast_models.CalendarBankHolIta(name=None, rules=None)
- class anticipy.forecast_models.CalendarChristmasIta(name=None, rules=None)
- anticipy.forecast_models.get_model_from_calendars(l_calendar, name=None)
Create a ForecastModel based on a list of pandas Calendars.
- Parameters
calendar (pandas.tseries.AbstractHolidayCalendar) –
- Returns
model based on the input calendar
- Return type
In pandas, Holidays and calendars provide a simple way to define holiday rules, to be used in any analysis that requires a predefined set of holidays. This function converts a Calendar object into a ForecastModel that assigns a parameter to each calendar rule.
As an example, a Calendar with 1 rule defining Christmas dates generates a model with a single parameter, which determines the amount added/multiplied to samples falling on Christmas. A calendar with 2 rules for Christmas and New Year will have two parameters - the first one applying to samples in Christmas, and the second one applying to samples in New Year.
Usage:
from pandas.tseries.holiday import USFederalHolidayCalendar
model_calendar = get_model_from_calendars(USFederalHolidayCalendar())
- anticipy.forecast_models.get_model_from_datelist(name=None, *args)
Create a ForecastModel based on one or more lists of dates.
- Parameters
name (str) – Model name
args – Each element in args is a list of dates.
- Returns
model based on the input lists of dates
- Return type
Usage:
model_datelist1 = get_model_from_datelist('datelist1', [date1, date2, date3])
model_datelists23 = get_model_from_datelist('datelists23', [date1, date2], [date3, date4])
In the example above, model_datelist1 will have one parameter, which determines the amount added/multiplied to samples with dates matching either date1, date2 or date3. model_datelists23 will have two parameters - the first one applying to samples in date1 and date2, and the second one applying to samples in date3 and date4.
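The "one parameter per list of dates" behaviour can be sketched with the standard library alone. The function below is a hypothetical illustration, not the library's implementation: it returns a bare model function rather than a ForecastModel, and it assumes that non-matching samples take the neutral value 0 for additive use and 1 for multiplicative use.

```python
from datetime import date


def get_model_from_datelist_sketch(*l_datelists):
    """Build a model function with one parameter per list of dates."""
    l_date_sets = [set(datelist) for datelist in l_datelists]

    def f_model(a_x, a_date, params, is_mult=False):
        # Neutral element: 0 when the output is added, 1 when it is multiplied
        result = [1.0 if is_mult else 0.0] * len(a_date)
        for param, dates in zip(params, l_date_sets):
            for i, d in enumerate(a_date):
                if d in dates:
                    result[i] = result[i] * param if is_mult else result[i] + param
        return result

    return f_model


a_date = [date(2020, 12, 24), date(2020, 12, 25), date(2020, 12, 26), date(2021, 1, 1)]
f_model = get_model_from_datelist_sketch([date(2020, 12, 25)], [date(2021, 1, 1)])
y_add = f_model(None, a_date, [10.0, 5.0])
y_mult = f_model(None, a_date, [1.5, 0.8], is_mult=True)
```

The first parameter only affects the sample on 25 December, and the second only the sample on 1 January; every other sample keeps the neutral value.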
- anticipy.forecast_models.fix_params_fmodel(forecast_model, l_params_fixed)
Given a forecast model and a list of floats, modify the model so that some of its parameters become fixed
- Parameters
forecast_model (ForecastModel) – Input model
l_params_fixed (list) – List of floats with same length as number of parameters in model. For each element, a non-null value means that the parameter in that position is fixed to that value. A null value means that the parameter in that position is not fixed.
- Returns
A forecast model with a number of parameters equal to the number of null values in l_params_fixed, with f_model modified so that some of its parameters gain fixed values equal to the non-null values in l_params_fixed
- Return type
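Parameter fixing as described for fix_params_fmodel can be sketched as a wrapper around a model function. This is a hypothetical illustration that represents null values with None (the library may use a different null marker) and returns a bare function plus the number of free parameters, rather than a ForecastModel.

```python
def fix_params_sketch(f_model, l_params_fixed):
    """Return (f_model_fixed, n_free), where f_model_fixed takes only the free parameters."""
    n_free = sum(p is None for p in l_params_fixed)

    def f_model_fixed(a_x, a_date, params_free):
        it_free = iter(params_free)
        # Rebuild the full parameter list: fixed values stay, None slots are filled in order
        params_full = [next(it_free) if p is None else p for p in l_params_fixed]
        return f_model(a_x, a_date, params_full)

    return f_model_fixed, n_free


def f_linear(a_x, a_date, params):
    """y = A*x + B"""
    a, b = params
    return [a * x + b for x in a_x]


# Fix the slope A to 2.0 and leave the intercept B free
f_fixed, n_free = fix_params_sketch(f_linear, [2.0, None])
y = f_fixed([0.0, 1.0, 2.0], None, [1.0])
```

The wrapped model has one free parameter left, and calling it with [1.0] is equivalent to calling the original with [2.0, 1.0].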
- anticipy.forecast_models.simplify_model(f_model, a_x=None, a_y=None, a_date=None)
Check a model’s bounds, and update model to make parameters fixed if their min and max bounds are equal
- Parameters
f_model (ForecastModel) – Input model
a_x (numpy array of floats) – X axis for model function.
a_y (numpy array of floats) – Input time series values, to compare to the model function
a_date (numpy array of datetimes) – Dates for the input time series
- Returns
Model with simplified parameters based on bounds
- Return type
- anticipy.forecast_models.get_l_model_auto_season(a_date, min_periods=1.5, season_add_mult='add', l_season_yearly=None, l_season_weekly=None)
Generates a list of candidate seasonality models for a series of timestamps
- Parameters
a_date (numpy array of timestamps) – date array of a time series
min_periods (float) – Minimum number of periods required to apply seasonality
season_add_mult – ‘add’ or ‘mult’
- Returns
list of candidate seasonality models
- Return type
list of ForecastModel
model_utils
Utility functions for model generation
- anticipy.model_utils.array_transpose(a)
Transpose a 1-D numpy array
- Parameters
a (numpy.Array) – An array with shape (n,)
- Returns
The original array, with shape (n,1)
- Return type
numpy.Array
- anticipy.model_utils.model_requires_scaling(model)
Given an anticipy.forecast_models.ForecastModel, return True if the function requires scaling a_x
- Parameters
model (function) – A get_model_<modeltype> function from anticipy.model.periodic_models or anticipy.model.aperiodic_models
- Returns
True if function is logistic or sigmoidal
- Return type
bool
- anticipy.model_utils.apply_a_x_scaling(a_x, model=None, scaling_factor=100.0)
Modify a_x for forecast_models that require it
- Parameters
a_x (numpy array) – x axis of time series
model (function or None) – an anticipy.forecast_models.ForecastModel
scaling_factor (float) – Value used for scaling t_values for logistic models
- Returns
a_x with scaling applied, if required
- Return type
numpy array
- anticipy.model_utils.get_normalized_x_from_date(s_date)
Get column of days since Monday of first date
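"Days since Monday of first date" can be pictured with the following standard-library sketch. The function name is hypothetical, it works on a list of dates rather than a pandas Series, and it assumes that "first date" means the earliest date in the input.

```python
from datetime import date, timedelta


def get_normalized_x_from_date_sketch(l_date):
    """Return, for each date, the number of days since the Monday of the earliest date's week."""
    first = min(l_date)  # assumption: "first date" is the earliest date
    monday = first - timedelta(days=first.weekday())  # weekday() is 0 for Monday
    return [(d - monday).days for d in l_date]


# 2021-01-06 is a Wednesday, so the reference Monday is 2021-01-04
l_date = [date(2021, 1, 6), date(2021, 1, 7), date(2021, 1, 11)]
a_x = get_normalized_x_from_date_sketch(l_date)
```

Anchoring the axis on a Monday means that x modulo 7 identifies the weekday, which is convenient for weekly seasonality.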
- anticipy.model_utils.get_s_x_extrapolate(date_start_actuals, date_end_actuals, model=None, freq=None, extrapolate_years=2.5, scaling_factor=100.0, x_start_actuals=0.0)
Return a_x series with DateTimeIndex, covering the date range for the actuals, plus a forecast period.
- Parameters
date_start_actuals (str, datetime, int or float) – date or numeric index for first actuals sample
date_end_actuals (str, datetime, int or float) – date or numeric index for last actuals sample
extrapolate_years (float) –
model (function) –
freq (basestring) – Time unit between samples. Supported units are ‘W’ for weekly samples, or ‘D’ for daily samples. Any other date unit or time unit accepted by numpy should also work, but this is untested; see https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.datetime.html#arrays-dtypes-dateunits
shifted_origin (int) – Offset to apply to a_x
scaling_factor (float) – Value used for scaling a_x for certain model functions
x_start_actuals (int) – numeric index for the first actuals sample
- Returns
Series of floats with DateTimeIndex. To be used as (a_date, a_x) input for a model function.
- Return type
pandas.Series
The returned series covers the actuals time domain plus a forecast period lasting extrapolate_years, in years. The number of additional samples for the forecast period is time_resolution * extrapolate_years, rounded down
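The sample-count rule above can be sketched as follows. The time resolution values (52 samples per year for weekly data, 365 for daily data) are assumptions made for this illustration, as are the function and constant names; the sketch also returns a plain list of dates instead of a pandas Series and ignores scaling.

```python
import math
from datetime import date, timedelta

# Assumed time resolution, in samples per year, for each supported frequency
SAMPLES_PER_YEAR = {"W": 52, "D": 365}
STEP = {"W": timedelta(weeks=1), "D": timedelta(days=1)}


def get_dates_extrapolate_sketch(date_start, date_end, freq="W", extrapolate_years=2.5):
    """Return the actuals dates followed by the forecast dates."""
    step = STEP[freq]
    n_actuals = (date_end - date_start) // step + 1
    # Number of forecast samples: time_resolution * extrapolate_years, rounded down
    n_forecast = math.floor(SAMPLES_PER_YEAR[freq] * extrapolate_years)
    return [date_start + i * step for i in range(n_actuals + n_forecast)]


l_date = get_dates_extrapolate_sketch(date(2021, 1, 4), date(2021, 1, 25),
                                      freq="W", extrapolate_years=0.5)
```

With four weekly actuals and half a year of extrapolation, the sketch yields 4 + 26 = 30 dates.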
- anticipy.model_utils.get_aic_c(fit_error, n, n_params)
This function implements the corrected Akaike Information Criterion (AICc), taking as input a given fit error and the data/model degrees of freedom. We assume that the residuals of the candidate model are distributed according to independent identical normal distributions with zero mean. Hence, we can define the AICc as
\[AICc = AIC + \frac{2k(k+1)}{n-k-1} = 2k + n \log\left(\frac{E}{n}\right) + \frac{2k(k+1)}{n-k-1},\]
where \(k\) and \(n\) denote the model and data degrees of freedom respectively, and \(E\) denotes the residual error of the fit.
- Parameters
fit_error (float) – Residual error of the fit
n (int) – Data degrees of freedom
n_params (int) – Model degrees of freedom
- Returns
Corrected Akaike Information Criterion (AICc)
- Return type
float
Note:
see AIC in Wikipedia article on the AIC.
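The formula above translates directly into a few lines of Python. This is a sketch of the formula only; the function name is hypothetical, and raising an error when n - k - 1 is not positive is a choice made for this example, since the source does not say how the library handles that case.

```python
import math


def get_aic_c_sketch(fit_error, n, n_params):
    """AICc = 2k + n*log(E/n) + 2k(k+1)/(n-k-1), with k = n_params and E = fit_error."""
    k = n_params
    if n - k - 1 <= 0:
        # The correction term is undefined here; handling this case is an assumption
        raise ValueError("AICc requires n > n_params + 1")
    aic = 2 * k + n * math.log(fit_error / n)
    return aic + 2 * k * (k + 1) / (n - k - 1)


aic_c_small = get_aic_c_sketch(fit_error=5.0, n=10, n_params=2)
aic_c_large = get_aic_c_sketch(fit_error=5.0, n=10, n_params=4)
```

For the same fit error, the model with more parameters gets a higher (worse) AICc, which is how the criterion penalises complexity during model selection.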
- anticipy.model_utils.is_multiplicative(df, freq='M')
For an input time series, check if model composition should be multiplicative.
Return True if multiplicative is best - otherwise, use additive composition.
We assume multiplicative composition is best if variance correlates heavily (>0.8) with mean. We aggregate data on a monthly basis by default for this analysis. Use the freq parameter to change the aggregation period.
The following exceptions apply:
If any time series value is <=0, use additive
If date information is unavailable (only x column), use additive
If less than 2 periods worth of data are available, use additive
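The heuristic can be sketched with the standard library as follows. This is an illustration under stated assumptions rather than the library's code: it groups samples by calendar month only, interprets "2 periods" as two monthly groups, uses the population variance, and treats an undefined correlation (constant mean or variance) as zero. All function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length lists; 0.0 if either list is constant."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def is_multiplicative_sketch(l_date, l_y, threshold=0.8):
    """True if the monthly variance correlates strongly with the monthly mean."""
    if l_date is None or any(y <= 0 for y in l_y):
        return False  # exceptions listed above: no dates, or non-positive values
    groups = defaultdict(list)
    for d, y in zip(l_date, l_y):
        groups[(d.year, d.month)].append(y)
    if len(groups) < 2:
        return False  # assumed reading of "less than 2 periods worth of data"
    l_mean = [mean(v) for v in groups.values()]
    l_var = [pvariance(v) for v in groups.values()]
    return pearson(l_mean, l_var) > threshold


l_date = [date(2021, m, d) for m in range(1, 7) for d in (1, 10, 20)]
# Spread grows with the level: multiplicative behaviour
y_mult = [10.0 * m * f for m in range(1, 7) for f in (0.9, 1.0, 1.1)]
# Spread is constant regardless of level: additive behaviour
y_add = [10.0 * m + o for m in range(1, 7) for o in (-1.0, 0.0, 1.0)]
```

The first series, whose spread scales with its level, is classified as multiplicative, while the second, with a constant spread, falls back to additive.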
forecast_plot
Functions to plot forecast outputs
- anticipy.forecast_plot.plot_forecast(df_fcast, output='html', path=None, width=None, height=None, title=None, dpi=70, show_legend=True, auto_open=False, include_interval=False, pi_q1=5, pi_q2=20)
Generates matplotlib or plotly plot and saves it respectively as png or html
- Parameters
df_fcast (pandas.DataFrame) –
Forecast Dataframe with the following columns:
- date (timestamp)
- model (str): ID for the forecast model
- y (float): Value of the time series in that sample
- is_actuals (bool): True for actuals samples, False for forecast
output (basestring) – Indicates the output type (html=Default, png or jupyter)
path (basestring) – File path for output
width (int) – Image width, in pixels
height (int) – Image height, in pixels
title (basestring) – Plot title
dpi (int) – Image dpi
show_legend (bool) – Indicates whether legends will be displayed
auto_open (bool) – Indicates whether the output will be displayed automatically
pi_q1 (int) – Percentile for outer prediction interval (defaults to 5%-95%)
pi_q2 (int) – Percentile for inner prediction interval (defaults to 20%-80%)
- Returns
Success or failure code.
- Return type
int