Add missing slots to a time series dataframe. For example, if each time series is associated with a location, this function adds the missing slots for each location. Missing slots are filled with the value specified in the ‘fill_value’ parameter. By default, the frequency of the time series is hourly.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Input dataframe in time series format, with datetime, entity and value columns |
| datetime_col | str | | Name of the datetime column |
| entity_col | str | | Name of the entity column. If each time series is associated with a location, this column will be ‘location_id’ |
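As a rough illustration of the behaviour described above, the following sketch fills missing hourly slots per entity with plain pandas. It is not the library's implementation: the function name `fill_missing_slots_sketch`, the `value_col` and `freq` parameters, and the choice to span every entity over the global minimum and maximum datetime are assumptions made for this example.

```python
import pandas as pd


def fill_missing_slots_sketch(df, datetime_col, entity_col, value_col,
                              freq="h", fill_value=0):
    """Hypothetical helper: add one row per (entity, slot) that is missing."""
    df = df.copy()
    df[datetime_col] = pd.to_datetime(df[datetime_col])

    # Assumption: every entity should cover the same global time range.
    full_range = pd.date_range(df[datetime_col].min(),
                               df[datetime_col].max(), freq=freq)

    parts = []
    for entity, group in df.groupby(entity_col):
        # Reindex on the full range; slots with no data get fill_value.
        series = (group.set_index(datetime_col)[value_col]
                  .reindex(full_range, fill_value=fill_value))
        part = series.rename_axis(datetime_col).reset_index()
        part[entity_col] = entity
        parts.append(part[[datetime_col, entity_col, value_col]])

    return pd.concat(parts, ignore_index=True)


# Location 1 has no data at 01:00; location 2 only has data at 01:00.
sparse = pd.DataFrame({
    "pickup_hour": ["2022-01-01 00:00:00", "2022-01-01 02:00:00",
                    "2022-01-01 01:00:00"],
    "location_id": [1, 1, 2],
    "rides": [3, 2, 5],
})
filled = fill_missing_slots_sketch(sparse, "pickup_hour", "location_id", "rides")
```

In this sketch `filled` has three hourly rows per location, and the slots that were absent from `sparse` carry the fill value.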
Function to get the indices for the cutoffs of a Time Series DataFrame. The Time Series DataFrame should be ordered by time.
| | Type | Default | Details |
|---|---|---|---|
| ts_data | DataFrame | | Time Series DataFrame |
| datetime_col | str | | Name of the datetime column |
| n_features | int | | Number of features to use for the prediction |
| n_targets | int | 1 | Number of target values to predict |
| step_size | int | 1 | Step size to use to slide the Time Series DataFrame |
| Returns | typing.List[tuple] | | |
    # build a time series dataframe with 10 hours of data in random order
    ts_data = pd.DataFrame({
        'pickup_hour': ['2022-01-01 01:00:00', '2022-01-01 00:00:00',
                        '2022-01-01 03:00:00', '2022-01-01 04:00:00',
                        '2022-01-01 02:00:00', '2022-01-01 05:00:00',
                        '2022-01-01 09:00:00', '2022-01-01 06:00:00',
                        '2022-01-01 07:00:00', '2022-01-01 08:00:00'],
        'rides': [2, 3, 1, 1, 2, 1, 1, 2, 1, 1],
    })
    ts_data
| | pickup_hour | rides |
|---|---|---|
| 0 | 2022-01-01 01:00:00 | 2 |
| 1 | 2022-01-01 00:00:00 | 3 |
| 2 | 2022-01-01 03:00:00 | 1 |
| 3 | 2022-01-01 04:00:00 | 1 |
| 4 | 2022-01-01 02:00:00 | 2 |
| 5 | 2022-01-01 05:00:00 | 1 |
| 6 | 2022-01-01 09:00:00 | 1 |
| 7 | 2022-01-01 06:00:00 | 2 |
| 8 | 2022-01-01 07:00:00 | 1 |
| 9 | 2022-01-01 08:00:00 | 1 |
    # the time series must be ordered by time, otherwise the function will not work and throws a ValueError
    ts_data.sort_values(by='pickup_hour', inplace=True, ignore_index=True)
    cutoff_idxs = get_cutoff_indices_features_and_target(
        ts_data, datetime_col='pickup_hour', n_features=3, n_targets=2, step_size=1
    )
    cutoff_idxs
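To make the sliding-window idea concrete, here is a minimal sketch of how such cutoff indices could be computed. It is an assumption about the general technique, not the source of `get_cutoff_indices_features_and_target`: the name `cutoff_indices_sketch`, the `(first, mid, last)` tuple layout with half-open row ranges, and the boundary handling are all hypothetical and may differ from what the real function returns.

```python
import pandas as pd


def cutoff_indices_sketch(ts_data, datetime_col, n_features,
                          n_targets=1, step_size=1):
    """Hypothetical helper: list of (first, mid, last) row positions.

    Assumed convention: rows first:mid are the features and rows
    mid:last are the targets of one training example.
    """
    times = pd.to_datetime(ts_data[datetime_col])
    if not times.is_monotonic_increasing:
        raise ValueError("ts_data must be ordered by time")

    window = n_features + n_targets
    cutoffs = []
    # Slide the window over the rows, step_size rows at a time.
    for first in range(0, len(ts_data) - window + 1, step_size):
        mid = first + n_features
        cutoffs.append((first, mid, first + window))
    return cutoffs


# Ten consecutive hourly rows, already ordered by time.
demo = pd.DataFrame({
    "pickup_hour": pd.date_range("2022-01-01", periods=10, freq="h"),
    "rides": [3, 2, 2, 1, 1, 1, 2, 1, 1, 1],
})
cutoffs = cutoff_indices_sketch(demo, "pickup_hour", n_features=3, n_targets=2)
```

Under these assumed conventions, ten rows with three features and two targets give six windows, from `(0, 3, 5)` to `(5, 8, 10)`.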
Slices and transposes data from time-series format into a (features, target) format that we can use to train Supervised ML models.
| | Type | Default | Details |
|---|---|---|---|
| ts_data | DataFrame | | Time Series DataFrame |
| n_features | int | | Number of features to use for the prediction |
| datetime_col | str | | Name of the datetime column |
| entity_col | str | | Name of the entity column, e.g. location_id |
| value_col | str | | Name of the value column |
| n_targets | int | 1 | Number of target values to predict |
| step_size | int | 1 | Step size to use to slide the Time Series DataFrame |
| step_name | str | None | Name of the step column |
| concat_Xy | bool | False | Whether to concat X and y on the same dataframe or not |
| Returns | DataFrame | | |
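The slicing and transposing step can be pictured with the sketch below, which turns each entity's series into rows of lagged features followed by targets. This is an illustrative approximation only: the function name `to_features_and_target_sketch`, the output column names, and returning features and targets in a single dataframe are assumptions, and `step_name` and `concat_Xy` are not modelled.

```python
import pandas as pd


def to_features_and_target_sketch(ts_data, n_features, datetime_col,
                                  entity_col, value_col,
                                  n_targets=1, step_size=1):
    """Hypothetical helper: one output row per sliding window and entity."""
    ts = ts_data.copy()
    ts[datetime_col] = pd.to_datetime(ts[datetime_col])

    # Assumed column names; the real function may name them differently.
    feature_cols = [f"{value_col}_t-{k}" for k in range(n_features, 0, -1)]
    target_cols = [f"{value_col}_t+{k}" for k in range(n_targets)]

    window = n_features + n_targets
    rows = []
    for entity, group in ts.groupby(entity_col):
        # Each entity is ordered by time before it is sliced.
        values = group.sort_values(datetime_col)[value_col].to_list()
        for start in range(0, len(values) - window + 1, step_size):
            rows.append([entity] + values[start:start + window])

    return pd.DataFrame(rows, columns=[entity_col] + feature_cols + target_cols)


demo = pd.DataFrame({
    "pickup_hour": ["2022-01-01 01:00:00", "2022-01-01 00:00:00",
                    "2022-01-01 03:00:00", "2022-01-01 04:00:00",
                    "2022-01-01 02:00:00", "2022-01-01 05:00:00",
                    "2022-01-01 09:00:00", "2022-01-01 06:00:00",
                    "2022-01-01 07:00:00", "2022-01-01 08:00:00"],
    "location_id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "rides": [2, 3, 1, 1, 2, 1, 1, 2, 1, 1],
})
xy = to_features_and_target_sketch(demo, n_features=3,
                                   datetime_col="pickup_hour",
                                   entity_col="location_id",
                                   value_col="rides")
```

With three features and one target, location 1 (six hourly rows) contributes three windows and location 2 (four hourly rows) contributes one, so `xy` has four rows in this sketch.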
    # build a time series dataframe with 10 hours of data in random order and a location id column with 1 and 2
    ts_data = pd.DataFrame({
        'pickup_hour': ['2022-01-01 01:00:00', '2022-01-01 00:00:00',
                        '2022-01-01 03:00:00', '2022-01-01 04:00:00',
                        '2022-01-01 02:00:00', '2022-01-01 05:00:00',
                        '2022-01-01 09:00:00', '2022-01-01 06:00:00',
                        '2022-01-01 07:00:00', '2022-01-01 08:00:00'],
        'location_id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
        'rides': [2, 3, 1, 1, 2, 1, 1, 2, 1, 1],
    })
    ts_data