Resample

Aggregation ¶

Bases: BaseModel

close ¶

close: str = Field(default='last')

high ¶

high: str = Field(default='max')

low ¶

low: str = Field(default='min')

open ¶

open: str = Field(default='first')

volume ¶

volume: str = Field(default='sum')

resample ¶

resample(df: DataFrame, tf: str = '1H', agg: Aggregation = Aggregation(), on: str = 'date') -> DataFrame

Resamples a DataFrame over a specified time frequency and applies aggregation functions.

This function resamples the input DataFrame df using the time frequency provided by tf. It then applies the specified aggregation functions on the relevant columns and fills any missing values via forward fill.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	The DataFrame to be resampled. It must contain a time-based column that can be used for resampling.	required
`tf`	`str`	The time frequency for resampling, default is "1H" (one hour). It should be in a format accepted by pandas' `resample()` function, such as "1D" for daily, "1T" for minute, etc.	`'1H'`
`agg`	`Aggregation`	A dictionary defining the aggregation method for each relevant column. The default value aggregates the "open" column by the first value, "high" by the max, "low" by the min, "close" by the last value, and "volume" by the sum.	`Aggregation()`
`on`	`str`	The name of the column to be used for resampling, typically a date or timestamp column. The default is "date".	`'date'`

Returns:

Type	Description
`DataFrame`	A resampled DataFrame with the aggregation applied, missing values dropped, and any remaining missing values forward-filled.

Examples:

import pandas as pd

df = pd.DataFrame({
     "date": pd.date_range(start="2023-01-01", periods=5, freq="1H"),
     "open": [100, 101, 102, 103, 104],
     "high": [110, 111, 112, 113, 114],
     "low": [90, 91, 92, 93, 94],
     "close": [105, 106, 107, 108, 109],
     "volume": [1000, 1500, 1200, 1100, 1400]
 })
 resample(df, tf="2H", on="date")

Source code in src/tradingtoolbox/utils/resample.py

def resample(
    df: pd.DataFrame, tf: str = "1H", agg: Aggregation = Aggregation(), on: str = "date"
) -> pd.DataFrame:
    """
    Resamples a DataFrame over a specified time frequency and applies aggregation functions.

    This function resamples the input DataFrame `df` using the time frequency provided by `tf`.
    It then applies the specified aggregation functions on the relevant columns and fills any
    missing values via forward fill.

    Parameters:
        df: The DataFrame to be resampled. It must contain a time-based column that can be used
            for resampling.
        tf: The time frequency for resampling, default is "1H" (one hour). It should be in a format
            accepted by pandas' `resample()` function, such as "1D" for daily, "1T" for minute, etc.
        agg: A dictionary defining the aggregation method for each relevant column. The default value
            aggregates the "open" column by the first value, "high" by the max, "low" by the min,
            "close" by the last value, and "volume" by the sum.
        on: The name of the column to be used for resampling, typically a date or timestamp column.
            The default is "date".

    Returns:
        A resampled DataFrame with the aggregation applied, missing values dropped, and any remaining missing values forward-filled.

    Examples:

    ```py
    import pandas as pd

    df = pd.DataFrame({
         "date": pd.date_range(start="2023-01-01", periods=5, freq="1H"),
         "open": [100, 101, 102, 103, 104],
         "high": [110, 111, 112, 113, 114],
         "low": [90, 91, 92, 93, 94],
         "close": [105, 106, 107, 108, 109],
         "volume": [1000, 1500, 1200, 1100, 1400]
     })
     resample(df, tf="2H", on="date")
    ```
    """
    return df.resample(tf, on=on).agg(agg).dropna(how="all").fillna(method="ffill")  # type: ignore