remove_duplicate_timestamps() collapses rows that share the same timestamp into a single row by applying an aggregation function to each data column. The default time column is "Time (UTC)" and the default aggregation is "mean", but both are configurable.
Basic usage — multiple data columns, mean aggregation
import pandas as pd
from spotforecast2_safe.preprocessing.curate_data import remove_duplicate_timestamps

df = pd.DataFrame(
    {
        "Time (UTC)": [
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",  # duplicate
            "2026-01-01 01:00:00",
        ],
        "Load A": [100.0, 120.0, 130.0],
        "Load B": [200.0, 220.0, 210.0],
    }
)

clean_df = remove_duplicate_timestamps(df=df)
clean_df
            Time (UTC)  Load A  Load B
0  2026-01-01 00:00:00   110.0   210.0
1  2026-01-01 01:00:00   130.0   210.0
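The two rows that share the 00:00:00 timestamp are averaged column by column: Load A → (100.0 + 120.0) / 2 = 110.0, Load B → (200.0 + 220.0) / 2 = 210.0. The 01:00:00 row has no duplicate and is returned unchanged.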
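For intuition, the averaging in this example can be reproduced with a plain pandas groupby. This is only an equivalent computation for the data shown here, not necessarily how remove_duplicate_timestamps is implemented; the name sketch_df is introduced purely for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Time (UTC)": [
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",  # duplicate
            "2026-01-01 01:00:00",
        ],
        "Load A": [100.0, 120.0, 130.0],
        "Load B": [200.0, 220.0, 210.0],
    }
)

# Group rows by timestamp and average every remaining column.
# as_index=False keeps "Time (UTC)" as a regular column, and groupby
# sorts the group keys by default, so rows come out in time order.
# sketch_df is a hypothetical name, not part of the library.
sketch_df = df.groupby("Time (UTC)", as_index=False).agg("mean")
print(sketch_df)
```

The sketch yields one row per distinct timestamp, with 110.0 and 210.0 for the collapsed 00:00:00 row.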
Custom time column
Pass time_col when the timestamp column has a different name:
df2 = pd.DataFrame(
    {
        "measurement_time": [
            "2026-03-01 06:00:00",
            "2026-03-01 06:00:00",  # duplicate
            "2026-03-01 07:00:00",
        ],
        "sensor_1": [10.0, 14.0, 12.0],
        "sensor_2": [5.0, 7.0, 6.0],
    }
)

clean_df2 = remove_duplicate_timestamps(
    df=df2,
    time_col="measurement_time",
)
clean_df2
      measurement_time  sensor_1  sensor_2
0  2026-03-01 06:00:00      12.0       6.0
1  2026-03-01 07:00:00      12.0       6.0
Alternative aggregation functions
Supported string values: "mean" (default), "median", "min", "max", "sum", "std", "var", "first", "last", "mode". Any callable is also accepted.
df3 = pd.DataFrame(
    {
        "Time (UTC)": [
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",
            "2026-01-01 01:00:00",
        ],
        "load": [10.0, 10.0, 90.0, 55.0],
    }
)

# Collapse the three 00:00:00 rows with each aggregation and record the result.
results = {}
for fn in ("mean", "median", "min", "max", "mode"):
    out = remove_duplicate_timestamps(df=df3.copy(), agg=fn)
    results[fn] = float(out.loc[0, "load"])

pd.DataFrame.from_dict(results, orient="index", columns=["00:00 value"])
        00:00 value
mean      36.666667
median    10.000000
min       10.000000
max       90.000000
mode      10.000000
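The section states that any callable is accepted but does not show one. The sketch below illustrates what a custom aggregation could look like, using a plain pandas groupby as a stand-in. It assumes the callable receives the duplicate values of one column as a pandas Series and returns a scalar, which is how pandas treats callables; whether remove_duplicate_timestamps calls it in exactly this way is an assumption. The helper value_range is hypothetical.

```python
import pandas as pd

df3 = pd.DataFrame(
    {
        "Time (UTC)": [
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",
            "2026-01-01 00:00:00",
            "2026-01-01 01:00:00",
        ],
        "load": [10.0, 10.0, 90.0, 55.0],
    }
)


def value_range(values: pd.Series) -> float:
    """Hypothetical aggregation: spread between the largest and smallest value."""
    return float(values.max() - values.min())


# Stand-in for the library call, using pandas directly.
# Assumed (unverified) library equivalent:
#   remove_duplicate_timestamps(df=df3, agg=value_range)
spread = df3.groupby("Time (UTC)", as_index=False).agg(value_range)
print(spread)
```

A spread like this can serve as a quick check of how much the duplicates disagree before settling on "mean" or "median".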