There are two ways to aggregate time series data
November 2023 Last updated: June 2024
The result of an aggregation depends on how we deal with time zones. If we're lucky, our aggregation only involves a single one. However, our data doesn't have to be particularly complex for this to matter: showing the results of an aggregation to someone in another country is enough to make things interesting.
We can aggregate time series data in the data's time zone or in the user's time zone. We'll see that these two approaches can yield completely different results.
Definition: time zones and offsets
In the following, it's important to know the difference between time zones and offsets: time zones specify locations (e.g. “Europe/Madrid”); offsets specify divergences from UTC (e.g. “UTC+02”). The time zone of a location usually stays the same, while a location's offset can change, e.g. due to daylight saving time.
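The distinction can be sketched with Python's standard `zoneinfo` module. This is a minimal illustration, assuming the IANA time zone database is available on the system; the two dates are arbitrary and chosen only to fall on either side of the daylight saving switch.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# One time zone (a location) ...
madrid = ZoneInfo("Europe/Madrid")

# ... observed on two arbitrary dates, one in summer and one in winter.
summer = datetime(2024, 7, 1, 12, 0, tzinfo=madrid)
winter = datetime(2024, 1, 1, 12, 0, tzinfo=madrid)

# The time zone stays the same, but the offset from UTC changes.
summer_offset = summer.utcoffset()
winter_offset = winter.utcoffset()

print(madrid.key, summer_offset)  # Europe/Madrid 2:00:00
print(madrid.key, winter_offset)  # Europe/Madrid 1:00:00
```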
Aggregating in the data's time zone
Let's say we receive measurements from two sensors, one in Madrid and one in Athens. Imagine it's summer: Madrid then has an offset of UTC+02 and Athens has an offset of UTC+03. We're looking at the data from London, which has an offset of UTC+01 in the summer.
To aggregate in the data's time zone, we group measurements from when the clocks in the different sensor locations showed the same time. Purple crosses represent measurements; the aggregation intervals are shown in green.
To calculate the value for 10:00, we aggregate measurements from when the clocks in the different sensor locations showed a time between 10:00 and 11:00. Note that these measurements were not taken at the same point in time!
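As a rough sketch of this grouping in Python: the measurements below are made up for illustration, the function name `aggregate_in_data_time_zone` is hypothetical, and averaging is just one possible aggregation. Timestamps are assumed to be stored in UTC alongside each sensor's time zone, and all of them fall on a single day so that the hour alone identifies an interval.

```python
from collections import defaultdict
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc

# Hypothetical measurements: (UTC timestamp, sensor time zone, value).
measurements = [
    (datetime(2024, 7, 1, 8, 30, tzinfo=UTC), "Europe/Madrid", 21.0),  # 10:30 in Madrid
    (datetime(2024, 7, 1, 7, 30, tzinfo=UTC), "Europe/Athens", 25.0),  # 10:30 in Athens
    (datetime(2024, 7, 1, 9, 15, tzinfo=UTC), "Europe/Athens", 29.0),  # 12:15 in Athens
    (datetime(2024, 7, 1, 9, 40, tzinfo=UTC), "Europe/Madrid", 22.0),  # 11:40 in Madrid
]

def aggregate_in_data_time_zone(rows):
    """Average values per hour as shown on the clock at each sensor's location."""
    buckets = defaultdict(list)
    for ts_utc, zone, value in rows:
        local = ts_utc.astimezone(ZoneInfo(zone))  # sensor's wall-clock time
        buckets[local.strftime("%H:00")].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

by_data_tz = aggregate_in_data_time_zone(measurements)
print(by_data_tz)  # {'10:00': 23.0, '11:00': 22.0, '12:00': 29.0}
```

The two measurements in the 10:00 interval were taken an hour apart in UTC, yet both sensor clocks showed 10:30.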
Aggregating in the user's time zone
Conversely, to aggregate in the user's time zone, we group the data by what our local clock in London showed at the time of measurement.
To calculate the value for 10:00, we aggregate measurements from when our local clock in London showed a time between 10:00 and 11:00. Contrary to before, these measurements were taken at the same point in time, but the clocks in the different sensor locations showed different times.
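A comparable sketch for the user's time zone converts every timestamp to one zone before grouping. As before, the measurements are invented for illustration, the function name `aggregate_in_user_time_zone` is hypothetical, timestamps are assumed to be stored in UTC, and averaging stands in for whatever aggregation is needed.

```python
from collections import defaultdict
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc

# Hypothetical measurements: (UTC timestamp, sensor time zone, value).
measurements = [
    (datetime(2024, 7, 1, 8, 30, tzinfo=UTC), "Europe/Madrid", 21.0),  # 09:30 in London
    (datetime(2024, 7, 1, 7, 30, tzinfo=UTC), "Europe/Athens", 25.0),  # 08:30 in London
    (datetime(2024, 7, 1, 9, 15, tzinfo=UTC), "Europe/Athens", 29.0),  # 10:15 in London
    (datetime(2024, 7, 1, 9, 40, tzinfo=UTC), "Europe/Madrid", 22.0),  # 10:40 in London
]

def aggregate_in_user_time_zone(rows, user_zone):
    """Average values per hour as shown on the user's local clock."""
    user_tz = ZoneInfo(user_zone)
    buckets = defaultdict(list)
    for ts_utc, _sensor_zone, value in rows:
        local = ts_utc.astimezone(user_tz)  # the sensor's own zone is ignored
        buckets[local.strftime("%H:00")].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

by_user_tz = aggregate_in_user_time_zone(measurements, "Europe/London")
print(by_user_tz)  # {'08:00': 25.0, '09:00': 21.0, '10:00': 25.5}
```

Here the 10:00 interval contains the measurements taken between 09:00 and 10:00 UTC, regardless of what the sensors' clocks showed.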
How to choose
Think about which time zone is more important for your use case: the data's time zone or the user's time zone. The aggregation can effectively only keep one.
Aggregate in the data's time zone for many analytical questions, e.g. at what time of day do our sensors measure the highest temperature? Aggregate in the user's time zone for real-time applications, e.g. how many measurements did our sensors take in the last hour?
Finally, consider indicating whether an aggregation is in the data's time zone or the user's time zone when those two don't coincide. Our two examples aggregate completely different measurements!
Thanks to Paul Moosbrugger, Patrick Aigner, Simon Böhm, and Julia Godart for reading drafts of this.
Footnotes
- It's rare, but a location's time zone can indeed change: China, for example, was once divided into five time zones with five different offsets but switched to its current single time zone with an offset of UTC+08 in 1949.