How to Handle Time Zones and Timestamps Accurately with Pandas
Image by Author | Midjourney

 

Time-based data becomes tricky when it spans different time zones, because the same timestamp string can refer to different moments depending on where it was recorded. This guide shows how to manage time zones and timestamps with the Pandas library in Python.

 

Preparation

 

In this tutorial, we’ll use the Pandas package. If it is not already installed, we can install it with pip by running pip install pandas.

 

Now, we’ll explore how to work with time-based data in Pandas with practical examples.
 

Handling Time Zones and Timestamps with Pandas

 

Time data gives each event a time-specific reference. The most detailed form is the timestamp, which records a moment from the year down to fractions of a second.

Let’s start by creating a sample dataset.

import pandas as pd

data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'amount': [100, 200, 150]
}

df = pd.DataFrame(data)
df['timestamp'] = pd.to_datetime(df['timestamp'])

 

The ‘timestamp’ column in the example above holds time data with second-level precision. Because the values start out as strings, we pass the column through the pd.to_datetime function to convert it to a datetime type.
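As a small sketch of the conversion step, the snippet below parses strings with an explicit format and shows that the result carries no time zone yet. The `raw` series and its deliberately invalid third value are made up for illustration; `format` and `errors='coerce'` are standard `pd.to_datetime` parameters.

```python
import pandas as pd

# Hypothetical input: two valid timestamps and one bad value
raw = pd.Series(['2023-06-15 12:00:05', '2024-04-15 15:20:02', 'not a date'])

# An explicit format avoids guessing; errors='coerce' turns
# unparseable values into NaT instead of raising an error
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

print(parsed)
print(parsed.dt.tz)  # None: the values are still timezone-naive
```

Checking for NaT right after parsing is a cheap way to catch malformed rows before any time-zone logic is applied.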

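Afterward, we can make the datetime data timezone-aware. For example, we can label the data as Coordinated Universal Time (UTC) with tz_localize. This attaches a time zone to the existing values without shifting them, so it assumes the original timestamps were recorded in UTC.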

df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)

 

Output>>>
  transaction_id           timestamp  amount             timestamp_utc
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00
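If the naive timestamps were recorded in a local zone rather than UTC, one possible approach is to localize to that zone first and then convert. The sketch below assumes, purely for illustration, that the readings were taken in Jakarta (UTC+7); the zone choice is not from the dataset above.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2023-06-15 12:00:05', '2024-04-15 15:20:02']))

# Assumption for illustration: the wall-clock values were recorded in Jakarta
local = naive.dt.tz_localize('Asia/Jakarta')  # attach the zone, clock time unchanged
as_utc = local.dt.tz_convert('UTC')           # same instants, expressed in UTC

print(local)
print(as_utc)
```

Here tz_localize keeps 12:00:05 as the Jakarta clock time, while tz_convert moves it to 05:00:05 in UTC, which is the same moment.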

 

The ‘timestamp_utc’ values now carry a UTC offset (+00:00) alongside the date and time. Because the column is timezone-aware, we can convert it to another time zone with tz_convert. For example, we can take the UTC column and convert it to Japan time.

df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)

 

Output>>>
  transaction_id           timestamp  amount             timestamp_utc  \
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00   
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00   
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00   

            timestamp_japan  
0 2023-06-15 21:00:05+09:00  
1 2024-04-16 00:20:02+09:00  
2 2024-06-16 06:17:43+09:00 
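To make the effect of the conversion concrete, the following sketch checks that the converted values still point at the same instants, and shows one way to drop the zone while keeping the Tokyo clock time. The variable names are illustrative only.

```python
import pandas as pd

utc = pd.Series(
    pd.to_datetime(['2023-06-15 12:00:05', '2024-06-15 21:17:43'])
).dt.tz_localize('UTC')
japan = utc.dt.tz_convert('Asia/Tokyo')

# Aware timestamps compare by instant, not by displayed clock time
same_instant = (utc == japan).all()
print(same_instant)

# Remove the zone but keep the Tokyo wall-clock values (e.g. before export)
japan_naive = japan.dt.tz_localize(None)
print(japan_naive)
```

Stripping the zone this way discards information, so it is usually best left to the final output step.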

 

With the converted column, we can filter the data using bounds expressed in a particular time zone. For example, we can select the transactions that fall within a window defined in Japan time.

start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]

print(filtered_df)

 

Output>>>
  transaction_id           timestamp  amount             timestamp_utc  \
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00   

            timestamp_japan  
2 2024-06-16 06:17:43+09:00 
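Since timezone-aware values are compared by instant, the same Tokyo bounds can also be applied directly to a UTC column. The sketch below rebuilds a reduced version of the dataset and uses Series.between, which includes both endpoints by default; it is an alternative illustration rather than the approach used above.

```python
import pandas as pd

df_utc = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp_utc': pd.to_datetime(
        ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']
    ).tz_localize('UTC'),
})

start = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

# Bounds are in Tokyo time, the column is in UTC; the comparison still works
mask = df_utc['timestamp_utc'].between(start, end)
filtered = df_utc[mask]

print(filtered)
```

Only the third transaction falls inside the window, matching the result of filtering on the Japan column.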

 

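Once the timestamps are set as the index, we can also resample the data over time. For example, the code below groups the rows into hourly bins based on Japan time and counts the non-null values in each column. Recent Pandas releases deprecate the uppercase 'H' alias, so the lowercase 'h' is used here.

resampled_df = df.set_index('timestamp_japan').resample('h').count()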
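The time zone of the index decides where the bin edges fall. As a rough sketch, the snippet below sums the amounts per Tokyo calendar day; the daily frequency and the filtering of empty days are choices made for this illustration only.

```python
import pandas as pd

df_jp = pd.DataFrame({
    'amount': [100, 200, 150],
    'timestamp_japan': pd.to_datetime(
        ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']
    ).tz_localize('UTC').tz_convert('Asia/Tokyo'),
})

# Daily bins start at midnight Tokyo time because the index is in Asia/Tokyo
daily = df_jp.set_index('timestamp_japan')['amount'].resample('D').sum()

# Days without transactions sum to 0, so keep only the days with activity
nonempty = daily[daily > 0]
print(nonempty)
```

The second transaction happened at 15:20 UTC on April 15, but it is counted under April 16 here because that is its date in Tokyo.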

 

With tz_localize, tz_convert, timezone-aware filtering, and resampling, Pandas gives us the tools to handle timestamps consistently across time zones.

 


Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.


