Time Series Data with NumPy

Time series data is unique because its observations depend on each other sequentially. The data is collected over time at consistent intervals, for example yearly, daily, or even hourly.

Time series data is important in many analyses because it can reveal patterns that answer business questions, such as forecasting, anomaly detection, and trend analysis.

In Python, you can analyze a time series dataset with NumPy. NumPy is a powerful package for numerical and statistical calculation, and its datetime64 support lets it handle time series data as well.

How can we do that? Let’s try it out.

First, we need to install NumPy in our Python environment. If you haven’t done so already, you can install it with pip by running pip install numpy.

Next, let’s create time series data with NumPy. As mentioned, time series data has sequential and temporal characteristics, so we start by building an array of dates.

import numpy as np

dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'], dtype="datetime64")
dates

Output>>
array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
       '2023-01-05'], dtype="datetime64[D]")

As you can see in the code above, we create the time series in NumPy by setting the dtype parameter. Without it, the values would be stored as strings; with it, they are stored as datetime64 values that NumPy treats as dates.
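
To make the difference concrete, here is a minimal sketch that builds the same values with and without the dtype. The variable names (values, as_text, as_dates, next_week, gaps) are only illustrative and are not part of the original example.

```python
import numpy as np

values = ['2023-01-01', '2023-01-02', '2023-01-03']

as_text = np.array(values)                          # plain Unicode strings
as_dates = np.array(values, dtype="datetime64[D]")  # calendar days

print(as_text.dtype.kind)   # 'U' means Unicode string
print(as_dates.dtype.kind)  # 'M' means datetime

# Date arithmetic is only meaningful on the datetime64 array.
next_week = as_dates + np.timedelta64(7, 'D')  # shift every date by 7 days
gaps = np.diff(as_dates)                       # spacing between neighbors
print(next_week)
print(gaps)
```

The spacing between neighboring dates comes back as timedelta64 values, which is a quick way to check that a series really is sampled at consistent intervals.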

We can also create NumPy time series data without writing each date individually, using np.arange with a datetime64 dtype.

date_range = np.arange('2023-01-01', '2025-01-01', dtype="datetime64[M]")
date_range

Output>>
array(['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
       '2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12',
       '2024-01', '2024-02', '2024-03', '2024-04', '2024-05', '2024-06',
       '2024-07', '2024-08', '2024-09', '2024-10', '2024-11', '2024-12'],
      dtype="datetime64[M]")

We create monthly data from 2023 to 2024, with each month as a value. The end date, 2025-01-01, is excluded, so the array contains 24 months.
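
The same approach should work for other frequencies. The sketch below is an illustration rather than part of the original walkthrough: it builds a daily range by changing the unit, and a weekly range by passing a timedelta64 step. The names january and weekly are made up for this example.

```python
import numpy as np

# Daily dates for January 2023; the stop value is excluded.
january = np.arange('2023-01-01', '2023-02-01', dtype="datetime64[D]")
print(len(january))  # 31

# Weekly dates in the same month, using a 7-day step.
weekly = np.arange(
    np.datetime64('2023-01-01'),
    np.datetime64('2023-02-01'),
    np.timedelta64(7, 'D'),
)
print(weekly)
```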

After that, we can try to analyze data indexed by the NumPy datetime series. For example, we can create one random value for each entry in our date range.

data = np.random.randn(len(date_range)) * 10 + 100
data

Output>>
array([128.85379394,  92.17272879,  81.73341807,  97.68879621,
       116.26500413,  89.83992529,  93.74247891, 115.50965063,
        88.05478692, 106.24013365,  92.84193254,  96.70640287,
        93.67819695, 106.1624716 ,  97.64298602, 115.69882628,
       110.88460629,  97.10538592,  98.57359395, 122.08098289,
       104.55571757, 100.74572336,  98.02508889, 106.47247489])

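Using NumPy’s random module, we generate values drawn from a normal distribution with a mean of 100 and a standard deviation of 10 to simulate a time series. Because the values are random, your output will differ from the one shown above.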
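
If you want the same numbers on every run, one option is a seeded generator. This is a sketch under that assumption, not what the example above uses; the seed 42 is arbitrary, and the names rng and repeat are illustrative.

```python
import numpy as np

date_range = np.arange('2023-01', '2025-01', dtype="datetime64[M]")

# A seeded generator makes the simulated series reproducible.
rng = np.random.default_rng(42)  # the seed value is an arbitrary choice
data = rng.normal(loc=100, scale=10, size=len(date_range))

# A second generator with the same seed produces identical values.
repeat = np.random.default_rng(42).normal(loc=100, scale=10, size=len(date_range))
print(np.array_equal(data, repeat))  # True
```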

For example, we can try to perform a moving average analysis with NumPy using the following code.

def moving_average(data, window):
    return np.convolve(data, np.ones(window), 'valid') / window

ma_12 = moving_average(data, 12)
ma_12

Output>>
array([ 99.97075433,  97.03945458,  98.20526648,  99.53106381,
       101.03189965, 100.58353316, 101.18898821, 101.59158114,
       102.13919216, 103.51426971, 103.05640219, 103.48833188,
       104.30217122])

A moving average is a simple time series analysis in which we calculate the mean over a fixed-size subset (window) of the series. In the example above, we use a window of 12. This means we take the first 12 values of the series and compute their mean. Then the window moves forward by one value, and we compute the mean of the next subset.

So, the first subset over which we take the mean is:

[128.85379394,  92.17272879,  81.73341807,  97.68879621,
       116.26500413,  89.83992529,  93.74247891, 115.50965063,
        88.05478692, 106.24013365,  92.84193254,  96.70640287]

The next subset is where we slide the window by one:

[92.17272879,  81.73341807,  97.68879621,
       116.26500413,  89.83992529,  93.74247891, 115.50965063,
        88.05478692, 106.24013365,  92.84193254,  96.70640287,
        93.67819695]

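That’s what np.convolve does here: it slides the np.ones array along the series and sums the values under it at each position, and dividing by the window size turns each sum into a mean. We use the 'valid' mode to return only the positions where the window fits entirely inside the series, without any padding. With 24 values and a window of 12, that gives 13 averages.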
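
To check that the convolution really computes window means, the sketch below compares it with an explicit slice-based version on a tiny series. The helper moving_average_slices is a hypothetical name introduced only for this comparison.

```python
import numpy as np

def moving_average(data, window):
    # Same function as in the article: sum under a sliding window, then divide.
    return np.convolve(data, np.ones(window), 'valid') / window

def moving_average_slices(data, window):
    # Explicit version: take the mean of every full window, one step at a time.
    n_windows = len(data) - window + 1
    return np.array([data[i:i + window].mean() for i in range(n_windows)])

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(moving_average(series, 3))         # [2. 3. 4. 5.]
print(moving_average_slices(series, 3))  # same values
```

The convolution version is the more compact of the two; the slice version is mainly useful for seeing which values go into each average.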

Moving averages are often used in time series analysis to identify the underlying pattern and, in the financial field, as signals such as buy/sell.

Speaking of patterns, we can estimate the trend of a time series with NumPy. The trend is a long-term, persistent directional movement in the data. Basically, it is the general direction in which the time series is heading.

trend = np.polyfit(np.arange(len(data)), data, 1)
trend

Output>>
array([ 0.20421765, 99.78795983])

In the code above, we fit a straight line to our data. From the result, we get the slope of the line (first number) and the intercept (second number). The slope represents how much the data changes per time step on average, and its sign gives the direction of the trend (positive is upward and negative is downward). The intercept is the value of the fitted line at the first time step (index 0).
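
As a sanity check on how to read the two numbers, here is a small sketch on a noise-free line with a known slope and intercept. The series is made up for illustration, and np.polyval is used to rebuild the fitted line from the coefficients.

```python
import numpy as np

steps = np.arange(10)
series = 2.0 * steps + 5.0  # a noise-free line: slope 2, intercept 5

# polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(steps, series, 1)

# Rebuild the fitted line from the coefficients.
fitted = np.polyval([slope, intercept], steps)

direction = "upward" if slope > 0 else "downward"
print(slope, intercept, direction)
```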

We can also compute detrended data, which is what remains after we remove the trend from the time series. Detrended data is often used to detect anomalies and fluctuation patterns around the trend.

detrended = data - (trend[0] * np.arange(len(data)) + trend[1])
detrended

Output>>
array([ 29.06583411,  -7.81944869, -18.46297706,  -2.71181657,
        15.66017371, -10.96912278,  -7.2707868 ,  14.29216727,
       -13.36691409,   4.61421499,  -8.98820376,  -5.32795108,
        -8.56037465,   3.71968235,  -5.00402087,  12.84760174,
         7.8291641 ,  -6.15427392,  -4.89028352,  18.41288776,
         0.6834048 ,  -3.33080706,  -6.25565918,   1.98750918])

The output above shows the data with the trend removed. In a real-world application, we would analyze these values to see which ones deviate too much from the common pattern.
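
One simple way to do that, sketched here as an assumption rather than a recommended method, is to flag detrended values that lie more than two standard deviations from zero. The series, the injected spike, and the threshold of 2 are all made up for illustration.

```python
import numpy as np

steps = np.arange(12)
series = 100.0 + 0.5 * steps  # a clean upward trend
series[7] += 30.0             # inject one artificial spike

# Fit and remove the linear trend, as in the article.
slope, intercept = np.polyfit(steps, series, 1)
detrended = series - (slope * steps + intercept)

# Flag points whose residual exceeds two standard deviations (arbitrary cutoff).
threshold = 2 * detrended.std()
anomalies = np.flatnonzero(np.abs(detrended) > threshold)
print(anomalies)  # [7]
```

The right cutoff depends on the data; a real series with several outliers may need a more robust rule than a fixed multiple of the standard deviation.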

We can also try to analyze seasonality in our time series data. Seasonality is a regular, predictable pattern that recurs at specific temporal intervals, such as every 3 months or every 6 months. Seasonality is usually driven by external factors such as holidays, weather, and events.

seasonality = np.mean(data.reshape(-1, 12), axis=0)
seasonal_component = np.tile(seasonality, len(data)//12 + 1)[:len(data)]
seasonal_component

Output>>
array([111.26599544,  99.16760019,  89.68820205, 106.69381124,
       113.57480521,  93.4726556 ,  96.15803643, 118.79531676,
        96.30525224, 103.4929285 ,  95.43351072, 101.58943888,
       111.26599544,  99.16760019,  89.68820205, 106.69381124,
       113.57480521,  93.4726556 ,  96.15803643, 118.79531676,
        96.30525224, 103.4929285 ,  95.43351072, 101.58943888])

In the code above, we reshape the data into one row per year, average each calendar month across the years, and then repeat those monthly averages to match the length of the original data. The result is the average for each month over the two-year interval, which we can inspect for seasonality worth mentioning. Note that the reshape only works when the series length is a multiple of 12.
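
When the series does not cover whole years, one possible workaround is to estimate the monthly averages from the complete years only. The sketch below assumes a made-up 27-month series with a known monthly pattern; pattern, series, and n_full are illustrative names.

```python
import numpy as np

pattern = np.arange(12, dtype=float)  # a made-up monthly pattern
# Two full years plus three extra months, 27 values in total.
series = np.concatenate([pattern + 10, pattern + 20, pattern[:3] + 30])

# Use only the complete years so that reshape(-1, 12) is valid.
n_full = (len(series) // 12) * 12
seasonality = series[:n_full].reshape(-1, 12).mean(axis=0)

# Repeat the monthly averages and cut them to the full series length.
seasonal_component = np.tile(seasonality, len(series) // 12 + 1)[:len(series)]
print(seasonality)
print(len(seasonal_component))  # 27
```

Here the partial third year is ignored when estimating the averages, which is a design choice; including it would need a per-month mean that tolerates missing values.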

Those are the basic methods we can use with NumPy for time series data and analysis. There are many more advanced methods, but the above covers the basics.

Conclusion

Time series data is a unique kind of data set because it is ordered sequentially and has temporal properties. Using NumPy, we can create time series data and perform basic time series analysis such as moving averages, trend analysis, and seasonality analysis.

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
