
Two Techniques for Visualizing Many Time-Series at Once

Imagine this: you’ve got a bunch of line charts, and you’re confident that there’s at least one trend hiding somewhere in all that data. Whether you’re tracking sales across thousands of your company’s products or diving into stock market data, your goal is to uncover those subtrends and make them stand out in your visualization. Let’s explore a couple of techniques to help you do just that.

Hundreds of lines plotted, but it isn’t clear what the subtrends are. This synthetic data can show the benefit of these strategies. (Image by author)
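The snippets below refer to this synthetic data as synth_df. As a rough, hypothetical stand-in, a few hundred noisy, phase-shifted sine waves split into two groups produce a similarly cluttered picture:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Two hidden subtrends: sine waves with different amplitudes and phases,
# each repeated many times with noise so the individual lines blur together.
x = np.arange(300)
series = {}
for i in range(200):
    group = i % 2  # which of the two subtrends this line belongs to
    amplitude = 10 if group == 0 else 18
    phase = 0 if group == 0 else np.pi / 2
    series[f's{i}'] = (amplitude * np.sin(2 * np.pi * x / 100 + phase)
                       + rng.normal(0, 2, size=x.size))

synth_df = pd.DataFrame(series, index=x)

# Plotting every line directly gives the cluttered chart above.
synth_df.plot(legend=False, color='steelblue', linewidth=0.5, figsize=(14, 7))
plt.show()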

Density Line Charts are a clever plotting technique introduced by Dominik Moritz and Danyel Fisher in their paper, Visualizing a Million Time Series with the Density Line Chart. This method transforms numerous line charts into heatmaps, revealing areas where the lines overlap the most.
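Conceptually, the method rasterizes each series onto a fixed grid, counts how many series pass through each cell, and draws the counts as a heatmap (the paper adds a per-column normalization step so steeper lines don't get extra weight). The sketch below is only a simplified illustration of that idea, not PyDLC's implementation:

import numpy as np
import matplotlib.pyplot as plt

def density_line_sketch(ys, ny=100):
    """Simplified Density Line Chart: one row of `ys` per time series."""
    n_series, nx = ys.shape
    lo, hi = ys.min(), ys.max()
    grid = np.zeros((ny, nx))
    for series in ys:
        # Map each sample to a vertical bin and count the cell it lands in.
        rows = np.clip(((series - lo) / (hi - lo) * (ny - 1)).astype(int), 0, ny - 1)
        grid[rows, np.arange(nx)] += 1
    return grid

# Example with random walks; with the synthetic data you would pass
# synth_df.to_numpy().T instead.
ys = np.cumsum(np.random.randn(200, 300), axis=1)
plt.imshow(density_line_sketch(ys), origin='lower', aspect='auto', cmap='viridis')
plt.show()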

When we apply Density Line Charts to the synthetic data we showed earlier, the results look like this:

PyDLC lets us see “hot spots” where many lines overlap. (Image by author)

This view lets us see where the lines concentrate and identify the subtrends that make this data interesting.

For this example we use the Python library PyDLC by Charles L. Bérubé. The implementation is quite straightforward, thanks to the library’s user-friendly design.

from pydlc import dense_lines
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 14))
im = dense_lines(synth_df.to_numpy().T,            # one row per time series
                 x=synth_df.index.astype('int64'),
                 cmap='viridis',
                 ny=100,                           # number of vertical bins
                 y_pad=0.01)                       # padding around the data range

plt.ylim(-25, 25)
plt.axhline(y=0, color='white', linestyle=':')
plt.show()

When using Density Line Charts, keep in mind that parameters like ny and y_pad may require some tweaking to get the best results.
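If you want to explore those settings quickly, a small loop over candidate values lets you compare them side by side; the same pattern works for y_pad. This assumes the synth_df from earlier and that your version of dense_lines accepts an ax argument, as in the library’s README example:

from pydlc import dense_lines
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharey=True)
for ax, ny in zip(axes, [50, 100, 200]):
    # Coarser ny gives chunkier bands; finer ny shows more detail but more speckle.
    dense_lines(synth_df.to_numpy().T,
                x=synth_df.index.astype('int64'),
                ax=ax,
                cmap='viridis',
                ny=ny)
    ax.set_title(f'ny={ny}')
plt.show()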

The second technique isn’t as widely discussed and doesn’t have a universally recognized name. It’s essentially a variation of “line density plots” or “line density visualizations”: we draw thicker lines with low opacity so that areas of overlap show up as darker regions.

This technique shows subtrends quite well and reduces cognitive load from the many lines. (Image by author)

We can clearly identify what seem to be two distinct trends and observe the high degree of overlap during the downward movements of the sine waves. However, it’s a bit trickier to pinpoint where the effect is the strongest.

The code for this approach is also quite straightforward:

plt.figure(figsize=(14, 14))

for column in synth_df.columns:
    plt.plot(synth_df.index,
             synth_df[column],
             alpha=0.1,        # low opacity so overlapping regions appear darker
             linewidth=2,
             label=column,
             color='black')

plt.show()

Here, the two parameters that might require some adjustment are alpha and linewidth.
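One heuristic I find useful (my own rule of thumb, not something the technique prescribes) is to scale alpha with the number of series, so the total amount of ink stays roughly constant whether you plot fifty lines or five hundred. Reusing synth_df from earlier:

import matplotlib.pyplot as plt

# Keep the total "ink" roughly constant as the number of lines grows.
# The constant 20 is arbitrary; tune it to taste.
n_lines = len(synth_df.columns)
alpha = min(1.0, 20 / n_lines)

plt.figure(figsize=(14, 14))
for column in synth_df.columns:
    plt.plot(synth_df.index, synth_df[column],
             alpha=alpha, linewidth=2, color='black')
plt.show()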

Imagine we’re searching for subtrends in the daily returns of 50 stocks. The first step is to pull the data and calculate the daily returns.

import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

stock_tickers = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA', 'META', 'NVDA', 'BRK-B', 'UNH', 'V',
    'HD', 'MA', 'KO', 'DIS', 'PFE', 'NKE', 'ADBE', 'CMCSA', 'NFLX', 'CSCO',
    'INTC', 'AMGN', 'COST', 'PEP', 'TMO', 'AVGO', 'QCOM', 'TXN', 'ABT', 'ORCL',
    'MCD', 'MDT', 'CRM', 'UPS', 'WMT', 'BMY', 'GILD', 'BA', 'SBUX', 'IBM',
    'MRK', 'WBA', 'CAT', 'CVX', 'T', 'MS', 'LMT', 'GS', 'WFC', 'HON'
]

start_date = '2024-03-01'
end_date = '2024-09-01'

percent_returns_df = pd.DataFrame()

for ticker in stock_tickers:
    stock_data = yf.download(ticker, start=start_date, end=end_date)

    # Fill any gaps so pct_change doesn't produce spurious NaNs.
    stock_data = stock_data.ffill().bfill()

    if len(stock_data) >= 2:
        stock_data['Percent Daily Return'] = stock_data['Close'].pct_change() * 100
        stock_data['Ticker'] = ticker
        percent_returns_df = pd.concat(
            [percent_returns_df, stock_data[['Ticker', 'Percent Daily Return']]],
            axis=0
        )

percent_returns_df.reset_index(inplace=True)

display(percent_returns_df)

We can then plot the data.

pivot_df = percent_returns_df.pivot(index='Date', columns='Ticker', values='Percent Daily Return')

pivot_df = pivot_df.ffill().bfill()

plt.figure(figsize=(14, 14))
sns.lineplot(data=pivot_df, dashes=False)
plt.title('Percent Daily Returns of Top 50 Stocks')
plt.xlabel('Date')
plt.ylabel('Percent Daily Return')
plt.legend(title='Stock Ticker', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()

A very messy many-line plot with little discernible information. (Image by author)

The Density Line Chart does face some challenges with this data due to its sporadic nature. However, it still provides valuable insights into overall market trends. For instance, you can spot periods where the densest areas correspond to significant dips, highlighting rough days in the market.

Density Line Chart of the stocks’ daily returns. (Image by author)

plt.figure(figsize=(14, 14))
im = dense_lines(pivot_df[stock_tickers].to_numpy().T,
                 x=pivot_df.index.astype('int64'),
                 cmap='viridis',
                 ny=200,
                 y_pad=0.1)

plt.axhline(y=0, color='white', linestyle=':')
plt.ylim(-10, 10)

plt.show()

However, we find that the transparency technique performs significantly better for this particular problem. The market dips we mentioned earlier become much clearer and more discernible.

The transparency technique applied to the stocks’ daily returns. (Image by author)

plt.figure(figsize=(14, 14))

for ticker in pivot_df.columns:
    plt.plot(pivot_df.index,
             pivot_df[ticker],
             alpha=0.1,
             linewidth=4,
             label=ticker,
             color='black')

plt.show()

Both strategies have their own strengths, and the best approach for your work may not be obvious until you’ve tried both. I hope one of these techniques proves helpful for your future projects. If you have any other techniques or use cases for handling massive line plots, I’d love to hear about them!
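If you’d like to try that comparison quickly on your own data, a side-by-side sketch along these lines (reusing pivot_df, stock_tickers, and the dense_lines import from above, and again assuming dense_lines accepts an ax argument) is a reasonable starting point:

from pydlc import dense_lines
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))

# Left: Density Line Chart of the daily returns.
dense_lines(pivot_df[stock_tickers].to_numpy().T,
            x=pivot_df.index.astype('int64'),
            ax=ax1, cmap='viridis', ny=200)
ax1.set_title('Density Line Chart')

# Right: transparency technique on the same data.
for ticker in pivot_df.columns:
    ax2.plot(pivot_df.index, pivot_df[ticker],
             alpha=0.1, linewidth=4, color='black')
ax2.set_title('Transparency')

plt.tight_layout()
plt.show()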

Thanks for reading, and take care.
