Enhancing Business Planning Through Advanced Forecasting Methods
Introduction
Forecasting is one of the core domains of Artificial Intelligence (AI), both in academic research and in industrial applications; it is arguably one of the most ubiquitous challenges found across industries. Accurately predicting future sales volumes and market trends is essential for businesses to optimize their planning processes: enhancing contribution margins, minimizing waste, maintaining adequate inventory levels, optimizing the supply chain, and improving decision-making overall.
Developing a forecast model is a complex and multifaceted challenge. It requires a deep understanding of both state-of-the-art (SOTA) forecasting methodologies and the specific business domain to which they are applied. Furthermore, the forecast engine acts as critical infrastructure within an organization, supporting a broad spectrum of processes across departments. For instance:
- The Marketing team leverages the model to inform strategic decisions regarding investment allocations for upcoming periods, such as the next month or quarter.
- The Procurement team utilizes the model to make informed decisions about purchase quantities and timing from suppliers, optimizing inventory levels and reducing waste or shortages.
- The Operations team uses the forecasts to optimize the production lines. They can deploy resources and workforce to meet expected demand while minimizing operational costs.
- The Finance team relies on the model for budgeting purposes, using forecast data to project monthly financial requirements and allocate resources accordingly.
- The Customer Service team uses the forecast to anticipate customer inquiry volumes, allowing the team to right-size staffing levels while ensuring high-quality customer service and minimizing wait times.
Recent advancements in forecasting have also been shaped by the successful development of foundation models across various domains, including text (e.g., ChatGPT), text-to-image (e.g., Midjourney), and text-to-speech (e.g., ElevenLabs). The wide adoption of these models has led to the introduction of models like TimeGPT [1], designed to generate predictions on previously unseen data through zero-shot inference. These models leverage methodologies and architectures that resemble their predecessors in text, image, and speech. A general pre-trained model would constitute a paradigm shift in tackling forecasting tasks: it would make forecasting more accessible to organizations, reduce computational complexity, and, ideally, improve overall accuracy.
In this article, we provide an in-depth explanation of the possible architecture behind TimeGPT. We also cover the main components that allow the model to perform zero-shot inference. Following this theoretical overview, we then apply TimeGPT to a specific use case and dataset. We cover the practical implementation details and conduct a thorough analysis of the model’s performance. Finally, we compare the performance of TimeGPT with TiDE [2], an ‘embarrassingly’ simple MLP that beats Transformers in forecasting use cases.
TimeGPT
TimeGPT [1] was the first foundation model for time series forecasting, characterized by its ability to generalize across diverse domains. It can produce accurate forecasts on datasets beyond those used during its training phase. The field of research surrounding foundation models for time series forecasting has been growing significantly. Notable recent contributions include "MOMENT," developed by researchers at Carnegie Mellon University (CMU) [3], "TimesFM" from Google [4], "Lag-Llama," a collaborative effort between Morgan Stanley and ServiceNow [5], and "Moirai" from Salesforce [6]. We plan to cover other foundation models for time series forecasting in future articles.
TimeGPT leverages transfer learning to perform well in a zero-shot inference setup. It was trained on 100 billion data points drawn from a large collection of publicly available datasets spanning domains such as economics, demographics, healthcare, weather, IoT sensor data, energy, web traffic, sales, transport, and banking.
The extensive diversity of domains allows the model to capture complex patterns such as multiple seasonalities, cycles of different lengths, and evolving trends. Additionally, the datasets exhibit a range of noise levels, outliers, drift, and other characteristics. While some consist of clean data with regular patterns, others have unexpected events and behaviors where trends and patterns may fluctuate over time. These challenges provide many scenarios for the model to learn from, improving its robustness and generalization capabilities.
Architecture
TimeGPT is a Transformer-based model specifically designed for time series forecasting, incorporating a self-attention mechanism within an encoder-decoder architecture. By leveraging the self-attention mechanism, it can dynamically weigh the significance of different points in the time series.
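To make the attention weighting concrete, here is a minimal numpy sketch of scaled dot-product self-attention [7] over a window of embedded time steps. This is an illustration of the general mechanism, not TimeGPT's actual implementation; the window length and model dimension are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over a time series window.

    Each row of Q/K/V corresponds to one time step; the softmax
    weights express how strongly each step attends to every other step.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T, T) pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax across time steps
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 8, 4                          # 8 time steps, embedding dimension 4
X = rng.normal(size=(T, d))          # embedded input window
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape, w.shape)            # (8, 4) (8, 8)
```

Each row of `w` is a probability distribution over the window, which is exactly the "dynamic weighing of the significance of different points" described above.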
The model receives a window of historical values (y) and exogenous covariates (x) as input. The covariates may include additional time series data and/or binary variables that denote specific events, such as public holidays. These inputs are augmented with sequential information by integrating local positional embeddings. This allows the model to be aware of the temporal dependencies. While not explicitly stated by the authors, we believe all the inputs are concatenated after the positional encoding, producing the final input to feed the encoder.
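Since the authors do not describe this step in detail, the following is only a sketch of how such an input might be assembled: a look-back window of targets, a binary holiday covariate, and standard sinusoidal positional encodings [7] added on top. All names and sizes are illustrative assumptions, not TimeGPT's actual code.

```python
import numpy as np

def positional_encoding(T, d):
    """Standard sinusoidal positional encoding (Vaswani et al. [7])."""
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    # Even embedding indices get sine, odd indices get cosine
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

T = 28                                              # look-back window (illustrative)
y = np.random.default_rng(1).normal(size=(T, 1))    # historical target values
holidays = np.zeros((T, 1))
holidays[[6, 13, 20, 27]] = 1.0                     # binary event covariate (assumed weekly)
x = np.hstack([y, holidays])                        # targets + exogenous covariates, (T, 2)
inp = x + positional_encoding(T, x.shape[1])        # inject temporal position information
print(inp.shape)                                    # (28, 2)
```

The resulting tensor carries both the values and their position in the window, which is what lets the attention layers reason about temporal order.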
TimeGPT vs. TiDE: a comparison in a real use case
In this section, we use TimeGPT to forecast sales on a real-world dataset from one of our clients. We then compare its forecasting performance with TiDE, using the same cutoff date for both models.
TiDE [2] is a novel multivariate time-series model that can use static covariates (e.g., the brand of a product) and dynamic covariates known over the forecast horizon (e.g., the price of a product) to generate accurate forecasts. Unlike the complex architecture of Transformers, TiDE is based on a simple encoder-decoder architecture with a residual connection, where:
- The Encoder is responsible for mapping the past target values and the covariates of a time series into a dense representation of features. First, the Feature Projection reduces the dimensionality of the dynamic covariates. Then, the Dense Encoder receives the output of the Feature Projection concatenated with the static covariates and the past values, and maps them into a single embedding representation.
- The Decoder receives the embedding representation and converts it into future predictions. The Dense Decoder maps the embedding representation into a vector per time-step in the horizon. Afterward, the Temporal Decoder combines the output of the Dense Decoder with the projected features of that time step to produce the predictions.
- Finally, the Residual Connection linearly maps the look-back to a vector with the size of the horizon, which is added to the output of the Temporal Decoder to produce the final predictions.
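The three components above can be traced end to end with a minimal numpy sketch of a TiDE-style forward pass. Random weights and ReLU dense layers stand in for trained parameters, and all layer sizes are arbitrary assumptions; the point is only to make the data flow (projection, encoding, decoding, residual) concrete.

```python
import numpy as np

rng = np.random.default_rng(42)
L, H, d, p = 24, 6, 16, 4        # look-back, horizon, embedding size, per-step vector size

def dense(x, w, b):
    return np.maximum(x @ w + b, 0.0)   # ReLU dense layer

y_past = rng.normal(size=L)             # past target values
x_cov  = rng.normal(size=(L + H, 3))    # dynamic covariates, past + future

# Feature Projection: reduce covariate dimensionality per time step
w_proj, b_proj = rng.normal(size=(3, 2)), np.zeros(2)
x_proj = dense(x_cov, w_proj, b_proj)               # (L+H, 2)

# Dense Encoder: flatten past targets + projected covariates into one embedding
enc_in = np.concatenate([y_past, x_proj.ravel()])
w_enc, b_enc = rng.normal(size=(enc_in.size, d)) * 0.1, np.zeros(d)
e = dense(enc_in, w_enc, b_enc)                     # embedding, (d,)

# Dense Decoder: one vector per time step in the horizon
w_dec, b_dec = rng.normal(size=(d, H * p)) * 0.1, np.zeros(H * p)
g = dense(e, w_dec, b_dec).reshape(H, p)            # (H, p)

# Temporal Decoder: combine each step's vector with that step's projected features
w_td, b_td = rng.normal(size=(p + 2, 1)) * 0.1, np.zeros(1)
y_hat = (np.hstack([g, x_proj[L:]]) @ w_td + b_td).ravel()  # (H,)

# Residual Connection: linear map from the look-back straight to the horizon
w_res = rng.normal(size=(L, H)) * 0.1
y_hat = y_hat + y_past @ w_res
print(y_hat.shape)                                  # (6,)
```

Note how the residual path gives the model a direct linear route from past values to predictions, so the encoder-decoder stack only has to learn the nonlinear corrections on top of it.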
References
[1] Garza, A., & Mergenthaler-Canseco, M. (2023). TimeGPT-1. arXiv:2310.03589.
[2] Das, A., Kong, W., Leach, A., Mathur, S., Sen, R., & Yu, R. (2023). Long-term Forecasting with TiDE: Time-series Dense Encoder. arXiv:2304.08424.
[3] Goswami, M., Szafer, K., Choudhry, A., Cai, Y., Li, S., & Dubrawski, A. (2024). MOMENT: A Family of Open Time-series Foundation Models. arXiv:2402.03885.
[4] Das, A., Kong, W., Sen, R., & Zhou, Y. (2024). A decoder-only foundation model for time-series forecasting. arXiv:2310.10688.
[5] Rasul, K., Ashok, A., Williams, A. R., Ghonia, H., Bhagwatkar, R., Khorasani, A., Darvishi Bayazi, M. J., Adamopoulos, G., Riachi, R., Hassen, N., Biloš, M., Garg, S., Schneider, A., Chapados, N., Drouin, A., Zantedeschi, V., Nevmyvaka, Y., & Rish, I. (2024). Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. arXiv:2310.08278.
[6] Woo, G., Liu, C., Kumar, A., Xiong, C., Savarese, S., & Sahoo, D. (2024). Unified Training of Universal Time Series Forecasting Transformers. arXiv:2402.02592.
[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. arXiv:1706.03762.
[8] Stankeviciute, K., Alaa, A. M., & van der Schaar, M. (2021). Conformal time-series forecasting. In Advances in Neural Information Processing Systems (Vol. 34, pp. 6216–6228).