The paper introducing the Transformer architecture [1] has been cited almost 150k times. Since its publication, the architecture has been applied to a wide range of use cases. Language generation and large language models are the most prominent among them, but the architecture has also been employed successfully in computer vision and in time-series forecasting, to name only a few other examples.
At its core, the Transformer is a deep neural network designed for sequence-to-sequence prediction tasks, e.g., mapping a sequence of words in one language to a sequence of words in another language, as in machine translation. The architecture has recently gained attention for another application well suited to sequence-to-sequence mapping: the analysis of telemetric log data from games [2]. While game logs are one specific area that has been explored lately, the approach generally works for log data from other domains, such as websites or mobile apps.
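As a rough illustration (a minimal sketch, not the exact model discussed in the talk), the following PyTorch snippet wires up the built-in nn.Transformer for a toy sequence-to-sequence task; all vocabulary sizes and dimensions are placeholders, and positional encodings are omitted for brevity:

```python
import torch
import torch.nn as nn

# Minimal sequence-to-sequence Transformer sketch using PyTorch's
# built-in nn.Transformer. Vocabulary sizes and dimensions are
# illustrative placeholders; positional encodings are omitted.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 1000, 1000, 64

class Seq2SeqTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(D_MODEL, TGT_VOCAB)

    def forward(self, src, tgt):
        # Causal mask: each target position attends only to earlier ones.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)  # (batch, tgt_len, TGT_VOCAB) logits

model = Seq2SeqTransformer()
src = torch.randint(0, SRC_VOCAB, (8, 12))  # batch of 8 source sequences
tgt = torch.randint(0, TGT_VOCAB, (8, 10))  # shifted target sequences
logits = model(src, tgt)                    # shape: (8, 10, 1000)
```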
In this talk, I will walk the audience through a simple Transformer implementation in Python that can be trained on game log data. I will discuss the challenges of constructing a vocabulary and tokenizer for log data: unlike natural language, game logs typically consist of structured events with properties, which makes vocabulary design non-trivial. I will highlight design choices in the model construction that balance predictive power against computational efficiency, including hyperparameter selection for the model (e.g., embedding size, number of layers) and for the training procedure (e.g., batch size, learning rate). I will also explain how to adapt the Transformer architecture, including changes to the basic network, to handle long sequences of log data efficiently.
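To make the tokenization challenge concrete, here is a minimal, hypothetical sketch of how structured log events could be flattened into tokens and mapped to a vocabulary; the event fields and the min_count cutoff are illustrative assumptions, not the scheme used in the talk:

```python
from collections import Counter

# Hypothetical tokenization scheme: flatten each structured event into a
# "type" token plus "key=value" tokens; rare tokens fall back to <unk>
# to keep the vocabulary tractable. Field names are made up.
def event_to_tokens(event):
    tokens = [event["type"]]
    for key, value in sorted(event.items()):
        if key != "type":
            tokens.append(f"{key}={value}")
    return tokens

def build_vocab(events, min_count=5):
    counts = Counter(t for e in events for t in event_to_tokens(e))
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, n in counts.most_common():
        if n >= min_count:
            vocab[token] = len(vocab)
    return vocab

def encode(event, vocab):
    return [vocab.get(t, vocab["<unk>"]) for t in event_to_tokens(event)]

events = [{"type": "purchase", "item": "potion"},
          {"type": "combat", "zone": "dungeon_3"}]
vocab = build_vocab(events, min_count=1)
print(encode(events[0], vocab))  # -> [2, 3]
```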
I will demonstrate how representations derived from the model can be applied to use cases in game data science such as clustering and prediction. Typical prediction tasks are survival-time prediction (regression) and purchase prediction (classification); insights from clustering and player-level predictions can help improve retention and optimize monetization. To evaluate the effectiveness of this approach, I trained multiple models on a publicly available 100GB game log dataset containing over 175 million events from NCSOFT’s MMORPG Blade and Soul [3]. In addition to presenting qualitative results, I will compare the computational resources and hardware requirements of this method to those of a simple baseline algorithm.
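As an illustration of how such representations plug into downstream tasks, the following sketch mean-pools (hypothetical) per-event hidden states into one fixed-size embedding per player and feeds the result to standard scikit-learn tooling; the embeddings here are random stand-ins for real model output:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Mean-pool per-event hidden states into one fixed-size vector per player.
# hidden_states: (seq_len, d_model) -> (d_model,); other poolings work too.
def player_embedding(hidden_states):
    return hidden_states.mean(axis=0)

# Random stand-ins for real model output: 500 players, 64-dim states.
rng = np.random.default_rng(0)
embeddings = np.stack([player_embedding(rng.normal(size=(120, 64)))
                       for _ in range(500)])

# Use case 1: unsupervised player segmentation.
clusters = KMeans(n_clusters=8, random_state=0).fit_predict(embeddings)
print(np.bincount(clusters))

# Use case 2: classification (e.g., purchase prediction) on top of the
# frozen embeddings, with fake labels standing in for real ones.
labels = rng.integers(0, 2, size=500)
clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
```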
By the end of the talk, attendees will gain actionable insights into building and training Transformers for log data, equipping them to tackle similar challenges in their own domains.
Tentative agenda of the talk:
- 5 min: Intro
- 5 min: Review of the Transformer architecture and its usage in GPT
- 10 min: Adjusting the architecture to game log data
- 10 min: Training of different models
- 10 min: Obtaining player representations from the models for clustering and prediction tasks
- 5 min: Outlook & Conclusion
[1] “Attention is all you need”, Vaswani et al., 2017
[2] “player2vec: A Language Modeling Approach to Understand Player Behavior in Games”, Wang et al., 2024
[3] “Game Data Mining Competition on Churn Prediction and Survival Analysis using Commercial Game Log Data”, Lee et al., 2018