Modern challenges of disinformation in media: anomaly detection in social network metrics using machine learning models
DOI: 10.31673/2412-9070.2025.027701
Abstract
The article considers the problem of detecting anomalies in time series of media content metrics collected from Telegram channels. Anomaly detection is relevant both for combating disinformation and for analyzing how content spreads through social networks. Anomalies in metrics such as the number of views, shares, comments, and reactions may indicate manipulation, including the use of bots, artificially inflated reach, or the spread of disinformation.
The analysis used a dataset obtained through the official Telegram API. A limitation of these data is that historical (retrospective) metric values are unavailable, which complicates analysis of how the metrics change over time. This was partially addressed by collecting the metrics at fixed time intervals after each post was published. To make the analysis more accurate, the collected data were grouped by the channel a post came from and by the time elapsed since publication. Because the data were unlabeled, outliers were removed manually to ensure reliable modeling.
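The grouping step can be illustrated with a minimal sketch, assuming each collected reading is stored as a record with a channel name, a post identifier, the offset in hours after publication, and a view count. The `Snapshot` record, its field names, and the use of a per-group median as a baseline are hypothetical choices for illustration, not the authors' data schema or method.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical record: one reading of a post's view count, taken at a
    # fixed offset (in hours) after the post was published.
    channel: str
    post_id: int
    hours_after_publication: int
    views: int


def group_views(snapshots):
    """Collect view counts per (channel, hours_after_publication) pair."""
    groups = defaultdict(list)
    for s in snapshots:
        groups[(s.channel, s.hours_after_publication)].append(s.views)
    return dict(groups)


def median_profile(snapshots):
    """Median views for each group: a simple per-channel, per-offset baseline."""
    return {key: median(vals) for key, vals in group_views(snapshots).items()}


if __name__ == "__main__":
    snaps = [
        Snapshot("channel_a", 1, 1, 100),
        Snapshot("channel_a", 2, 1, 140),
        Snapshot("channel_a", 1, 24, 900),
        Snapshot("channel_b", 3, 1, 20),
    ]
    print(median_profile(snaps))
```

Grouping by offset as well as by channel means a post read one hour after publication is only compared with other one-hour readings from the same channel, rather than with posts that have had a day to accumulate views.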
The article analyzes five Python libraries for time series anomaly detection: PyOD, TODS, PySAD, Darts, and Prophet. Each was assessed against the requirements of the task: support for time series, handling of incomplete data, real-time operation, seasonality, and computational efficiency. The libraries are compared using tables and graphs that show the results obtained with each one. PyOD is a well-known anomaly detection toolkit but does not work directly with time series. TODS can detect anomalies in streaming data, but its development has been discontinued. PySAD specializes in streaming data analysis but requires input data at a fixed frequency, which limits its applicability. Darts offers a wide range of time series algorithms but requires missing values to be filled in beforehand, which places an additional load on the model. The best results were achieved with the Prophet model, which can work with irregular time series without additional data augmentation.
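To see why a fixed-frequency requirement matters for these data, consider that metric snapshots arrive at irregular offsets and must be placed on a regular grid before a fixed-frequency library can use them. The sketch below does this with linear interpolation. Linear interpolation, the function name `resample_linear`, and the grid step are assumptions for illustration; the article does not state how missing values would be filled.

```python
from bisect import bisect_left


def resample_linear(times, values, step):
    """Place an irregular series on a regular grid by linear interpolation.

    `times` must be strictly increasing. The grid starts at times[0] and stops
    at the last grid point not after times[-1], so nothing is extrapolated.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two aligned observations")
    if step <= 0:
        raise ValueError("step must be positive")
    grid, filled = [], []
    k = 0
    while times[0] + k * step <= times[-1]:
        t = times[0] + k * step  # computed from k to avoid drift from repeated +=
        i = bisect_left(times, t)
        if times[i] == t:
            v = values[i]  # an actual observation falls on this grid point
        else:
            # Synthetic value between the neighbouring observations i-1 and i.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        grid.append(t)
        filled.append(v)
        k += 1
    return grid, filled
```

For readings at hours 0, 1, 4 and 6 resampled with a step of 2, the value at hour 2 is never observed and is interpolated as 30.0 from the neighbouring readings of 20 and 50. Points of this kind are the extra, synthetic input that pre-filling introduces. A model that accepts the original irregular timestamps, as the text reports for Prophet, avoids this step.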
The experiments showed that Prophet provides the best balance between forecast accuracy and computational efficiency. The amount of historical data used for modeling is crucial: too much history increases processing time, while too little reduces forecast accuracy.
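The trade-off over history length can be made concrete with a deliberately simple detector. This sketch does not use Prophet. It substitutes a trailing-window z-score rule so that the `history` parameter, meaning how many past points form the baseline, is visible directly. The function name, the z-score rule, and the default threshold of 3 are assumptions for illustration only.

```python
from statistics import mean, pstdev


def flag_anomalies(values, history, threshold=3.0):
    """Return indices of points that deviate from the trailing `history`
    points by more than `threshold` population standard deviations.

    The first `history` points have no full baseline and are never flagged.
    """
    if history < 2:
        raise ValueError("history must be at least 2")
    flagged = []
    for i in range(history, len(values)):
        window = values[i - history:i]  # only past points form the baseline
        mu = mean(window)
        sigma = pstdev(window)
        if sigma == 0:
            # Constant history: any change from that constant is flagged.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged
```

With `history=5`, the series `[10, 12, 11, 13, 12, 11, 60, 12]` flags only index 6. The spike then sits inside the next window and inflates its spread, so a short history is easily distorted by a single outlier. Each check costs time proportional to `history`, so a longer history is more stable but more expensive. This mirrors, in miniature, the balance described above.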
The results indicate that none of the libraries considered is a universal solution for all tasks. However, Prophet showed the greatest potential for detecting anomalies in time series of Telegram metrics, which makes it the strongest candidate for further development and adaptation to media content monitoring.
Keywords: predictive model, optimization, social networks, machine learning, data analytics, anomaly detection, engagement metrics.