Mining User-Aware Rare Sequential Topic Patterns in Document Streams
Abstract
Textual documents created and distributed on the
Internet are ever changing in various forms. Most existing work is devoted
to topic modeling and the evolution of individual topics,
while the sequential relations
of topics in successive documents published by a specific user are ignored. In this paper, in order
to characterize and detect personalized and abnormal behaviors of Internet
users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs)
in document streams on the Internet. URSTPs are rare overall but relatively frequent
for specific users, so they can be applied in many real-life scenarios, such as
real-time monitoring of abnormal user behavior.
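To make "rare overall but relatively frequent for specific users" concrete, here is a minimal Python sketch. It assumes each candidate pattern already has an expected support per user (the expected number of that user's sessions containing it) and applies a hypothetical rarity test: the pattern's global relative support must fall below `max_global_rel`, while a user's own relative support must reach `min_user_rel`. The function name, thresholds and relative-support definitions are illustrative assumptions, not the paper's exact formulas.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def user_aware_rare_patterns(
    support: Dict[Pattern, Dict[str, float]],
    sessions_per_user: Dict[str, int],
    max_global_rel: float,
    min_user_rel: float,
) -> Dict[Pattern, List[str]]:
    """Keep patterns that are rare across all sessions but relatively frequent
    within at least one user's own sessions (illustrative definitions)."""
    total_sessions = sum(sessions_per_user.values())
    result: Dict[Pattern, List[str]] = {}
    for pattern, by_user in support.items():
        # Global relative support: expected supporting sessions / all sessions.
        global_rel = sum(by_user.values()) / total_sessions
        if global_rel >= max_global_rel:
            continue  # too common overall to count as rare
        # Users whose own relative support for the pattern is high.
        users = sorted(
            u for u, s in by_user.items()
            if s / sessions_per_user[u] >= min_user_rel
        )
        if users:
            result[pattern] = users
    return result


# Hypothetical expected supports (expected number of sessions containing the pattern).
sessions_per_user = {"alice": 10, "bob": 50, "carol": 40}
support = {
    ("sports", "finance"): {"alice": 6.0, "bob": 1.0},
    ("news", "weather"): {"alice": 5.0, "bob": 30.0, "carol": 25.0},
}
print(user_aware_rare_patterns(support, sessions_per_user, 0.1, 0.5))
# {('sports', 'finance'): ['alice']}
```

In this toy data the pattern `("sports", "finance")` is kept only for `alice`: it appears in about 7% of all sessions but in 60% of hers, whereas `("news", "weather")` is too common overall to count as rare.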
We present a group of algorithms that solve this innovative mining problem in three phases:
preprocessing, which extracts probabilistic topics and
identifies sessions for different users; generating all STP candidates with
(expected) support values for each user by pattern growth; and selecting URSTPs through user-aware rarity
analysis of the derived STPs. Experiments on both real (Twitter) and synthetic
datasets show that our approach discovers special users and
interpretable URSTPs effectively and efficiently, and that the discovered patterns strongly reflect
users’ characteristics.
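The candidate-generation phase can be sketched in Python under stated assumptions. Each session is a time-ordered list of documents, each represented by a topic-probability dictionary from preprocessing; document topics are treated as independent; and a session supports a pattern if its topic sequence contains the pattern as a (not necessarily contiguous) subsequence. `containment_prob` computes that probability with a small dynamic program, and `mine_stps` grows patterns one topic at a time, summing the probabilities over one user's sessions to obtain expected support. The function names and the `min_esup` and `max_len` parameters are hypothetical, and this simplified loop recomputes support from scratch instead of using projected databases, so it is not the authors' optimized algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]
Session = List[Dict[str, float]]  # one topic distribution per document, in time order


def containment_prob(session: Session, pattern: Pattern) -> float:
    """Probability that the session's topic sequence contains `pattern` as a subsequence."""
    k = len(pattern)
    # state[j]: probability that greedy left-to-right matching has matched pattern[:j]
    state = [0.0] * (k + 1)
    state[0] = 1.0
    for dist in session:
        # Walk j downwards so one document advances the match by at most one step.
        for j in range(k - 1, -1, -1):
            moved = state[j] * dist.get(pattern[j], 0.0)
            state[j + 1] += moved
            state[j] -= moved
    return state[k]


def mine_stps(sessions: Sequence[Session], topics: Sequence[str],
              min_esup: float, max_len: int) -> Dict[Pattern, float]:
    """Pattern growth: extend prefixes one topic at a time, keeping those whose
    expected support (sum of containment probabilities) reaches min_esup."""
    results: Dict[Pattern, float] = {}

    def grow(prefix: Pattern) -> None:
        for t in topics:
            cand = prefix + (t,)
            esup = sum(containment_prob(s, cand) for s in sessions)
            if esup >= min_esup:  # otherwise no extension of cand can qualify either
                results[cand] = esup
                if len(cand) < max_len:
                    grow(cand)

    grow(())
    return results


# Hypothetical sessions of one user: each dict is a document's topic distribution.
user_sessions = [
    [{"sports": 0.9, "finance": 0.1}, {"sports": 0.2, "finance": 0.8}],
    [{"sports": 0.5, "finance": 0.5}],
]
result = mine_stps(user_sessions, ["finance", "sports"], min_esup=0.5, max_len=2)
for pattern, esup in sorted(result.items()):
    print(pattern, round(esup, 3))
```

Pruning on `min_esup` is safe here because any session that contains an extended pattern also contains its prefix, so expected support never increases as a pattern grows. The per-user supports produced this way are the kind of input the rarity analysis in the third phase would consume.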