Jiawei Li (Beihang University & National University of Singapore), Jiahao Liu (National University of Singapore), Jian Mao (Beihang University), Jun Zeng (National University of Singapore), Zhenkai Liang (National University of Singapore)
Many mobile apps use UI widgets to interact with users and trigger specific operational logic, such as clicking a button to send a message. While UI widgets are designed to be intuitive and user-friendly, they can also be misused to perform harmful behaviors that violate user expectations. To address these threats, recent studies strive to understand the intentions of UI widgets in mobile apps. However, existing methods either concentrate on surface-level features of UI widgets and fail to capture their underlying intentions, or include excessive and inaccurate information, which makes it difficult to distill the core intentions. In this paper, we present UI-CTX, which demystifies UI behaviors with a concise and effective representation. For each UI widget, UI-CTX first represents its intentions with a UI Handler Graph (UHG), which incorporates the code context behind the widget while eliminating irrelevant information (e.g., unreachable code blocks). UI-CTX then performs graph summarization and exploits both the structural and semantic information in UHGs to model the core intentions of UI widgets. To systematically evaluate UI-CTX, we extract a set of UI widget behaviors, such as login and search, from a large-scale dataset and conduct extensive experiments. Experimental results show that UI-CTX effectively represents the intentions of UI widgets and significantly outperforms existing solutions in modeling UI widget behaviors. For example, in classifying UI widget intentions, UHG achieves the highest average F1-score among the widget representations used in state-of-the-art approaches (+95.2% over permission sets and +8.2% over call sequences). Additionally, by accurately pinpointing the code contexts of widgets, UI-CTX achieves a $\mathbf{3.6\times}$ improvement in widget intention clustering performance.
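The abstract does not say how UHG construction discards unreachable code. The sketch below illustrates one generic way to do it. It assumes that a widget's code context is already available as a directed graph, stored as an adjacency dict whose node names are hypothetical, and that the widget's event handler is its single entry point. A breadth-first reachability pass from that handler keeps only the reachable nodes and edges. This is a standard graph-pruning technique shown for illustration, not UI-CTX's actual implementation.

```python
from collections import deque
from typing import Dict, List, Set


def prune_unreachable(graph: Dict[str, List[str]], entry: str) -> Dict[str, List[str]]:
    """Keep only nodes reachable from the widget handler `entry` (BFS).

    Assumption: `graph` maps each code node (method or block, hypothetical
    names) to its successors; nodes never reached from `entry` are dropped.
    """
    reachable: Set[str] = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in reachable:
                reachable.add(succ)
                queue.append(succ)
    # Rebuild the graph over reachable nodes only; edges leaving
    # unreachable nodes disappear together with those nodes.
    return {n: [s for s in graph.get(n, []) if s in reachable] for n in reachable}


# Hypothetical code context of a "send" button; "deadCode" is never
# invoked from the click handler, so it should be pruned.
handlers = {
    "onClick": ["validate", "sendMessage"],
    "validate": [],
    "sendMessage": ["SmsManager.sendTextMessage"],
    "deadCode": ["sendMessage"],
    "SmsManager.sendTextMessage": [],
}
pruned = prune_unreachable(handlers, "onClick")
```

Seeding the search at the handler rather than at every method in the app keeps the resulting graph specific to one widget. A real analysis would also need to resolve callbacks and inter-component calls before this step.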