Iori Suzuki (Graduate School of Environment and Information Sciences, Yokohama National University), Yin Minn Pa Pa (Institute of Advanced Sciences, Yokohama National University), Nguyen Thi Van Anh (Institute of Advanced Sciences, Yokohama National University), Katsunari Yoshioka (Graduate School of Environment and Information Sciences, Yokohama National University)
Decentralized Finance (DeFi) token scams have become one of the most prevalent forms of fraud in Web3, generating approximately $241.6 million in illicit revenue in 2023 [1]. Detecting these scams requires analyzing both on-chain data, such as transaction records on the blockchain, and off-chain data, such as websites related to the DeFi token project and associated social media accounts. Relying solely on one type of data may fail to capture the full context of fraudulent activity. Moreover, whereas on-chain data remains accessible thanks to the transparency inherent in blockchain technology, off-chain data often disappears alongside DeFi scam campaigns, making it difficult for the security community to study these scams. To address this challenge, we propose a dataset comprising more than 550 thousand archived web and social media records as off-chain data, in addition to on-chain data related to 32,144 DeFi tokens deployed on the Ethereum blockchain from September 24, 2024 to January 14, 2025. This dataset aims to support the security community in studying and detecting DeFi token scams. To illustrate its utility, we present case studies that demonstrate the dataset’s potential for identifying patterns and behaviors associated with scam tokens. These findings highlight the dataset’s capability to provide insights into fraudulent activities and support further research on effective detection mechanisms.