Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict whether a topic is organic or a coordinated campaign due to the lack of reliable ground truth.
We create a ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks. These attacks push any topic to Twitter's trends list by employing bots that tweet in a coordinated manner within a short period and then immediately delete their tweets. We also manually curate a dataset of organic Twitter trends. We then create engagement networks out of these datasets and present them as graph classification datasets, where the task is to distinguish between campaigns and organic trends. In an engagement network, nodes are users and edges indicate interactions (retweets, replies, and quotes) between users.
We release the data of 170 campaigns and 135 non-campaigns. Our graph dataset poses a challenge for large graph classification. Traditional graph classification datasets are small, with tens of nodes and hundreds of edges at most. Compared to these standard benchmarks, our graphs are much larger: on average, each engagement network in our dataset contains ~11K nodes and ~23K edges. We show that state-of-the-art GNN methods give only mediocre results on our datasets, so they offer a new challenge for the graph classification problem. We believe that our dataset will help advance the frontiers of graph classification techniques on large networks and provide an interesting use case in distinguishing coordinated campaigns from organic trends.
Our dataset (LEN) comprises graphs in which the nodes denote users and the edges indicate the type of interaction between users, which in this case can be a retweet, reply, or quote. We have a total of 305 engagement graphs, comprising 170 campaign graphs and 135 non-campaign graphs. There are 7 sub-types in campaign and 8 in non-campaign, as indicated in the Overall Graph Description below.
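As a rough illustration of the structure described above, the following sketch builds a small engagement network from a list of interaction records. The record layout `(source_user, target_user, interaction_type)`, the edge direction, the function name `build_engagement_network`, and the choice to count repeated interactions per user pair are all assumptions made for this example; they are not the format in which LEN is distributed.

```python
from collections import defaultdict

# The three interaction types named in the dataset description.
INTERACTION_TYPES = {"retweet", "reply", "quote"}


def build_engagement_network(interactions):
    """Build a typed engagement network.

    `interactions` is an iterable of (source_user, target_user, kind)
    tuples (a hypothetical layout). Returns (nodes, edges), where `edges`
    maps each (source, target) pair to a dict of {interaction type: count}.
    """
    nodes = set()
    edges = defaultdict(lambda: defaultdict(int))
    for source, target, kind in interactions:
        if kind not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {kind!r}")
        nodes.update((source, target))
        edges[(source, target)][kind] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return nodes, {pair: dict(kinds) for pair, kinds in edges.items()}


# Made-up users and interactions, for illustration only.
sample = [
    ("alice", "bob", "retweet"),
    ("carol", "bob", "reply"),
    ("alice", "bob", "quote"),
    ("dave", "alice", "retweet"),
]
nodes, edges = build_engagement_network(sample)
print(len(nodes), len(edges))
```

Here each graph in a classification dataset would correspond to one trend, with a single campaign or non-campaign label attached to the whole graph rather than to individual users.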
| Type | Sub-type | # graphs | Nodes (min) | Nodes (max) | Nodes (avg) | Edges (min) | Edges (max) | Edges (avg) | Explanation |
|---|---|---|---|---|---|---|---|---|---|
| Campaign | Politics | 62 | 100 | 50,286 | 6,570 | 203 | 71,704 | 10,210 | Political promotions, slogans, misinformation camp. |
| Campaign | Reform | 58 | 131 | 19,578 | 1,229 | 540 | 1,105,918 | 25,268 | People organized for political reforms. |
| Campaign | News | 24 | 581 | 54,996 | 10,368 | 942 | 80,784 | 15,582 | News pumped up by bots and trolls for more attention. |
| Campaign | Finance | 14 | 273 | 9,976 | 1,802 | 243 | 10,725 | 2,334 | Finance marketing (mostly cryptocurrency). |
| Campaign | Noise | 9 | 454 | 55,933 | 12,180 | 473 | 48,937 | 10,882 | Cannot be put in any type. |
| Campaign | Cult | 6 | 313 | 7,880 | 2,303 | 637 | 11,615 | 3,431 | Slogans by a famous cult with immense access to bots. |
| Campaign | Entertainment | 3 | 678 | 4,220 | 2,237 | 3,806 | 132,013 | 48,767 | Celebrities attempting to promote themselves. |
| Campaign | Common | 3 | 3,487 | 9,974 | 5,919 | 2,818 | 9,470 | 7,066 | Common sub-strings combined without known reasons. |
| Campaign | Overall | 170 | 100 | 55,933 | 5,157 | 203 | 1,105,918 | 16,006 | |
| Non-Campaign | News | 52 | 818 | 95,575 | 24,834 | 709 | 213,444 | 43,201 | Popular events, sourced outside Twitter. |
| Non-Campaign | Sports | 30 | 469 | 75,653 | 9,530 | 403 | 101,656 | 12,948 | Popular sports events. |
| Non-Campaign | Festival | 17 | 885 | 119,952 | 35,466 | 803 | 199,305 | 55,947 | About festivals, holidays, special days. |
| Non-Campaign | Internal | 11 | 4,188 | 87,720 | 33,061 | 4,374 | 196,103 | 54,442 | Popular events, sourced inside Twitter. |
| Non-Campaign | Common | 10 | 1,214 | 64,320 | 17,079 | 1,270 | 99,306 | 24,869 | Common substrings combined by people. |
| Non-Campaign | Entertainment | 8 | 1,477 | 20,060 | 7,289 | 1,712 | 45,211 | 12,578 | Popular TV shows and YouTube videos. |
| Non-Campaign | Announced cam. | 4 | 6,650 | 26,358 | 13,382 | 14,362 | 50,864 | 24,817 | Official campaigns launched by major political parties. |
| Non-Campaign | Sports cam. | 3 | 2,880 | 4,661 | 3,654 | 4,451 | 7,367 | 5,534 | Hashtags launched by popular sports teams. |
| Non-Campaign | Overall | 135 | 469 | 119,952 | 20,632 | 403 | 213,444 | 33,765 | |
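The per-sub-type statistics in the table (graph count, plus minimum, maximum, and average node and edge counts) can be computed along the following lines. This is only a sketch: the helper name `summarize` and the toy sizes are hypothetical and do not come from LEN, and rounding the average to the nearest integer is an assumption about how the reported averages were produced.

```python
from statistics import mean


def summarize(graph_sizes):
    """Summarize one sub-type.

    `graph_sizes` is a list of (num_nodes, num_edges) pairs, one per graph.
    Returns the graph count and (min, max, rounded average) for nodes and edges.
    """
    node_counts = [n for n, _ in graph_sizes]
    edge_counts = [e for _, e in graph_sizes]
    return {
        "graphs": len(graph_sizes),
        "nodes": (min(node_counts), max(node_counts), round(mean(node_counts))),
        "edges": (min(edge_counts), max(edge_counts), round(mean(edge_counts))),
    }


# Toy sizes for three imaginary graphs, for illustration only.
toy = [(100, 203), (450, 900), (1250, 2500)]
summary = summarize(toy)
print(summary)
```

The same function applied to all graphs of a class would give the corresponding "Overall" row.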
We also have a smaller, balanced version of LEN (LEN-Small), which has 51 campaign and 49 non-campaign graphs. A brief description of LEN-Small is given in the Small Graph Description table below.
| Type | Sub-type | # graphs | Nodes (min) | Nodes (max) | Nodes (avg) | Edges (min) | Edges (max) | Edges (avg) |
|---|---|---|---|---|---|---|---|---|
| Campaign | Politics | 14 | 100 | 1,908 | 805 | 203 | 2,000 | 1,108 |
| Campaign | Reform | 16 | 131 | 634 | 297 | 540 | 2,027 | 1,192 |
| Campaign | News | 3 | 581 | 1,671 | 1,123 | 942 | 1,726 | 1,410 |
| Campaign | Finance | 9 | 273 | 1,590 | 775 | 243 | 1,862 | 1,024 |
| Campaign | Noise | 5 | 454 | 2,520 | 1,060 | 473 | 1,634 | 1,074 |
| Campaign | Cult | 4 | 313 | 705 | 512 | 637 | 1,035 | 843 |
| Campaign | Overall | 51 | 100 | 2,520 | 661 | 203 | 2,027 | 1,113 |
| Non-Campaign | News | 10 | 818 | 6,169 | 3,757 | 709 | 9,076 | 4,578 |
| Non-Campaign | Sports | 23 | 469 | 8,355 | 3,357 | 403 | 9,998 | 3,994 |
| Non-Campaign | Festival | 2 | 885 | 5,982 | 3,433 | 803 | 6,509 | 3,656 |
| Non-Campaign | Internal | 1 | 4,188 | 4,188 | 4,188 | 4,374 | 4,374 | 4,374 |
| Non-Campaign | Common | 5 | 1,214 | 4,962 | 2,989 | 1,270 | 6,277 | 3,559 |
| Non-Campaign | Entertainment | 5 | 1,477 | 7,739 | 4,391 | 1,712 | 10,608 | 6,021 |
| Non-Campaign | Sports cam. | 3 | 2,880 | 4,661 | 3,654 | 4,451 | 7,367 | 5,534 |
| Non-Campaign | Overall | 49 | 469 | 8,355 | 3,545 | 403 | 10,608 | 4,364 |