Christopher Bouzy is trying to stay ahead of the bots. As the person behind Bot Sentinel, a popular bot-detection system, he and his team continuously update their machine learning models out of fear that they will get “stale.” The task? Sorting 3.2 million tweets from suspended accounts into two folders: “Bot” or “Not.”
To detect bots, Bot Sentinel’s models must first learn what problematic behavior looks like through exposure to data. By feeding the model tweets sorted into two distinct categories—bot or not a bot—Bouzy’s team can calibrate it to recognize what, in his view, makes a tweet problematic.
Training data is the heart of any machine learning model. In the burgeoning field of bot detection, how bot hunters define and label tweets determines the way their systems interpret and classify bot-like behavior. According to experts, this can be more of an art than a science. “At the end of the day, it is about a vibe when you are doing the labeling,” Bouzy says. “It’s not just about the words in the tweet; context matters.”
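The sort-into-two-folders workflow Bouzy describes is, at its core, supervised binary classification. The toy sketch below shows the idea in miniature; the example tweets, labels, and word-overlap scoring rule are invented for illustration — Bot Sentinel’s actual features and models are not public.

```python
from collections import Counter

# Hand-labeled examples, standing in for the "Bot" and "Not" folders.
# All text and labels here are made up for this sketch.
labeled = [
    ("click here free followers now", "bot"),
    ("buy followers instant delivery", "bot"),
    ("had a great lunch with friends", "not"),
    ("watching the game tonight", "not"),
]

# Count how often each word appears under each label.
word_counts = {"bot": Counter(), "not": Counter()}
for text, label in labeled:
    word_counts[label].update(text.split())

def classify(text):
    """Assign the label whose training vocabulary overlaps the tweet most."""
    scores = {
        label: sum(counts[w] for w in text.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)
```

A real system would use far richer features (posting cadence, account metadata, network structure) and a trained model rather than raw word overlap, but the dependence on human labeling decisions is the same: the classifier can only learn the boundary its labelers drew.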
He’s a Bot, She’s a Bot, Everyone’s a Bot
Before anyone can hunt bots, they need to figure out what a bot is—and that answer changes depending on who you ask. The internet is full of people accusing each other of being bots over petty political disagreements. Trolls are called bots. People with no profile picture and few tweets or followers are called bots. Even among professional bot hunters, the answers differ.
Bot Sentinel is trained to weed out what Bouzy calls “problematic accounts”—not just automated accounts. Indiana University informatics and computer science professor Filippo Menczer says the tool he helps develop, Botometer, defines bots as accounts that are at least partially controlled by software. Kathleen Carley is a computer science professor at the Institute for Software Research at Carnegie Mellon University who has helped develop two bot-detection tools: BotHunter and BotBuster. Carley defines a bot as “an account that is run using completely automated software,” a definition that aligns with Twitter’s own. “A bot is an automated account—nothing more or less,” the company wrote in a May 2020 blog post about platform manipulation.
Just as the definitions differ, the results these tools produce don’t always align. An account flagged as a bot by Botometer, for example, might come back as perfectly humanlike on Bot Sentinel, and vice versa.
Some of this is by design. Unlike Botometer, which aims to identify automated or partially automated accounts, Bot Sentinel is hunting accounts that engage in toxic trolling. According to Bouzy, you know these accounts when you see them. They can be automated or human-controlled, and they engage in harassment or disinformation and violate Twitter’s terms of service. “Just the worst of the worst,” Bouzy says.
Botometer is maintained by Kaicheng Yang, a PhD candidate in informatics at the Observatory on Social Media at Indiana University who created the tool with Menczer. The tool also uses machine learning to classify bots, but when Yang is training his models, he’s not necessarily looking for harassment or terms of service violations. He’s just looking for bots. According to Yang, when he labels his training data he asks himself one question: “Do I believe the tweet is coming from a person or from an algorithm?”
How to Train an Algorithm
Not only is there no consensus on how to define a bot, but there’s no single clear criterion or signal any researcher can point to that accurately predicts whether an account is a bot. Bot hunters believe that exposing an algorithm to thousands or millions of bot accounts helps a computer detect bot-like behavior. But the objectivity of any bot-detection system is muddied by the fact that humans still have to make judgment calls about what data to use to build it.
Take Botometer, for example. Yang says Botometer is trained on tweets from around 20,000 accounts. While some of these accounts self-identify as bots, the majority are manually categorized by Yang and a team of researchers before being crunched by the algorithm. (Menczer says some of the accounts used to train Botometer come from data sets from other peer-reviewed research. “We try to use all the data that we can get our hands on, as long as it comes from a reputable source,” he says.)
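Assembling a training set from mixed sources — self-identified bots, manually reviewed accounts, and published data sets — involves exactly the kind of judgment calls described above, such as which label wins when sources disagree. The sketch below illustrates one such merge policy; all account IDs, source names, and the precedence rule are invented for this example, not Botometer’s actual pipeline.

```python
# Hypothetical labeled sources (account ID -> label). In this sketch,
# manual review is applied last so it overrides the other sources
# on conflicts -- one possible policy, chosen here for illustration.
published_dataset = {"3001": "human", "3002": "bot"}
self_identified = {"1001": "bot", "1002": "bot"}
manual_labels = {"2001": "human", "2002": "bot", "1001": "bot"}

training_set = {}
for source in (published_dataset, self_identified, manual_labels):
    training_set.update(source)  # later sources win on duplicate IDs

bots = sum(1 for label in training_set.values() if label == "bot")
```

Even this tiny merge encodes a human decision (trusting manual review over self-identification), which is the kind of choice that shapes what the downstream model learns.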