Two computer science students created a Google Chrome extension that, when clicked, tells you whether a Twitter user appears to be a bot.
They claim it has 93.5% accuracy (but see the footnote for a hint at some of the problems in how they came to that conclusion). It uses “machine learning” technology to attempt to identify Twitter accounts that may be automated “propaganda” accounts. Per the article, their classifier was trained using Tweets identified as left or right leaning – and those which they could not categorize as left or right must be bots. Or something. Regardless, that implies political views play a role in classification as a bot. Would a bot tweeting about cats be identified? Would a propaganda bot promoting backyard gardening be identified?
The results could be manipulated by users. When the bot check reports its results to you, you can optionally agree or disagree – and that information gets fed back to the classifier. A sufficient number of users could likely re-train the classifier to intentionally classify real people as bots, and bots as real people.
I am not convinced that software tools can classify propaganda bots with sufficient accuracy to be useful over the long term. There will be an arms race to create better bots that appear more natural. I fear that such tools may be used to stifle speech by incorrectly – or deliberately – classifying legitimate speech as “bot” generated to have that speech throttled down or banned.
Note also that Twitter – and Facebook – profit by having emotionally engaged users reading, liking, sharing and following more people. It is not yet in their financial interest to be aggressive about shutting down bots.
How good is 93.5% accuracy? Let’s consider a different example to understand this: the use of drug search dogs in schools to locate drugs in lockers.
Let’s say the dog has a 98% accuracy in finding drugs in a locker and a 2% false positive rate. Further, let’s assume there are 2,000 lockers in the school.
Let’s assume 1% of the students actually have drugs in their locker.
1% of 2,000 students means 20 students actually have drugs in their locker. (And since the dog is only 98% accurate, there is a chance that 1 of these 20 will be missed.)
In using the dog, the police will also incorrectly flag about 2% of the lockers (the false positive rate), so roughly 40 lockers will be wrongly suspected of having drugs in a school where only 20 lockers actually have drugs.
In other words, twice as many students will be falsely accused of having drugs as students who actually have drugs.
When doing broad classification searches, even a 98% accuracy rate is problematic as it may produce more false positives than true positives, which is not what you would intuitively guess when you hear “98% accuracy” or, in this Twitter bot analysis, 93.5% accuracy.
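For readers who want to check the drug dog arithmetic, here is a minimal sketch in Python. The function name and its parameters are my own, made up for illustration, and it assumes one locker per student. It applies the 2% false positive rate only to the 1,980 lockers without drugs, so it reports about 39.6 false alarms rather than the rounded 40 used above.

```python
def screening_counts(population, base_rate, detection_rate, false_positive_rate):
    """Expected outcomes when screening a whole population (hypothetical helper)."""
    actual_positives = population * base_rate           # lockers that really contain drugs
    actual_negatives = population - actual_positives    # lockers that do not
    true_positives = actual_positives * detection_rate  # drugs present and flagged
    false_negatives = actual_positives - true_positives # drugs present but missed
    false_positives = actual_negatives * false_positive_rate  # clean lockers flagged anyway
    flagged = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        # Share of flagged lockers that really contain drugs
        "precision": true_positives / flagged,
    }


# Numbers from the example: 2,000 lockers, 1% with drugs,
# 98% detection rate, 2% false positive rate.
counts = screening_counts(2000, 0.01, 0.98, 0.02)
for name, value in counts.items():
    print(f"{name}: {value:.2f}")
```

With these inputs, only about a third of the flagged lockers actually contain drugs. I have not tried to run the same calculation for the Twitter bot checker, because the share of accounts that really are bots is not known.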
Further, in determining their 93.5% figure, while their approach is admirable and possibly the best that can be done, they compared verified Twitter user tweets to suspected “bots” from unverified accounts. Most Twitter accounts are unverified, and they are only hypothesizing that an account is a bot when producing this metric. (FYI I think they have done an excellent job and am impressed with their work. My comments should in no way be interpreted as negative comments towards these two students. For the record, I have a degree in computer science and an M.S. in software engineering and have familiarity with – but not expertise in – machine learning, classifiers and pattern matching systems.)
Indeed, as the article points out, hundreds of people have already complained to another bot checker about being falsely classified as a bot. The Wired reporter attempted to contact the account holders of a small sample of accounts identified as bots and quickly found accounts that appeared to be run by real people.
Side note: the linked article in Wired is excellent journalism, something I certainly do not see enough of! Glad to see this article!