In a world full of algorithms, humans aren’t so nice online.
Carnegie Mellon researchers recently proposed an AI system that surfaces positive online comments. At a time of both calls for inclusion and deep social division online, can AI help minorities and the vulnerable?
Researchers at Carnegie Mellon University’s Language Technologies Institute (LTI) say they’ve developed a system that uses machine learning to rapidly analyze hundreds of thousands of comments on social media and identify the fraction that defend or sympathize with disenfranchised minorities.
The majority of YouTube comments skew negative. An AI that can sift through all that noise and find the good comments is impressive.
“Even if there’s lots of hateful content, we can still find positive comments,” said Ashiqur R. KhudaBukhsh, a post-doctoral researcher in the LTI who conducted the research with alumnus Shriphani Palakodety. Finding and highlighting these positive comments, they suggest, might do as much to make the internet a safer, healthier place as would detecting and eliminating hostile content or banning the trolls responsible.
Because users can hide behind anonymous IDs, most Western online communities, such as Reddit, Twitter and Facebook Groups, carry a lot of negativity. Platforms like LinkedIn suffer instead from fake positive comments, which lack any sense of honesty or real dialogue. So it’s fascinating that AI could get better at reading online comments, especially those about the real events constantly happening in the world that deserve our support.
AI, News, and Social Media Commentary
Although the system hasn’t been commercialized, the researchers have used it in experiments to search nearly a million YouTube comments, focusing on the Rohingya refugee crisis and the February 2019 Pulwama terrorist attack in Kashmir.
Tamping down on abusive online behavior is no easy feat, particularly considering the level of toxicity in some social circles. In 2019, the internet became an even more toxic place, with more people deleting their Facebook accounts and spending less time on Twitter, Reddit, LinkedIn and so forth. Even as Pinterest has overtaken Snapchat as the No. 3 social media channel in the USA by active users, Instagram and Facebook cannot be said to be benevolent places.
Many minorities are powerless to defend themselves on the Internet. Left to themselves, the Rohingyas are largely defenseless against online hate speech. Many of them have limited proficiency in global languages such as English, and they have little access to the internet.
As the researchers explain, improvements in AI language models — which learn from many examples to predict what words are likely to occur in a given sentence — made it possible to analyze such large quantities of text.
Existing machine learning methods create numerical representations of words, known as word embeddings, in which words with similar meanings are represented by similar vectors. This technique makes it possible to compute the proximity of a word to others in a comment or post. Specifically, the researchers obtained embeddings that revealed novel language groupings or clusters. In this way the interaction between AI, NLP and the internet is improving.
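To make the idea of word proximity concrete, here is a minimal sketch in Python. It is not the researchers’ code: the three-number vectors and the words chosen are invented for illustration, and cosine similarity is simply one common way to measure how close two embeddings are. Real embeddings are learned from large amounts of text and have far more dimensions.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real embeddings are learned from text and have many more dimensions.
TOY_EMBEDDINGS = {
    "refugee": [0.9, 0.1, 0.2],
    "migrant": [0.8, 0.2, 0.3],
    "banana": [0.1, 0.9, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, embeddings):
    """Return the other word whose vector is most similar to `word`."""
    target = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(target, embeddings[w]))


close = cosine_similarity(TOY_EMBEDDINGS["refugee"], TOY_EMBEDDINGS["migrant"])
far = cosine_similarity(TOY_EMBEDDINGS["refugee"], TOY_EMBEDDINGS["banana"])
print(f"refugee vs migrant: {close:.2f}")
print(f"refugee vs banana:  {far:.2f}")
print("nearest to refugee:", nearest("refugee", TOY_EMBEDDINGS))
```

In this toy setup, “refugee” lands much closer to “migrant” than to “banana”, which is the kind of signal a system can use to group related words and comments.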
The team reports that in experiments, their approach worked as well as or better than commercially available solutions. Random samplings of the YouTube comments showed about 10% were positive, while 88% of the comments surfaced by the AI algorithm were positive.
To find relevant help speech, the researchers used their technique to search more than a quarter of a million comments from YouTube in what they believe is the first AI-focused analysis of the Rohingya refugee crisis. They will present their findings at the Association for the Advancement of Artificial Intelligence’s annual conference, Feb. 7-12, in New York City.
I personally find this fascinating as the way machine intelligence helps humans navigate news, commentary, and sentiment online will change this decade. The rise of video and podcasts means we will consume less text online. Which comments rise to the surface also changes our experience of a channel, whether it be on YouTube, Reddit, TikTok or another app.
AI can and should be used for benevolent means to help us interpret world events and empathize with minorities, vulnerable populations and groups that have less privilege than ourselves.
The study follows the release of a data set by Jigsaw — the organization working under Google parent company Alphabet to tackle cyber bullying, censorship, disinformation, and other digital issues of the day — containing hundreds of thousands of comments and annotations with toxicity and identity labels.
If Google and other big tech companies have conducted data sharing, it’s also up to them to help create a better internet, one where social justice and goodwill improve, and not just advertising revenues.
The ability to analyze such large quantities of text for content and opinion is possible because of recent major improvements in language models, said Jaime Carbonell, LTI director and a co-author on the study.
Samplings of the YouTube comments showed about 10% of the comments were positive. When the researchers used their method to search for help speech in the larger dataset, the results were 88% positive, indicating that the method could substantially reduce the manual effort necessary to find such comments.
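A quick back-of-the-envelope calculation shows what that gap could mean for manual effort. The sketch below is only illustrative: the 10% and 88% figures come from the reported results, while the goal of finding 100 positive comments and the function name are assumptions made up for this example.

```python
def comments_to_review(target_positives, positive_rate):
    """Expected number of comments to read to find `target_positives` positive ones."""
    return target_positives / positive_rate


RANDOM_RATE = 0.10    # share of positive comments in a random sample
FILTERED_RATE = 0.88  # share of positive comments among those the method surfaced
TARGET = 100          # hypothetical goal, chosen only for illustration

random_effort = comments_to_review(TARGET, RANDOM_RATE)
filtered_effort = comments_to_review(TARGET, FILTERED_RATE)
reduction = random_effort / filtered_effort

print(f"Reading at random: about {random_effort:.0f} comments")
print(f"Reading filtered results: about {filtered_effort:.0f} comments")
print(f"Effort reduced by a factor of about {reduction:.1f}")
```

Under these assumptions, a moderator would read roughly 1,000 random comments to find 100 positive ones, versus about 114 of the comments the method surfaced.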
Studies like these are also a social mirror for how toxic the Western internet of Facebook, YouTube, Reddit, and Instagram truly is when our identity is hidden behind a screen name, and when we are anonymous players in a social game without consequences for the feelings of others and the misfortunes they may be experiencing.
An advertising-centric internet is basically one without a heart, and most algorithms have been designed to produce ad revenue, not improve human relationships.
The toxic internet is a representation of what happens when you have a male bias in engineering and executive leadership at companies like Google and Facebook, and a lack of ethical regulation of technology, AI and the internet over the last 30 years. Of course, nobody talks about that much. Could future AI help us be more human and behave less like bots?
One way to combat hate speech online is to focus on the more positive empathetic comments and bring them to the surface. This tool could be used to improve the “collective mental health” of the Western internet.