OpenAI is funding academic research into algorithms that can predict humans' moral judgements.
In a filing with the IRS, OpenAI Inc., OpenAI's nonprofit org, disclosed that it awarded a grant to Duke University researchers for a project titled "Research AI Morality." Contacted for comment, an OpenAI spokesperson pointed to a press release indicating the award is part of a larger, three-year, $1 million grant to Duke professors studying "making moral AI."
Little is public about this "morality" research OpenAI is funding, other than the fact that the grant ends in 2025. The study's principal investigator, Walter Sinnott-Armstrong, a practical ethics professor at Duke, told TechCrunch via email that he "will not be able to talk" about the work.
Sinnott-Armstrong and the project's co-investigator, Jana Borg, have produced several studies -- and a book -- about AI's potential to serve as a "moral GPS" to help humans make better judgements. As part of larger teams, they've created a "morally-aligned" algorithm to help decide who receives kidney donations, and studied in which scenarios people would prefer that AI make moral decisions.
According to the press release, the goal of the OpenAI-funded work is to train algorithms to "predict human moral judgements" in scenarios involving conflicts "among morally relevant features in medicine, law, and business."
But it's far from clear that a concept as nuanced as morality is within reach of today's tech.
In 2021, the nonprofit Allen Institute for AI built a tool called Ask Delphi that was meant to give ethically sound recommendations. It judged basic moral dilemmas well enough -- the bot "knew" that cheating on an exam was wrong, for example. But slightly rephrasing questions was enough to get Delphi to approve of pretty much anything, including smothering infants.
The reason has to do with how modern AI systems work.
Machine learning models are statistical machines. Trained on a lot of examples from all over the web, they learn the patterns in those examples to make predictions, such as that the phrase "to whom" is often followed by "it may concern."
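To make the "statistical machine" idea concrete, here is a toy sketch in Python. It is not how Delphi or any production model is built; it only counts which word follows a two-word context in a tiny, invented corpus and returns the most frequent one. The function names and the sample sentences are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_context_counts(sentences):
    """Count which word follows each pair of consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - 2):
            context = (words[i], words[i + 1])
            counts[context][words[i + 2]] += 1
    return counts


def predict_next(counts, first, second):
    """Return the word seen most often after (first, second), or None."""
    following = counts.get((first.lower(), second.lower()))
    if not following:
        return None
    return following.most_common(1)[0][0]


# Invented training examples (an assumption, not real web data).
corpus = [
    "to whom it may concern",
    "to whom it may concern please find attached",
    "to whom should i send this",
]

counts = train_context_counts(corpus)
prediction = predict_next(counts, "to", "whom")
print(prediction)
```

The sketch has no notion of what the words mean. It repeats whatever continuation was most common in its examples, and it returns nothing at all for a context it has never seen.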
AI doesn't have an appreciation for ethical concepts, nor a grasp of the reasoning and emotion that play into moral decision-making. That's why AI tends to parrot the values of Western, educated, and industrialized nations -- the web, and thus AI's training data, is dominated by articles endorsing those viewpoints.
Unsurprisingly, many people's values aren't expressed in the answers AI gives, particularly if those people aren't contributing to the AI's training sets by posting online. And AI internalizes a range of biases beyond a Western bent. Delphi said that being straight is more "morally acceptable" than being gay.
The challenge before OpenAI -- and the researchers it's backing -- is made all the more intractable by the inherent subjectivity of morality. Philosophers have been debating the merits of various ethical theories for thousands of years, and there's no universally applicable framework in sight.
Claude favors Kantianism (i.e. focusing on absolute moral rules), while ChatGPT leans ever-so-slightly utilitarian (prioritizing the greatest good for the greatest number of people). Is one superior to the other? It depends on who you ask.
An algorithm to predict humans' moral judgements will have to take all this into account. That's a very high bar to clear -- assuming such an algorithm is possible in the first place.