In Silicon Valley, some of the brightest minds believe a universal basic income (UBI) that guarantees people unrestricted cash payments will help them to survive and thrive as advanced technologies eliminate more careers as we know them, from white-collar and creative jobs (lawyers, journalists, artists, software engineers) to labor roles. The idea has gained enough traction that dozens of guaranteed income programs have been started in U.S. cities since 2020.
Yet even Sam Altman, the CEO of OpenAI and one of the highest-profile proponents of UBI, doesn’t believe that it’s a complete solution. As he said during a sit-down earlier this year, “I think it is a little part of the solution. I think it’s great. I think as [advanced artificial intelligence] participates more and more in the economy, we should distribute wealth and resources much more than we have and that will be important over time. But I don’t think that’s going to solve the problem. I don’t think that’s going to give people meaning, I don’t think it means people are going to entirely stop trying to create and do new things and whatever else. So I would consider it an enabling technology, but not a plan for society.”
That raises the question of what a plan for society should then look like. Computer scientist Jaron Lanier, a founder in the field of virtual reality, writes in this week’s New Yorker that “data dignity” could be one solution, if not the answer.
Here’s the basic premise: Right now, we mostly give our data for free in exchange for free services. Lanier argues that in the age of AI, we must stop doing this, and that the powerful models currently working their way into society must “be connected with the humans” who give them so much to ingest and learn from in the first place.
The idea is for people to “get paid for what they create, even when it is filtered and recombined” into something that’s unrecognizable.
The concept isn’t brand new, with Lanier first introducing the notion of data dignity in a 2018 Harvard Business Review piece titled, “A Blueprint for a Better Digital Society.”
As he wrote at the time with co-author and economist Glen Weyl, “[R]hetoric from the tech sector suggests a coming wave of underemployment due to artificial intelligence (AI) and automation.” But the predictions of UBI advocates “leave room for only two outcomes,” and they’re extreme, Lanier and Weyl observed. “Either there will be mass poverty despite technological advances, or much wealth will have to be taken under central, national control through a social wealth fund to provide citizens a universal basic income.”
The problem is that both “hyper-concentrate power and undermine or ignore the value of data creators,” the two wrote.
Untangle my mind
Of course, assigning people the right amount of credit for their countless contributions to everything that exists in the world is not a minor challenge (even as one can imagine AI auditing startups promising to tackle the issue). Lanier acknowledges that even data-dignity researchers can’t agree on how to disentangle everything that AI models have absorbed or how detailed an accounting should be attempted.
But he thinks — perhaps optimistically — that it could be done gradually. “The system wouldn’t necessarily account for the billions of people who have made ambient contributions to big models—those who have added to a model’s simulated competence with grammar, for example. [It] might attend only to the small number of special contributors who emerge in a given situation.” Over time, however, “more people might be included, as intermediate rights organizations—unions, guilds, professional groups, and so on—start to play a role.”
Of course, the more immediate challenge is the black-box nature of current AI tools, says Lanier, who believes that “systems must be made more transparent. We need to get better at saying what is going on inside them and why.”
While OpenAI released at least some of its training data in previous years, it has since stopped disclosing it entirely. Indeed, Greg Brockman told TechCrunch last month that the training data for GPT-4, OpenAI’s latest and most powerful large language model to date, came from a “variety of licensed, created, and publicly available data sources, which may include publicly available personal information,” but he declined to offer anything more specific.
As OpenAI stated upon GPT-4’s release, there is too much downside for the outfit in revealing more than it does. “Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.”
The same is true of every large language model currently. Google’s Bard chatbot, for example, is based on the LaMDA language model, which is trained on Infiniset, a set of datasets based on internet content. But little else is known about it other than what Google’s research team wrote a year ago, which is that, at some point in the past, it incorporated 2.97 billion documents and 1.12 billion dialogs with 13.39 billion utterances.
Regulators are grappling with what to do. OpenAI, whose technology in particular is spreading like wildfire, is already in the crosshairs of a growing number of countries, including Italy, whose data protection authority has blocked the use of ChatGPT. French, German, Irish, and Canadian data regulators are also investigating how it collects and uses data.
But as Margaret Mitchell, an AI researcher who was formerly Google’s AI ethics co-lead, tells the outlet Technology Review, it might be nearly impossible at this point for these companies to identify individuals’ data and remove it from their models.
As explained by the outlet: OpenAI “could have saved itself a giant headache by building in robust data record-keeping from the start, [according to Mitchell]. Instead, it is common in the AI industry to build data sets for AI models by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos.”
How to save a life
That these tech companies may actually have limited understanding of what’s now in their models is an obvious challenge to the “data dignity” proposal of Lanier, who calls Altman a “colleague and friend” in his New Yorker piece.
Whether that renders the proposal impossible is something only time will tell.
Certainly, there is merit in wanting to give people ownership over their work, and frustration over the issue could grow as more of the world is reshaped by these new tools.
Whether or not OpenAI and others had the right to scrape the entire internet to feed their algorithms is already at the heart of numerous and wide-ranging copyright infringement lawsuits against them.
But so-called data dignity could also go a long way toward preserving humans’ sanity over time, suggests Lanier in his fascinating New Yorker piece.
As he sees it, universal basic income “amounts to putting everyone on the dole in order to preserve the idea of black-box artificial intelligence.” Meanwhile, ending the “black box nature of our current AI models” would make an accounting of people’s contributions easier — making them far more likely to continue making contributions.
Importantly, Lanier adds, it could also help to “establish a new creative class instead of a new dependent class.” And which would you prefer to be a part of?