As A.I. spreads through the business world like wildfire, the data that A.I. feeds on is becoming increasingly vital for companies.
From privacy and security to ethical issues and training bias, the span of data-related A.I. considerations is broad and getting broader. For many businesses, that means rethinking policies and practices, even if they haven’t officially adopted A.I. technology within the organization.
Companies don’t realize how much internal data is already being used within their organization for A.I. tools, said Check Point Chief Technology Officer Dorit Dor at the Fortune Brainstorm Tech conference in Deer Valley, Utah, this week. “Their data and info is already out there,” she said.
Dor joined executives from PagerDuty, Salesforce, and Signal to share their perspectives and experiences on the topic of data in the age of A.I. during a group panel at the conference.
As employees experiment with tools like ChatGPT, they’re feeding internal data to the A.I. tool. That means companies face significant risks of data leakage that could compromise both proprietary competitive information and personal customer data. “There is not that clean separation that traditionally enterprises would expect from secure databases,” said Clara Shih, the CEO of Salesforce’s A.I. business, about some of the A.I. tools being used by employees at various companies.
Shih explained how the large language models that power generative A.I. tools require as much context as possible from a user in order to produce the most relevant and accurate responses. “If you’re not careful about how you architect it, by default the context that you give into the prompt ends up getting learned by the model itself.”
Sean Scott, the chief product development officer of PagerDuty, echoed the concern, but said it all ties back to following security best practices.
“It starts with: What is your policy? What are your crown jewels? What data do you want to protect? What data do you want to make sure is super protected?” and then educating employees and monitoring to ensure that policies are adhered to, Scott said.
Protecting against mystery model data
While protecting internal data from leaking is crucial, companies must also grapple with the quality of the outside data they ingest. Most of the off-the-shelf A.I. large language models are black boxes, said Signal President Meredith Whittaker. “They know what the data is, you don’t know what the data is,” she said.
Companies that want to implement those A.I. tools into their operations run a risk of getting incorrect or offensive results because of the mystery data.
“What we can do is fine-tune on top of that, with some other data that might kind of move the model into a shape that fits a domain, or is more purpose-built for something, or is less offensive,” Whittaker said. “But I think we need to be clear around the lack of agency around those questions.”
Whittaker, who is an adviser to the U.S. FTC, called for more regulation to cut off the flow of problematic data and to limit what goes into “the bloodstream.”
Check Point’s Dor cautioned that regulation is only a start. “Regulation would only uplift the minimum requirement; it would never get you to a really safe space,” Dor said.
In the meantime, Dor said that much of the burden of dealing with data in the A.I. era is falling on the shoulders of companies’ already overtaxed chief information security officers.
“The CISOs were exhausted before, now they have all of a sudden this mission with many elements that they don’t really know much about, like all the legal aspects.”