At Pew Research Center, we’re watching with great interest as new generations of large language models such as GPT, Claude and Gemini develop. These models, also known as LLMs, are machine learning tools that are trained on massive digital datasets to predict and generate humanlike language. They’re sometimes described as a subset of generative artificial intelligence.
As public-facing social scientists committed to innovation, we’re intrigued by what this fast-moving technology might add to our toolkit. As researchers, we’re committed to explaining how the public is reacting to these advances. And as information providers whose fundamental values include accuracy and methodological rigor, we’re moving with great deliberation so as not to compromise the quality of our work.
In this post, we’ll share our current guidelines for the internal use of LLMs. We hope to start a dialogue with our peers and broader audiences about the best ways to use this technology as it continues to develop.
Background
LLMs aren’t new to the research community. The computational social scientists on our Data Labs team have regularly used these tools in specialized, narrowly defined research tasks for years.
What is new is that developments in scale, computational capacity and model training have led to a massive leap forward in these models’ general capabilities.
Modern LLMs can perform a much wider variety of tasks, particularly when it comes to interpreting and mimicking human communication. But even though LLMs are increasingly capable (and increasingly integrated into common software), they are just pattern recognition systems. And in their 2024 iteration, at least, they are not guaranteed to provide accurate, factual information. When they fail to produce accurate information, it can be hard to explain what went wrong because of the complexity of their internal workings and the sheer amount of data on which they were trained.
Our commitments when using LLMs
Given the above, the Center’s approach in 2024 is a version of “proceed with caution.” Here are our commitments to ourselves and to our audiences:
Our work is people-centered
Real people, not machines, answer our surveys. Collecting people’s opinions in the United States and abroad is our most important task. We do not use LLMs to create or model “synthetic” public opinion. Our survey results are based on the views reported to us by real people.
Photos on our site are of and by real people. We are not using AI to create images. Human artists create the artwork.
Humans oversee, and are responsible for, every aspect of our work. From survey questionnaires to published research reports, our work begins and ends with human experts. We do not use LLMs to decide on research topics or questionnaire items. We do not use LLMs to identify the storylines and key findings of our reports and blog posts. We believe that trained and experienced humans must guide the process of going from a raw dataset to a report that helps our audience make sense of the data.
Accuracy and rigor remain paramount
In everything we do, we prioritize accuracy and rigor, and our explorations of AI are no different. To the extent that modern technologies can help facilitate our mission to generate a trustworthy foundation of facts, we’re interested in adopting them – but only if they allow us to maintain our existing standards for quality research.
We’ll experiment as a path to innovation
Current areas of experimentation
In the production of our website. We see real potential in using this technology to help write the code needed to produce our website – a detailed, structured and often repetitive process. While we’re not currently using AI to improve users’ search and navigation experiences on our website, we see that as an area worth exploring in the near term.
In our research process. Coding assistance is also potentially useful to our researchers. For example, we may use an LLM code assistant to format or help write the code needed to analyze survey datasets. But it needs guardrails. At the Center, researchers who are fluent in a coding language can access an LLM coding assistant only if their work goes through a full, human-run code check process.
We’ll also continue experimenting with using LLMs to analyze textual data, such as coding open-ended survey responses into categories or scraping websites for key data. This work has always been, and will continue to be, overseen by Center staff. We will continue to be transparent and acknowledge in our report methodologies whenever we have used these tools.
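To make the idea concrete, here is a minimal sketch of what human-overseen, LLM-assisted coding of open-ended responses could look like. It is an illustration only, not the Center’s actual workflow: the model name, prompt, category list and example responses are assumptions, and in practice every suggested label would be reviewed by a trained human coder before entering any analysis.

```python
# Minimal sketch: LLM-assisted categorization of open-ended survey responses.
# The model, prompt and categories below are illustrative assumptions, not a
# description of the Center's workflow; every label is reviewed by a human coder.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORIES = ["Economy", "Health care", "Immigration", "Other"]  # hypothetical codes

def suggest_category(response_text: str) -> str:
    """Ask the model to assign one category to a single open-ended response."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Assign the response to exactly one of these categories: "
                        + ", ".join(CATEGORIES) + ". Reply with the category only."},
            {"role": "user", "content": response_text},
        ],
    )
    return completion.choices[0].message.content.strip()

# Human oversight: model output is treated as a first pass, then verified.
responses = [
    "Prices keep going up at the grocery store.",
    "I worry about what a hospital stay would cost.",
]
for text in responses:
    label = suggest_category(text)
    print(f"MODEL SUGGESTION -> {label!r}  |  RESPONSE: {text}")
    # A trained human coder reviews and corrects each suggested label
    # before any result enters an analysis dataset.
```

In a real project, this first pass would also be validated against a set of human-coded responses, and any use of the tool would be disclosed in the published methodology, as described above.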
In the final stages of our editorial functions. Many widely available tools already use this technology to clean up grammar, punctuation, etc. And we are using these tools, too, though a human verifies our final copy. In our judgment, this level of use does not require external labeling.
Possible future areas of experimentation
We’re watching developments in the publishing industry carefully. Some interesting uses of AI include:
Creating derivative products. We see potential for leveraging LLMs to quickly generate derivative products that would create new access points to our content for a wider array of consumers. This could include drafting social media posts in a variety of styles for a variety of platforms. Or it could mean creating first-draft thematic summaries for our topic index landing pages. As of this writing, however, we are not ready to cross this bridge. Our current internal guidance is that any external-facing products need to be human-authored, not just overseen.
Summarizing research in search results. Currently, the search function on the Pew Research Center website delivers a list of links. We hope at some future point to incorporate a “smart search” overlay that would deliver a more pointed summary of our data to interested users. We’re following developments in the accuracy of model results so we can experiment further at the right moment.
We’ll be transparent
For currently approved uses:
- If we use an LLM in any aspect of our research process, we will note that in the published methodology section.
- Using an LLM to make minor grammatical, spelling or reading grade level changes is not considered a meaningful use and will not be cited.
- As made clear above, our developers are already using human-supervised AI to write the code that creates our website.
We’d love to hear from you about your thoughts, hopes and concerns on this topic. We already know it’s one we’ll be revisiting this year and beyond.