How government control of media influences large language models
EGB invites you to a research seminar on government control of media and large language models, given by Margaret E. Roberts, Professor, University of California, San Diego.
About this event
Abstract:
Millions of people around the world query large language models for information. While several studies have compellingly documented the persuasive potential of these models, there is limited evidence of who or what influences the models themselves, which has led to a flurry of concerns about which companies and governments build and regulate them. We show through six studies that government control of the media already influences the output of large language models via their training data. To understand the specific mechanism by which government control can influence LLMs, we begin with a case study of China's media. We demonstrate that media scripted and coordinated by the Chinese state appears in large language model training datasets. To evaluate the plausible effect of this inclusion, we use an open-weight model to show that additional pretraining on Chinese state-coordinated media generates more positive answers to prompts about Chinese political institutions and leaders. We link this phenomenon to commercial models through two audit studies demonstrating that prompting models in Chinese generates more positive responses about China's institutions and leaders than the same queries in English. China's media system is just one specific case of government control. We use a cross-national audit to provide evidence that the influence of media control on LLM outputs extends beyond China. We show that model responses in the languages of countries with lower media freedom exhibit a stronger pro-regime valence than responses in the languages of countries with higher media freedom. The combination of this influence with the models' persuasive potential points to a troubling conclusion: states and powerful institutions have increased strategic incentives to leverage media control in the hope of shaping large language model output.
Short Bio:
Margaret Roberts is a Professor in the Department of Political Science at the University of California, San Diego, where she holds a Chancellor's Endowed Chair. She co-directs the China Data Lab at the 21st Century China Center and is an affiliate of the UC Institute on Global Conflict and Cooperation. Her research interests lie at the intersection of new technologies and digital politics, with a specific focus on the politics of artificial intelligence, online censorship and propaganda, and science and innovation.
Roberts’ first book, Censored: Distraction and Diversion Inside China's Great Firewall, published by Princeton University Press in 2018, was listed as one of the Foreign Affairs Best Books of 2018 and was honored with the Goldsmith Book Award, the Best Book Award of the Human Rights Section, and the Best Book Award of the Information Technology and Politics Section of the American Political Science Association. Her second book, Text as Data: A New Framework for Machine Learning in the Social Sciences (co-authored with Justin Grimmer and Brandon Stewart), won the American Sociological Association’s Methodology Section’s Outstanding Publication Award in 2025.