publications | Gauri Kambhatla

2025

Preprint

Beyond Sociodemographic Prompting: Using Supervision to Align LLMs with Human Response Distributions

Gauri Kambhatla, Sanjana Gautam, Angela Zhang, Alex Liu, Ravi Srinivasan, Junyi Jessy Li, and Matthew Lease

Arxiv preprint, 2025

Abs PDF Code Data

The ability to accurately predict how different population groups would answer subjective questions would have great value. In this work, we show that use of relatively simple supervision can greatly improve language model alignment with diverse population groups, as measured over three datasets spanning various topics. Beyond evaluating average performance, we also report how alignment varies across specific groups. The simplicity and generality of our approach promotes easy adoption, while our broad findings provide useful guidance for when to use or not use our approach in practice. By conducting evaluation over many LLMs and prompting strategies, along with open-sourcing our work, we provide a useful benchmark to stimulate future research.
Preprint

Measuring diversity of synthetic prompts and data generated with fine-grained persona prompting

Gauri Kambhatla, Chantal Shaib, and Venkata Govindarajan

Arxiv preprint, 2025

Abs PDF

Fine-grained personas have recently been used for generating ’diverse’ synthetic data for pre-training and supervised fine-tuning of Large Language Models (LLMs). In this work, we measure the diversity of persona-driven synthetically generated prompts and responses with a suite of lexical diversity and redundancy metrics. Firstly, we find that synthetic prompts/instructions are significantly less diverse than human-written ones. Next, we sample responses from LLMs of different sizes with fine-grained and coarse persona descriptions to investigate how much fine-grained detail in persona descriptions contribute to generated text diversity. We find that while persona-prompting does improve lexical diversity (especially with larger models), fine-grained detail in personas doesn’t increase diversity noticeably.

2024

EMNLP Findings

Promoting Constructive Deliberation: Reframing for Receptiveness

Gauri Kambhatla, Matthew Lease, and Ashwin Rajadesingan

Findings of the Conference on Empirical Methods in Natural Language Processing, 2024

Abs PDF Data Poster

To promote constructive discussion of controversial topics online, we propose automatic reframing of disagreeing responses to signal receptiveness while preserving meaning. Drawing on research from psychology, communications, and linguistics, we identify six strategies for reframing. We automatically reframe replies according to each strategy, using a dataset of Reddit comments and replies. Through human-centered experiments, we find that the replies generated with our framework are perceived to be significantly more receptive than the original replies, as well as a generic receptiveness baseline. We analyze and discuss the implications of our results and highlight applications to content moderation. Overall, we illustrate how transforming receptiveness, a particular social science construct, into a computational framework, can make LLM generations more aligned with human perceptions.

2023

ACL Findings

Quantifying Train-Evaluation Overlap with Nearest Neighbors

Gauri Kambhatla, Thuy Nguyen, and Eunsol Choi

Findings of the Association for Computational Linguistics, 2023

Abs PDF Code

Characterizing benchmark datasets is crucial to interpreting model performance. In this work, we study train-evaluation overlap as a measure of an individual dataset’s adequacy to evaluate model generalization over a wide range of datasets. We quantify the overlap with a simple novel metric based on a nearest neighbors approach between the training and evaluation sets. We identify nearest training examples for each evaluation example by mapping instances with generic and task-specific embedding methods. Our study on eleven classification and extractive QA tasks reveals a wide range of train-evaluation overlap, and we show that the data collection method of the dataset and the difficulty of the task may play a role in the amount of overlap. Lastly, we use our nearest neighbor analysis to identify challenging or potentially mislabeled examples. Our analysis quantifies train-evaluation overlap, providing insights for constructing datasets to study generalization.

2022

FAccT

Surfacing Racial Stereotypes Through Identity Portrayal

Gauri Kambhatla, Ian Stewart, and Rada Mihalcea

ACM Conference on Fairness, Accountability, and Transparency, 2022

Abs PDF Data

People express racial stereotypes through conversations with others, increasingly in a digital format; as a result, the ability to computationally identify racial stereotypes could be beneficial to help mitigate some of the harmful effects of stereotyping. In this work, we seek to better understand how we can computationally surface racial stereotypes in text by identifying linguistic features associated with differences in racial identity portrayal, focused on two races (Black and White). We collect novel data of individuals’ self-presentation via crowdsourcing, where each crowdworker answers a set of prompts from their own perspective (real identity), and from the perspective of another racial identity (portrayed identity), keeping the gender constant. We use these responses as a dataset to identify stereotypes. Through a series of experiments based on classifications between real and portrayed identities, we show that generalizations and stereotypes appear to be more prevalent amongst white participants than black participants. Through analyses of predictive words and word usage patterns, we find that some of the most predictive features of an author portraying a different racial identity are known stereotypes, and reveal how people of different identities see themselves and others.

2021

EvoMUSART

Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction

Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, and Rada Mihalcea

Proceedings of Computational Intelligence in Music, Sound, Art and Design - 10th International Conference, EvoMUSART, 2021

Abs PDF Data

Natural language processing methods have been applied in a variety of music studies, drawing the connection between music and language. In this paper, we expand those approaches by investigating chord embeddings, which we apply in two case studies to address two key questions: (1) what musical information do chord embeddings capture?; and (2) how might musical applications benefit from them? In our analysis, we show that they capture similarities between chords that adhere to important relationships described in music theory. In the first case study, we demonstrate that using chord embeddings in a next chord prediction task yields predictions that more closely match those by experienced musicians. In the second case study, we show the potential benefits of using the representations in tasks related to musical stylometrics.
TOCHI

PLIERS: A Process that Integrates User-Centered Methods into Programming Language Design

Michael Coblenz, Gauri Kambhatla, Paulette Koronkevich, Jenna L. Wise, Celeste Barnaby, Joshua Sunshine, Jonathan Aldrich, and Brad A. Myers

ACM Transactions on Computer-Human Interaction, 2021

Abs PDF

Programming language design requires making many usability-related design decisions. However, existing HCI methods can be impractical to apply to programming languages: languages have high iteration costs, programmers require significant learning time, and user performance has high variance. To address these problems, we adapted both formative and summative HCI methods to make them more suitable for programming language design. We integrated these methods into a new process, PLIERS, for designing programming languages in a user-centered way. We assessed PLIERS by using it to design two new programming languages. Glacier extends Java to enable programmers to express immutability properties effectively and easily. Obsidian is a language for blockchains that includes verification of critical safety properties. Empirical studies showed that the PLIERS process resulted in languages that could be used effectively by many programmers and revealed additional opportunities for language improvement.

2019

PLATEAU

A Pilot Study of the Safety and Usability of the Obsidian Blockchain Programming Language

Gauri Kambhatla, Michael Coblenz, Reed Oei, Joshua Sunshine, Brad Myers, and Jonathan Aldrich

PLATEAU Workshop, 2019

Abs PDF

Although blockchains have been proposed for building systems that execute critical transactions, security vulnerabilities have plagued programs that are deployed on blockchain systems. The programming language Obsidian was developed with the purpose of statically preventing some of the more common of these security risks, specifically the loss of resources and improper manipulation of objects. The question then is whether Obsidian’s novel features impact the usability of the language. In this paper, we begin to evaluate Obsidian with respect to usability, and develop materials for a quantitative user study through a sequence of pilot studies. Specifically, our goal was to assess a) potential usability problems of Obsidian, b) the effectiveness of a tutorial for participants to learn the language, and c) the design of programming tasks to evaluate performance using the language. Our preliminary results tentatively suggest that the complexity of Obsidian’s features do not hinder usability, although these results will be validated in the quantitative study. We also observed the following factors as being important in a given programmer’s ability to learn Obsidian: a) integrating very frequent opportunities for practice of the material - e.g., after less than a page of material at a time, and b) previous programming experience and self-efficacy.