- ACL Findings. Quantifying Train-Evaluation Overlap with Nearest Neighbors. Gauri Kambhatla, Thuy Nguyen, and Eunsol Choi. Findings of the Association for Computational Linguistics 2023.
Characterizing benchmark datasets is crucial to interpreting model performance. In this work, we study train-evaluation overlap as a measure of an individual dataset’s adequacy to evaluate model generalization over a wide range of datasets. We quantify the overlap with a simple novel metric based on a nearest neighbors approach between the training and evaluation sets. We identify nearest training examples for each evaluation example by mapping instances with generic and task-specific embedding methods. Our study on eleven classification and extractive QA tasks reveals a wide range of train-evaluation overlap, and we show that the data collection method of the dataset and the difficulty of the task may play a role in the amount of overlap. Lastly, we use our nearest neighbor analysis to identify challenging or potentially mislabeled examples. Our analysis quantifies train-evaluation overlap, providing insights for constructing datasets to study generalization.
- FAccT. Surfacing Racial Stereotypes Through Identity Portrayal. Gauri Kambhatla, Ian Stewart, and Rada Mihalcea. ACM Conference on Fairness, Accountability, and Transparency 2022.
People express racial stereotypes through conversations with others, increasingly in a digital format; as a result, the ability to computationally identify racial stereotypes could be beneficial to help mitigate some of the harmful effects of stereotyping. In this work, we seek to better understand how we can computationally surface racial stereotypes in text by identifying linguistic features associated with differences in racial identity portrayal, focused on two races (Black and White). We collect novel data of individuals’ self-presentation via crowdsourcing, where each crowdworker answers a set of prompts from their own perspective (real identity), and from the perspective of another racial identity (portrayed identity), keeping the gender constant. We use these responses as a dataset to identify stereotypes. Through a series of experiments based on classifications between real and portrayed identities, we show that generalizations and stereotypes appear to be more prevalent amongst White participants than Black participants. Through analyses of predictive words and word usage patterns, we find that some of the most predictive features of an author portraying a different racial identity are known stereotypes, and reveal how people of different identities see themselves and others.
- EvoMUSART. Chord Embeddings: Analyzing What They Capture and Their Role for Next Chord Prediction and Artist Attribute Prediction. Allison Lahnala, Gauri Kambhatla, Jiajun Peng, Matthew Whitehead, Gillian Minnehan, Eric Guldan, Jonathan K. Kummerfeld, Anıl Çamcı, and Rada Mihalcea. Proceedings of Computational Intelligence in Music, Sound, Art and Design - 10th International Conference, EvoMUSART 2021.
Natural language processing methods have been applied in a variety of music studies, drawing the connection between music and language. In this paper, we expand those approaches by investigating chord embeddings, which we apply in two case studies to address two key questions: (1) what musical information do chord embeddings capture?; and (2) how might musical applications benefit from them? In our analysis, we show that they capture similarities between chords that adhere to important relationships described in music theory. In the first case study, we demonstrate that using chord embeddings in a next chord prediction task yields predictions that more closely match those by experienced musicians. In the second case study, we show the potential benefits of using the representations in tasks related to musical stylometrics.
- TOCHI. PLIERS: A Process that Integrates User-Centered Methods into Programming Language Design. Michael Coblenz, Gauri Kambhatla, Paulette Koronkevich, Jenna L. Wise, Celeste Barnaby, Joshua Sunshine, Jonathan Aldrich, and Brad A. Myers. ACM Transactions on Computer-Human Interaction 2021.
Programming language design requires making many usability-related design decisions. However, existing HCI methods can be impractical to apply to programming languages: languages have high iteration costs, programmers require significant learning time, and user performance has high variance. To address these problems, we adapted both formative and summative HCI methods to make them more suitable for programming language design. We integrated these methods into a new process, PLIERS, for designing programming languages in a user-centered way. We assessed PLIERS by using it to design two new programming languages. Glacier extends Java to enable programmers to express immutability properties effectively and easily. Obsidian is a language for blockchains that includes verification of critical safety properties. Empirical studies showed that the PLIERS process resulted in languages that could be used effectively by many programmers and revealed additional opportunities for language improvement.
- PLATEAU. A Pilot Study of the Safety and Usability of the Obsidian Blockchain Programming Language. Gauri Kambhatla, Michael Coblenz, Reed Oei, Joshua Sunshine, Brad Myers, and Jonathan Aldrich. PLATEAU Workshop 2019.
Although blockchains have been proposed for building systems that execute critical transactions, security vulnerabilities have plagued programs that are deployed on blockchain systems. The programming language Obsidian was developed with the purpose of statically preventing some of the more common of these security risks, specifically the loss of resources and improper manipulation of objects. The question then is whether Obsidian’s novel features impact the usability of the language. In this paper, we begin to evaluate Obsidian with respect to usability, and develop materials for a quantitative user study through a sequence of pilot studies. Specifically, our goal was to assess a) potential usability problems of Obsidian, b) the effectiveness of a tutorial for participants to learn the language, and c) the design of programming tasks to evaluate performance using the language. Our preliminary results tentatively suggest that the complexity of Obsidian’s features does not hinder usability, although these results will be validated in the quantitative study. We also observed the following factors as being important in a given programmer’s ability to learn Obsidian: a) integrating very frequent opportunities for practice of the material (e.g., after less than a page of material at a time), and b) previous programming experience and self-efficacy.