Publications
Publications in reverse chronological order. * denotes equal contribution.
See also my Google Scholar profile.
2024
- Does Geo-co-location Matter? A Case Study of Public Health Conversations during COVID-19. Paiheng Xu, Louiqa Raschid, and Vanessa Frias-Martinez. arXiv preprint arXiv:2405.17710, May 2024
Social media platforms like Twitter (now X) have been pivotal in information dissemination and public engagement, especially during COVID-19. A key goal for public health experts was to encourage prosocial behavior that could impact local outcomes such as masking and social distancing. Given the importance of local news and guidance during COVID-19, the objective of our research is to analyze the effect of localized engagement on social media conversations. This study examines the impact of geographic co-location, as a proxy for localized engagement between public health experts (PHEs) and the public, on social media. We analyze a Twitter conversation dataset from January 2020 to November 2021, comprising over 19 K tweets from nearly five hundred PHEs, along with approximately 800 K replies from 350 K participants. Our findings reveal that geo-co-location is associated with higher engagement rates, especially in conversations on topics including masking, lockdowns, and education, and in conversations with academic and medical professionals. Lexical features associated with emotion and personal experiences were more common in geo-co-located contexts. This research provides insights into how geographic co-location influences social media engagement and can inform strategies to improve public health messaging.
- Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey. Xiaoyu Liu*, Paiheng Xu*, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, and 7 more authors. arXiv preprint arXiv:2403.09606, Mar 2024
Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving the LLMs’ reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. Meanwhile, LLMs’ strong reasoning capacities can in turn contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimations. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.
- The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education. Paiheng Xu, Jing Liu, Nathan Jones, Julie Cohen, and Wei Ai. In NAACL, Jun 2024
Assessing instruction quality is a fundamental component of any improvement efforts in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Different from prior research that mostly focuses on low-inference instructional practices on a singular basis, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that is widely acknowledged to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis, including noisy and long input data and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) demonstrate performances comparable to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
- Explore Spurious Correlations at the Concept Level in Language Models for Text Classification. Yuhang Zhou, Paiheng Xu, Xiaoyu Liu, Bang An, Wei Ai, and Furong Huang. In ACL, Aug 2024
Language models (LMs) have achieved notable success in numerous NLP tasks, employing both fine-tuning and in-context learning (ICL) methods. While language models demonstrate exceptional performance, they face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars. Previous research has primarily concentrated on word, phrase, and syntax features, neglecting the concept level, often due to the absence of concept labels and difficulty in identifying conceptual content in input texts. This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data. We find that LMs, when encountering spurious correlations between a concept and a label in training or prompts, resort to shortcuts for predictions. Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations. Our method’s efficacy, surpassing traditional token removal approaches, is validated through extensive testing.
- Emojis decoded: Leveraging ChatGPT for enhanced understanding in social media communications. Yuhang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, and Wei Ai. arXiv preprint arXiv:2402.01681, Aug 2024
Emojis, which encapsulate semantics beyond mere words or phrases, have become prevalent in social network communications. This has spurred increasing scholarly interest in exploring their attributes and functionalities. However, emoji-related research and application face two primary challenges. First, researchers typically rely on crowd-sourcing to annotate emojis in order to understand their sentiments, usage intentions, and semantic meanings. Second, subjective interpretations by users can often lead to misunderstandings of emojis and create communication barriers. Large Language Models (LLMs) have achieved significant success in various annotation tasks, with ChatGPT demonstrating expertise across multiple domains. In our study, we assess ChatGPT’s effectiveness in handling previously annotated and downstream tasks. Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications. Our findings indicate that ChatGPT has extensive knowledge of emojis. It is adept at elucidating the meaning of emojis across various application scenarios and demonstrates the potential to replace human annotators in a range of tasks.
- Twitter social mobility data reveal demographic variations in social distancing practices during the COVID-19 pandemic. Paiheng Xu, David A Broniatowski, and Mark Dredze. Scientific Reports, Jan 2024
The COVID-19 pandemic demonstrated the importance of social distancing practices to stem the spread of the virus. However, compliance with public health guidelines was mixed. Understanding what factors are associated with differences in compliance can improve public health messaging since messages could be targeted and tailored to different population segments. We utilize Twitter data on social mobility during COVID-19 to reveal which populations practiced social distancing and what factors correlated with this practice. We analyze how demographics and political affiliation correlate with reductions in physical mobility measured by public geolocation tweets. We find significant differences in mobility reduction between these groups in the United States. We observe that males, Asian and Latinx individuals, older individuals, Democrats, and people from higher population density states exhibited larger reductions in movement. Our study also reveals meaningful insights into the interactions between different groups. We hope these findings will provide evidence to support public health policy-making.
2023
- GFairHint: improving individual fairness for graph neural networks via fairness hint. Paiheng Xu*, Yuhang Zhou*, Bang An, Wei Ai, and Furong Huang. arXiv preprint arXiv:2305.15622, May 2023
Given the growing concerns about fairness in machine learning and the impressive performance of Graph Neural Networks (GNNs) on graph data learning, algorithmic fairness in GNNs has attracted significant attention. While many existing studies improve fairness at the group level, only a few works promote individual fairness, which renders similar outcomes for similar individuals. A desirable framework that promotes individual fairness should (1) balance between fairness and performance, (2) accommodate two commonly-used individual similarity measures (externally annotated and computed from input features), (3) generalize across various GNN models, and (4) be computationally efficient. Unfortunately, none of the prior work achieves all the desirables. In this work, we propose a novel method, GFairHint, which promotes individual fairness in GNNs and achieves all aforementioned desirables. GFairHint learns fairness representations through an auxiliary link prediction task, and then concatenates the representations with the learned node embeddings in original GNNs as a "fairness hint". Through extensive experimental investigations on five real-world graph datasets under three prevalent GNN models covering both individual similarity measures above, GFairHint achieves the best fairness results in almost all combinations of datasets with various backbone models, while generating comparable utility results, with much less computational cost compared to the previous state-of-the-art (SoTA) method.
2022
- A Machine Learning Approach For Discovering Tobacco Brands, Products, and Manufacturers in the United States. Adam Poliak, Paiheng Xu, Eric Leas, Mario Navarro, Stephanie Pitts, Andie Malterud, and 2 more authors. In Annual Meeting of the Society for Research on Nicotine and Tobacco, May 2022
2021
- Using Noisy Self-Reports to Predict Twitter User Demographics. Zach Wood-Doughty*, Paiheng Xu*, Xiao Liu, and Mark Dredze. In Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media, Jun 2021
Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g., Twitter), numerous studies have inferred demographics automatically. Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite errors inherent in automated supervision, we produce models with good performance when measured on gold-standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.
2019
- On predictability of time series. Paiheng Xu, Likang Yin, Zhongtao Yue, and Tao Zhou. Physica A: Statistical Mechanics and its Applications, Feb 2019
The method for estimating the predictability of human mobility proposed in Song et al. (2010) has been widely followed in exploring the predictability of disparate time series. However, the ambiguous description in the original paper has led to some misunderstandings, including inconsistent logarithm bases in the entropy estimator and the entropy-predictability conversion equation, as well as in the details of calculating the Lempel–Ziv estimator, which in turn result in remarkably overestimated predictability. This paper demonstrates the degree of overestimation using four different types of theoretically generated time series and an empirical data set, and shows the intrinsic deviation of the Lempel–Ziv estimator for highly random time series. This work provides a clear picture of this issue and thus helps researchers correctly estimate the predictability of time series.
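To illustrate the logarithm-base issue mentioned in this abstract, here is a minimal sketch of the entropy-to-predictability conversion, assuming the standard Fano-inequality form S = H(Π) + (1 − Π) log2(N − 1) with the entropy S given in bits. The function names and the bisection solver are illustrative choices, not taken from the paper, and the Lempel–Ziv entropy estimator itself is not shown.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits; defined as 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_predictability(entropy_bits, n_states, tol=1e-10):
    """Solve S = H(p) + (1 - p) * log2(N - 1) for p by bisection.

    entropy_bits must be in bits (base 2), matching the log2 used here.
    An entropy estimated in nats should first be divided by ln(2);
    mixing bases is the kind of inconsistency the abstract warns about.
    """
    if n_states < 2:
        return 1.0  # a single state is trivially predictable
    if not 0.0 <= entropy_bits <= math.log2(n_states):
        raise ValueError("entropy must lie in [0, log2(n_states)] bits")

    def fano(p):
        return binary_entropy(p) + (1 - p) * math.log2(n_states - 1)

    # fano(p) decreases from log2(N) at p = 1/N to 0 at p = 1.
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano(mid) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For example, `max_predictability(0.0, 4)` is close to 1 (a deterministic series), while `max_predictability(2.0, 4)` is close to 0.25 (a uniformly random series over four states).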
2018
- A novel visibility graph transformation of time series into weighted networks. Paiheng Xu, Rong Zhang, and Yong Deng. Chaos, Solitons & Fractals, Nov 2018
Analyzing time series from the perspective of complex networks has interested many scientists. In this paper, a novel method of constructing a weighted complex network from a time series is proposed, based on visibility graph theory. The first step is to determine the weights of the vertices in the time series by linearly combining the weights generated by the induced ordered weighted averaging aggregation operator (IOWA) and the visibility graph aggregation operator (VGA). Then, two strategies, an averaging strategy and a gravity strategy, are proposed to construct the weighted network. To test the validity of the proposed method, an artificial case is adopted in which link prediction is used to evaluate the performance of the weighted network. The results show that the weighted network constructed by the proposed method greatly outperforms the unweighted network obtained by traditional visibility graph theory.
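For context, the traditional unweighted visibility graph that this abstract uses as a baseline can be sketched as follows. This assumes the standard natural visibility criterion with evenly spaced samples indexed 0..n-1. The function name is illustrative, and the paper's vertex weighting and its averaging and gravity strategies are not reproduced here.

```python
def natural_visibility_edges(series):
    """Return the edges (a, b), a < b, of the natural visibility graph.

    Points a and b are connected when every point c between them lies
    strictly below the straight line joining (a, y_a) and (b, y_b).
    """
    n = len(series)
    edges = []
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            visible = all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                edges.append((a, b))
    return edges
```

Neighboring points are always connected because there is nothing between them. This brute-force version checks every pair and every intermediate point, so it is cubic in the series length in the worst case and is meant only for short series.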