See also my Google Scholar profile.
2024
-
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Xiaoyu Liu*, Paiheng Xu*, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, and 7 more authors
arXiv preprint arXiv:2403.09606, 2024
Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on evaluating and improving LLMs from a causal view in the following areas: understanding and improving the LLMs’ reasoning capacity, addressing fairness and safety issues in LLMs, complementing LLMs with explanations, and handling multimodality. Meanwhile, LLMs’ strong reasoning capacities can in turn contribute to the field of causal inference by aiding causal relationship discovery and causal effect estimations. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and equitable artificial intelligence systems.
-
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
Paiheng Xu, Jing Liu, Nathan D Jones, Julie Cohen, and Wei Ai
To appear in NAACL-HLT, 2024
Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers’ expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research that focuses on low-inference instruction practices, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instruction practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that has been demonstrated to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instruction analysis: noisy and long input data, and a highly skewed distribution of human ratings. Our results suggest that pretrained Language Models (PLMs) perform at a level comparable to the agreement among human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers’ utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
-
Emojis decoded: Leveraging ChatGPT for enhanced understanding in social media communications
Yuhang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, and Wei Ai
arXiv preprint arXiv:2402.01681, 2024
Emojis, which encapsulate semantics beyond mere words or phrases, have become prevalent in social network communications. This has spurred increasing scholarly interest in exploring their attributes and functionalities. However, emoji-related research and application face two primary challenges. First, researchers typically rely on crowd-sourcing to annotate emojis in order to understand their sentiments, usage intentions, and semantic meanings. Second, subjective interpretations by users can often lead to misunderstandings of emojis and create communication barriers. Large Language Models (LLMs) have achieved significant success in various annotation tasks, with ChatGPT demonstrating expertise across multiple domains. In our study, we assess ChatGPT’s effectiveness in handling previously annotated and downstream tasks. Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications. Our findings indicate that ChatGPT has extensive knowledge of emojis. It is adept at elucidating the meaning of emojis across various application scenarios and demonstrates the potential to replace human annotators in a range of tasks.
-
2023
-
GFairHint: improving individual fairness for graph neural networks via fairness hint
Paiheng Xu*, Yuhang Zhou*, Bang An, Wei Ai, and Furong Huang
arXiv preprint arXiv:2305.15622, 2023
Given the growing concerns about fairness in machine learning and the impressive performance of Graph Neural Networks (GNNs) on graph data learning, algorithmic fairness in GNNs has attracted significant attention. While many existing studies improve fairness at the group level, only a few works promote individual fairness, which renders similar outcomes for similar individuals. A desirable framework that promotes individual fairness should (1) balance between fairness and performance, (2) accommodate two commonly-used individual similarity measures (externally annotated and computed from input features), (3) generalize across various GNN models, and (4) be computationally efficient. Unfortunately, none of the prior work achieves all the desirables. In this work, we propose a novel method, GFairHint, which promotes individual fairness in GNNs and achieves all aforementioned desirables. GFairHint learns fairness representations through an auxiliary link prediction task, and then concatenates the representations with the learned node embeddings in original GNNs as a "fairness hint". Through extensive experimental investigations on five real-world graph datasets under three prevalent GNN models covering both individual similarity measures above, GFairHint achieves the best fairness results in almost all combinations of datasets with various backbone models, while generating comparable utility results, with much less computational cost compared to the previous state-of-the-art (SoTA) method.
-
Explore Spurious Correlations at the Concept Level in Language Models for Text Classification
Yuhang Zhou, Paiheng Xu, Xiaoyu Liu, Bang An, Wei Ai, and Furong Huang
arXiv preprint arXiv:2311.08648, 2023
Language models (LMs) have achieved notable success in numerous NLP tasks, employing both fine-tuning and in-context learning (ICL) methods. While language models demonstrate exceptional performance, they face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars. Previous research has primarily concentrated on word, phrase, and syntax features, neglecting the concept level, often due to the absence of concept labels and difficulty in identifying conceptual content in input texts. This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data. We find that LMs, when encountering spurious correlations between a concept and a label in training or prompts, resort to shortcuts for predictions. Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations. Our method’s efficacy, surpassing traditional token removal approaches, is validated through extensive testing.
2022
-
A Machine Learning Approach For Discovering Tobacco Brands, Products, and Manufacturers in the United States
Adam Poliak, Paiheng Xu, Eric Leas, Mario Navarro, Stephanie Pitts, Andie Malterud, and 2 more authors
In Annual Meeting of the Society for Research on Nicotine and Tobacco, 2022
2021
-
Using Noisy Self-Reports to Predict Twitter User Demographics
Zach Wood-Doughty*, Paiheng Xu*, Xiao Liu, and Mark Dredze
In Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media, 2021
Computational social science studies often contextualize content analysis within standard demographics. Since demographics are unavailable on many social media platforms (e.g., Twitter), numerous studies have inferred demographics automatically. Despite many studies presenting proof-of-concept inference of race and ethnicity, training of practical systems remains elusive since there are few annotated datasets. Existing datasets are small, inaccurate, or fail to cover the four most common racial and ethnic groups in the United States. We present a method to identify self-reports of race and ethnicity from Twitter profile descriptions. Despite errors inherent in automated supervision, we produce models with good performance when measured on gold standard self-report survey data. The result is a reproducible method for creating large-scale training resources for race and ethnicity.
2019
-
On predictability of time series
Paiheng Xu, Likang Yin, Zhongtao Yue, and Tao Zhou
Physica A: Statistical Mechanics and its Applications, 2019
A method to estimate the predictability of human mobility was proposed by Song et al. (2010) and has been widely followed in exploring the predictability of disparate time series. However, the ambiguous description in the original paper leads to some misunderstandings, including inconsistent logarithm bases in the entropy estimator and the entropy-predictability conversion equation, as well as unclear details in the calculation of the Lempel–Ziv estimator, which together result in remarkably overestimated predictability. This paper demonstrates the degree of overestimation using four different types of theoretically generated time series and an empirical data set, and shows the intrinsic deviation of the Lempel–Ziv estimator for highly random time series. This work provides a clear picture of this issue and thus helps researchers correctly estimate the predictability of time series.
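As a rough illustration of the entropy-to-predictability conversion mentioned in this abstract, the sketch below solves the Fano-type relation S = H(p) + (1 - p) log2(N - 1) for p, using base-2 logarithms throughout. It is not the paper's code: the function names, the bisection solver, and the tolerance are assumptions made for illustration, and the entropy S is assumed to be supplied in bits.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits; defined as 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_rhs(p, n_states):
    """Right-hand side of S = H(p) + (1 - p) * log2(N - 1), in bits."""
    return binary_entropy(p) + (1.0 - p) * math.log2(n_states - 1)


def max_predictability(entropy_bits, n_states, tol=1e-10):
    """Solve fano_rhs(p, N) = S for p in [1/N, 1] by bisection.

    On that interval the right-hand side decreases monotonically
    from log2(N) to 0, so the root is unique.
    """
    if n_states < 2:
        return 1.0  # a single state is trivially predictable
    lo, hi = 1.0 / n_states, 1.0
    if entropy_bits >= math.log2(n_states):
        return lo  # entropy at or above the uniform-distribution maximum
    if entropy_bits <= 0.0:
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_rhs(mid, n_states) > entropy_bits:
            lo = mid  # entropy still too high here, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Because the relation is written in bits, an entropy measured in nats (natural logarithm) is numerically smaller by a factor of ln 2; passing it in unconverted yields a larger p, which is one way mixed logarithm bases can inflate the estimate.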
2018
-
A novel visibility graph transformation of time series into weighted networks
Paiheng Xu, Rong Zhang, and Yong Deng
Chaos, Solitons & Fractals, 2018
Analyzing time series from the perspective of complex networks has interested many scientists. In this paper, based on visibility graph theory, a novel method of constructing a weighted complex network from time series is proposed. The first step determines the weights of vertices in the time series by linearly combining the weights generated by the induced ordered averaging aggregation operator (IOWA) and the visibility graph aggregation operator (VGA). Then, two strategies, an averaging strategy and a gravity strategy, are proposed to construct the weighted network. To test the validity of the proposed method, an artificial case is adopted, in which link prediction is used to evaluate the performance of the weighted network. It is shown that the weighted network constructed by the proposed method greatly outperforms the unweighted network obtained by traditional visibility graph theory.
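For orientation, the unweighted baseline referred to in this abstract can be sketched as a natural visibility graph, where two samples are linked if the straight line between them passes strictly above every sample in between. This is an illustrative sketch only: it assumes evenly spaced samples indexed 0..n-1, the function name is hypothetical, and it does not implement the paper's vertex weights or its averaging and gravity strategies.

```python
def natural_visibility_edges(series):
    """Return the edges (a, b), a < b, of the natural visibility graph.

    Samples a and b see each other if every intermediate sample c lies
    strictly below the straight line joining (a, y_a) and (b, y_b).
    """
    n = len(series)
    edges = []
    for a in range(n - 1):
        for b in range(a + 1, n):
            visible = True
            for c in range(a + 1, b):
                # Height at time c of the line from (a, y_a) to (b, y_b).
                line = series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                if series[c] >= line:
                    visible = False
                    break
            if visible:
                edges.append((a, b))
    return edges
```

Adjacent samples are always connected because no sample lies between them. This brute-force version takes cubic time in the worst case, which is adequate for short series.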