About Me
I am a fourth-year Ph.D. student focusing on Natural Language Processing and Artificial Intelligence, honored to be advised by Prof. Chris Callison-Burch at the University of Pennsylvania. I received my undergraduate degree from the University of Michigan in 2018, where I was mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev.
I'm actively looking for a faculty or post-doc job starting fall 2024, inside or outside the US. Contact: zharry@seas.upenn.edu

CV

Work Experience
Teaching Assistant, University of Michigan (2016 - 2018)
EECS 595: Natural Language Processing (Fall 2018)
EECS 280: Programming and Introductory Data Structures (Winter and Fall 2016)

Education
University of Pennsylvania
Ph.D.; GPA: 3.92/4.00; In progress
University of Michigan
B.S.E.; Class of 2018; GPA: 3.82/4.00
Shenzhen Middle School
High School Diploma; Class of 2015; GPA: 4.23/4.30
Service
Program Chair of MASC-SLL 2023
Reviewer of ACL 2023
Program Chair of DaSH Workshop @ EMNLP 2022
Reviewer of COLING 2022
Reviewer of ARR since October 2021
Reviewer of LREC 2022
Program Chair of MASC-SLL 2021
Session Chair of AACL-IJCNLP 2020
Co-organizer of CLUNCH in 2020, Penn's NLP seminar series
Reviewer of COLING 2020
Reviewer of Computer Speech and Language 2018
Human language places great emphasis on events and procedures (sequences of events) that describe what happens in the world, so reasoning about them is crucial to artificial intelligence. My work focuses on structured event reasoning: I develop systems that automatically extract common, schematic information from texts describing events. Moreover, I inject this structured information into modern language models to advance the state of the art on downstream tasks such as question answering, dialog, script generation, and symbolic planning.
[14] Reasoning about Procedures with Natural Language Processing: A Tutorial; Li Zhang; in arXiv pre-print. Paper BibTeX
@misc{https://doi.org/10.48550/arxiv.2205.07455, doi = {10.48550/ARXIV.2205.07455}, url = {https://arxiv.org/abs/2205.07455}, author = {Zhang, Li}, keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences}, title = {Reasoning about Procedures with Natural Language Processing: A Tutorial}, publisher = {arXiv}, year = {2022}, copyright = {Creative Commons Attribution Non Commercial Share Alike 4.0 International} }
1. Reasoning about entities and events in procedures.
[23] OpenPI2.0: An Improved Dataset for Entity Tracking in Texts; Li Zhang, Hainiu Xu, Abhinav Kommula, Niket Tandon and Chris Callison-Burch; in preprint. Paper BibTeX Repo
@misc{zhang2023openpi20, title={OpenPI2.0: An Improved Dataset for Entity Tracking in Texts}, author={Li Zhang and Hainiu Xu and Abhinav Kommula and Niket Tandon and Chris Callison-Burch}, year={2023}, eprint={2305.14603}, archivePrefix={arXiv}, primaryClass={cs.CL} }
[19] Causal Reasoning of Entities and Events in Procedural Texts; Li Zhang*, Hainiu Xu*, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora and Chris Callison-Burch (*equal contribution); in Findings of EACL 2023. Paper BibTeX Repo
@inproceedings{zhang-etal-2023-causal, title = "Causal Reasoning of Entities and Events in Procedural Texts", author = "Zhang, Li and Xu, Hainiu and Yang, Yue and Zhou, Shuyan and You, Weiqiu and Arora, Manni and Callison-Burch, Chris", booktitle = "Findings of the Association for Computational Linguistics: EACL 2023", month = may, year = "2023", address = "Dubrovnik, Croatia", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2023.findings-eacl.31", pages = "415--431", abstract = "Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind human at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.", }
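To give a flavor of the code-like prompting in CREPE, here is a minimal sketch in Python; the class-based prompt format and the tea-making procedure are illustrative stand-ins of mine, not the exact templates from the paper (those are in the linked Repo):

# Render a procedure and an entity-state query as Python-like code, the kind
# of prompt fed to language models pre-trained on code. Illustrative format.
def build_code_prompt(goal, steps, entity, state):
    lines = [f"class {goal.title().replace(' ', '')}:"]
    lines.append("    def procedure(self):")
    for i, step in enumerate(steps):
        lines.append(f"        self.step_{i}()  # {step}")
    lines.append("")
    lines.append(f"    # Chain of thought: track the entity '{entity}' through the steps,")
    lines.append("    # then judge the plausibility of the queried state.")
    lines.append(f"    {entity}.state == '{state}'  # True or False:")
    return "\n".join(lines)

prompt = build_code_prompt(
    goal="make tea",
    steps=["boil water in a kettle", "pour the water into a cup", "add a tea bag"],
    entity="water",
    state="hot",
)
print(prompt)  # this string would be sent to a code language model for completion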
[17] Unsupervised Entity Linking with Guided Summarization and Multiple Choice Selection; Young Min Cho, Li Zhang and Chris Callison-Burch; in EMNLP 2022. Paper BibTeX Repo
2. Reasoning about the relations among events, such as goals and steps, in procedures.
[6] Reasoning about Goals, Steps, and Temporal Ordering with WikiHow; Li Zhang*, Qing Lyu* and Chris Callison-Burch (*equal contribution); in EMNLP 2020. Paper BibTeX Repo
@inproceedings{zhang-etal-2020-reasoning, title = "Reasoning about Goals, Steps, and Temporal Ordering with {W}iki{H}ow", author = "Zhang, Li and Lyu, Qing and Callison-Burch, Chris", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.374", pages = "4630--4639", }
[15] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models; ... Li Zhang*, Qing Lyu* and Chris Callison-Burch (*equal contribution); in TMLR. Paper
[9] Visual Goal-Step Inference using wikiHow; Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar and Chris Callison-Burch; in EMNLP 2021. Paper BibTeX
@inproceedings{yang-etal-2021-visual, title = "Visual Goal-Step Inference using wiki{H}ow", author = "Yang, Yue and Panagopoulou, Artemis and Lyu, Qing and Zhang, Li and Yatskar, Mark and Callison-Burch, Chris", booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", month = nov, year = "2021", address = "Online and Punta Cana, Dominican Republic", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.emnlp-main.165", pages = "2167--2179", abstract = "Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15 - 20{\%}. Our task will facilitate multimodal reasoning about procedural events.", }
[10] Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data; Shuyan Zhou*, Li Zhang*, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch and Graham Neubig (*equal contribution); in ACL 2022. Paper BibTeX Demo Repo
@inproceedings{zhou-etal-2022-show, title = "Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data", author = "Zhou, Shuyan and Zhang, Li and Yang, Yue and Lyu, Qing and Yin, Pengcheng and Callison-Burch, Chris and Neubig, Graham", booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)", month = may, year = "2022", address = "Dublin, Ireland", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.acl-long.214", pages = "2998--3012", abstract = "Procedures are inherently hierarchical. To {``}make videos{''}, one may need to {``}purchase a camera{''}, which in turn may require one to {``}set a budget{''}. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., {``}purchase a camera{''}) in an article to other articles with similar goals (e.g., {``}how to choose a camera{''}), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.", }
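The core operation in building the hierarchy is linking a step (e.g., "purchase a camera") to articles with similar goals. Below is a minimal sketch of that linking using off-the-shelf sentence embeddings; the sentence-transformers library and model name are my choices here, and the paper's actual linking method is stronger:

# Link a step to the wikiHow article whose goal is most similar, the building
# block of the recursive KB construction. Off-the-shelf embeddings stand in
# for the paper's linking model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

step = "purchase a camera"
article_goals = ["How to Choose a Camera", "How to Set a Budget", "How to Edit Videos"]

step_emb = model.encode(step, convert_to_tensor=True)
goal_embs = model.encode(article_goals, convert_to_tensor=True)
scores = util.cos_sim(step_emb, goal_embs)[0]

best = int(scores.argmax())
print(article_goals[best], float(scores[best]))  # the step's likely parent article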
[8] Goal-Oriented Script Construction; Qing Lyu*, Li Zhang* and Chris Callison-Burch (*equal contribution); in INLG 2021. Paper BibTeX Repo
@inproceedings{lyu-etal-2021-goal, title = "Goal-Oriented Script Construction", author = "Lyu, Qing and Zhang, Li and Callison-Burch, Chris", booktitle = "Proceedings of the 14th International Conference on Natural Language Generation", month = aug, year = "2021", address = "Aberdeen, Scotland, UK", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.inlg-1.19", pages = "184--200", abstract = "The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach by first retrieving the relevant steps from a large candidate pool and then ordering them. We show that our task is practical, feasible but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.", }
[7] Intent Detection with WikiHow; Li Zhang, Qing Lyu and Chris Callison-Burch; in AACL-IJCNLP 2020. Paper BibTeX Repo
@inproceedings{zhang-etal-2020-intent, title = "Intent Detection with {W}iki{H}ow", author = "Zhang, Li and Lyu, Qing and Callison-Burch, Chris", booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing", month = dec, year = "2020", address = "Suzhou, China", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.aacl-main.35", pages = "328--333", abstract = "Modern task-oriented dialog systems need to reliably understand users{'} intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75{\%} accuracy using only 100 training examples in all datasets.", }
[13] QuakerBot: A Household Dialog System Powered by Large Language Models; Artemis Panagopoulou, Manni Arora, Li Zhang, Dimitri Cugini, Weiqiu You, Yue Yang, Liyang Zhou, Yuxuan Wang, Zhaoyi Hou, Alyssa Hwang, Lara Martin, Sherry Shi, Chris Callison-Burch and Mark Yatskar; in Alexa Prize Proceedings 2022. Paper BibTeX
@Inproceedings{Pennsylvania2022, author = {Panagopoulou, Artemis and Arora, Manni and Zhang, Li and Cugini, Dimitri and You, Weiqiu and Yang, Yue and Zhou, Liyang and Wang, Yuxuan and Hou, Zhaoyi and Hwang, Alyssa and Martin, Lara and Shi, Sherry and Callison-Burch, Chris and Yatskar, Mark}, title = {QuakerBot: A household dialog system powered by large language models}, year = {2022}, url = {https://www.amazon.science/alexa-prize/proceedings/quakerbot-a-household-dialog-system-powered-by-large-language-models}, booktitle = {Alexa Prize TaskBot Challenge Proceedings}, }
[21] Human-in-the-Loop Schema Induction; Tianyi Zhang, Isaac Tham, Zhaoyi Hou, Jiaxuan Ren, Liyang Zhou, Hainiu Xu, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki and Chris Callison-Burch; in ACL 2023 Demos. Paper BibTeX Demo
@misc{https://doi.org/10.48550/arxiv.2302.13048, doi = {10.48550/ARXIV.2302.13048}, url = {https://arxiv.org/abs/2302.13048}, author = {Zhang, Tianyi and Tham, Isaac and Hou, Zhaoyi and Ren, Jiaxuan and Zhou, Liyang and Xu, Hainiu and Zhang, Li and Martin, Lara J. and Dror, Rotem and Li, Sha and Ji, Heng and Palmer, Martha and Brown, Susan and Suchocki, Reece and Callison-Burch, Chris}, keywords = {Human-Computer Interaction (cs.HC), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences}, title = {Human-in-the-Loop Schema Induction}, publisher = {arXiv}, year = {2023}, copyright = {Creative Commons Attribution 4.0 International} }
Recently, language models pre-trained on a mixture of text and code have shown strong reasoning ability. Much previous work has found that creatively prompting them with a code-like representation of textual reasoning tasks improves performance. We port this method to a dozen general NLP tasks and find no consistent performance gain.
[22] Exploring the Curious Case of Code Prompts; Li Zhang*, Liam Dugan*, Hainiu Xu* and Chris Callison-Burch (*equal contribution); in ACL 2023 1st Workshop on Natural Language Reasoning and Structured Explanations. Paper BibTeX Repo
@misc{zhang2023exploring, title={Exploring the Curious Case of Code Prompts}, author={Li Zhang and Liam Dugan and Hainiu Xu and Chris Callison-Burch}, year={2023}, eprint={2304.13250}, archivePrefix={arXiv}, primaryClass={cs.CL} }
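For illustration, here is how one and the same instance can be rendered as a text prompt versus a code prompt; the templates below are simplified stand-ins for the ones in our Repo:

# The same NLI-style instance as a plain-text prompt and as a code prompt.
premise = "A man is playing a guitar on stage."
hypothesis = "A person is performing music."

text_prompt = (
    f"Premise: {premise}\n"
    f"Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Answer yes or no:"
)

code_prompt = f'''premise = "{premise}"
hypothesis = "{hypothesis}"
# Return True if the premise entails the hypothesis, else False.
entailment ='''

print(text_prompt)
print(code_prompt)  # either string is sent to the language model for completion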
The chain-of-thought (CoT) prompting mechanism of language models has been shown to enable multi-step machine reasoning, but the reasoning process is not necessarily faithful. For example, a model could arrive at the correct answer by fluke while completely botching the thought process, which is simply generated along with the final answer without any logical constraints between them. We propose a faithful CoT paradigm that breaks reasoning down into two steps: translating the problem into a structured representation, and then faithfully solving that representation by deterministic execution. Our method not only greatly improves model interpretability, but also outperforms state-of-the-art baselines on a suite of reasoning tasks.
[20] Faithful Chain of Thought Reasoning; Qing Lyu*, Shreya Havaldar*, Adam Stein*, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki and Chris Callison-Burch (*equal contribution); in preprint. Paper BibTeX Repo
@misc{https://doi.org/10.48550/arxiv.2301.13379, doi = {10.48550/ARXIV.2301.13379}, url = {https://arxiv.org/abs/2301.13379}, author = {Lyu, Qing and Havaldar, Shreya and Stein, Adam and Zhang, Li and Rao, Delip and Wong, Eric and Apidianaki, Marianna and Callison-Burch, Chris}, keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences}, title = {Faithful Chain-of-Thought Reasoning}, publisher = {arXiv}, year = {2023}, copyright = {Creative Commons Attribution Non Commercial No Derivatives 4.0 International} }
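The paradigm is easiest to see on a toy math word problem. In the sketch below, stage 1 (performed by a language model in the actual system, hand-written here) translates the problem into a program, and stage 2 derives the answer by actually executing it, so the reasoning chain cannot disagree with the final answer:

# Faithful CoT in miniature: translate, then deterministically execute.
question = "Alice has 3 apples and buys 2 more bags of 4 apples each. How many apples?"

# Stage 1: translation into a symbolic representation (the LM's job).
program = """
initial = 3
bags = 2
per_bag = 4
answer = initial + bags * per_bag
"""

# Stage 2: deterministic execution by an external solver (here, Python itself).
namespace = {}
exec(program, namespace)
print(namespace["answer"])  # 11, with every reasoning step auditable in the program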
Semantic role labeling answers the question of "who did what to whom, when and how", extracting important information about a predicate. While previous work has treated semantic role labels as symbolic, we explicitly use their natural-language definitions and achieve state-of-the-art performance given gold predicate senses, with especially pronounced gains in low-resource settings.
[11] Label Definitions Improve Semantic Role Labeling; Li Zhang, Ishan Jindal and Yunyao Li; in NAACL 2022. Paper BibTeX Repo
@inproceedings{zhang-etal-2022-label-definitions, title = "Label Definitions Improve Semantic Role Labeling", author = "Zhang, Li and Jindal, Ishan and Li, Yunyao", booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jul, year = "2022", address = "Seattle, United States", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.naacl-main.411", pages = "5613--5620", abstract = "Argument classification is at the core of Semantic Role Labeling. Given a sentence and the predicate, a semantic role label is assigned to each argument of the predicate. While semantic roles come with meaningful definitions, existing work has treated them as symbolic. Learning symbolic labels usually requires ample training data, which is frequently unavailable due to the cost of annotation. We instead propose to retrieve and leverage the definitions of these labels from the annotation guidelines. For example, the verb predicate {``}work{''} has arguments defined as {``}worker{''}, {``}job{''}, {``}employer{''}, etc. Our model achieves state-of-the-art performance on the CoNLL09 dataset injected with label definitions given the predicate senses. The performance improvement is even more pronounced in low-resource settings when training data is scarce.", }
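A minimal sketch of the idea: the definitions of the candidate role labels are appended to the classifier's input, so the model can match an argument against "worker", "job", etc. rather than learn opaque symbols from scratch. The input format and the definition dictionary below are illustrative, not the exact ones from the paper:

# Build a definition-augmented input for argument classification.
def build_input(sentence, predicate, argument, definitions):
    defs = " ".join(f"{label}: {meaning}." for label, meaning in definitions.items())
    return f"{sentence} </s> predicate: {predicate} argument: {argument} </s> {defs}"

# Illustrative definitions for a "work" predicate sense.
definitions = {"A0": "worker", "A1": "job", "A2": "employer"}
print(build_input("She works for a tech company.", "works", "She", definitions))
# The resulting string is fed to a standard encoder that scores each label.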
Do large language models know that a "favorite new movie" is not necessarily a "new favorite movie"?
[12] Is "my favorite new movie" my favorite movie? Probing the Understanding of Recursive Noun Phrases; Qing Lyu, Hua Zheng, Daoxin Li, Li Zhang, Marianna Apidianaki and Chris Callison-Burch; in NAACL 2022.Paper BibTeX Repo
@inproceedings{lyu-etal-2022-favorite, title = "Is {``}My Favorite New Movie{''} My Favorite Movie? Probing the Understanding of Recursive Noun Phrases", author = "Lyu, Qing and Zheng, Hua and Li, Daoxin and Zhang, Li and Apidianaki, Marianna and Callison-Burch, Chris", booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies", month = jul, year = "2022", address = "Seattle, United States", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.naacl-main.388", pages = "5286--5302", abstract = "Recursive noun phrases (NPs) have interesting semantic properties. For example, {``}my favorite new movie{''} is not necessarily my favorite movie, whereas {``}my new favorite movie{''} is. This is common sense to humans, yet it is unknown whether language models have such knowledge. We introduce the Recursive Noun Phrase Challenge (RNPC), a dataset of three textual inference tasks involving textual entailment and event plausibility comparison, precisely targeting the understanding of recursive NPs. When evaluated on RNPC, state-of-the-art Transformer models only perform around chance. Still, we show that such knowledge is learnable with appropriate data. We further probe the models for relevant linguistic features that can be learned from our tasks, including modifier semantic category and modifier scope. Finally, models trained on RNPC achieve strong zero-shot performance on an extrinsic Harm Detection evaluation task, showing the usefulness of the understanding of recursive NPs in downstream applications.", }
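One can probe the motivating example with any off-the-shelf NLI model; a quick sketch using the public roberta-large-mnli checkpoint (a single pair proves nothing, of course; RNPC contains three full tasks):

# Query an NLI model on the two modifier orders via the transformers pipeline.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

pairs = [
    {"text": "This is my favorite new movie.", "text_pair": "This is my favorite movie."},
    {"text": "This is my new favorite movie.", "text_pair": "This is my favorite movie."},
]
# Humans label the first pair neutral and the second entailment.
for pair, result in zip(pairs, nli(pairs)):
    print(pair["text"], "->", result["label"])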
Split and Rephrase is a text simplification task that rewrites a complex sentence into several simpler ones. We show that the existing benchmark is too simplistic by developing a rule-based model that uses no training data yet performs on par with the state-of-the-art neural model. We then propose two new crowdsourced benchmarks with improved quality. We also provide a study of the flaws of the BLEU score and of the cost-efficiency of using crowd workers to evaluate models.
[5] Small but Mighty: New Benchmarks for Split and Rephrase; Li Zhang, Huaiyu Zhu, Siddhartha Brahma and Yunyao Li; in EMNLP 2020; a part of the GEM Benchmark. Paper BibTeX Repo
@inproceedings{zhang-etal-2020-small, title = "Small but Mighty: New Benchmarks for Split and Rephrase", author = "Zhang, Li and Zhu, Huaiyu and Brahma, Siddhartha and Li, Yunyao", booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.emnlp-main.91", pages = "1198--1205", }
[16] GEMv2: Multilingual NLG Benchmarking in a Single Line of Code; ... Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li, ...; in EMNLP 2022. Paper
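To show how far simple rules go on the simplistic benchmark, here is a toy splitter in the spirit of our rule-based model; our actual rules are far more careful (e.g., they reconstruct the subject of the second clause, which this sketch does not):

# Split a complex sentence at a few clause boundaries with one regex.
import re

def naive_split(sentence):
    parts = re.split(r",?\s+(?:and|but|which|who)\s+", sentence.rstrip("."))
    return [p[0].upper() + p[1:] + "." for p in parts if p]

print(naive_split("The film was directed by John Smith, who also wrote the script."))
# ['The film was directed by John Smith.', 'Also wrote the script.']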
Recent advances in neural sentence embeddings show highly competitive performance on semantic similarity tasks. However, the embeddings do not usually work off-the-shelf, as we show that the transfer learning methodology is crucial to performance. We propose a fine-tuning approach and a multi-label approach that outperform most alternative transfer learning approaches on semantic similarity tasks, achieving state-of-the-art performance on multiple datasets.
[4] Multi-Label Transfer Learning for Multi-Relational Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; in *SEM 2019. Paper BibTeX Slides
@inproceedings{zhang-etal-2019-multi, title = "Multi-Label Transfer Learning for Multi-Relational Semantic Similarity", author = "Zhang, Li and Wilson, Steven and Mihalcea, Rada", booktitle = "Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*{SEM} 2019)", month = jun, year = "2019", address = "Minneapolis, Minnesota", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/S19-1005", pages = "44--50", abstract = "Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.", }
[3] Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; in arXiv pre-print; presented at IC2S2 2018. Paper BibTeX Poster
@misc{zhang2018direct, title={Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity}, author={Li Zhang and Steven R. Wilson and Rada Mihalcea}, year={2018}, eprint={1804.07835}, archivePrefix={arXiv}, primaryClass={cs.CL} }
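The multi-label idea in [4] can be sketched in a few lines of PyTorch: one shared sentence-pair encoding feeds a regression head per relation (similarity, relatedness, etc.), and the losses are aggregated so all relations are learned jointly. Dimensions and the random stand-in for the LSTM pair encoder are illustrative:

# One shared pair representation, one regression head per semantic relation.
import torch
import torch.nn as nn

class MultiRelationRegressor(nn.Module):
    def __init__(self, encoder_dim=256, n_relations=4):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(encoder_dim, 1) for _ in range(n_relations)])

    def forward(self, pair_repr):
        return torch.cat([head(pair_repr) for head in self.heads], dim=-1)

model = MultiRelationRegressor()
pair_repr = torch.randn(8, 256)  # stand-in for an LSTM encoding of 8 sentence pairs
targets = torch.rand(8, 4)       # gold scores for the 4 relations
loss = nn.MSELoss()(model(pair_repr), targets)  # loss aggregated across relations
loss.backward()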
This work is a part of the UM-IBM Sapphire project. The goal is to build a dialog system able to answer questions about university course information. While tackling the task of translating natural language to SQL, we identified flaws in the current text-to-SQL evaluation scheme and proposed alternatives. I contributed to building a text-to-SQL dataset and to implementing named entity recognition as a preprocessing step.
[1] Improving Text-to-SQL Evaluation Methodology; Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang and Dragomir Radev; in ACL 2018. Paper BibTeX Repo Poster
@InProceedings{acl18sql, author = {Catherine Finegan-Dollak\* and Jonathan K. Kummerfeld\* and Li Zhang and Karthik Ramanathan and Sesh Sadasivam and Rui Zhang and Dragomir Radev}, title = {Improving Text-to-SQL Evaluation Methodology}, booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, shortvenue = {ACL}, month = {July}, year = {2018}, address = {Melbourne, Victoria, Australia}, pages = {351--360}, abstract = {To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.}, url = {http://aclweb.org/anthology/P18-1033}, software = {https://github.com/jkkummerfeld/text2sql-data}, data = {https://github.com/jkkummerfeld/text2sql-data}, }
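The preprocessing step I worked on can be sketched as follows: entity mentions found by NER are replaced with typed placeholders, so the parser learns query structure rather than memorizing values. The entity dictionary below is hand-written for illustration; the real pipeline produces it with an NER model:

# Anonymize entity mentions in a question before text-to-SQL parsing.
def anonymize(question, entities):
    mapping = {}
    for mention, etype in entities.items():
        placeholder = f"{etype}0"
        question = question.replace(mention, placeholder)
        mapping[placeholder] = mention
    return question, mapping

q, m = anonymize(
    "Which courses does Professor Radev teach in Fall 2017?",
    {"Radev": "instructor", "Fall 2017": "semester"},
)
print(q)  # Which courses does Professor instructor0 teach in semester0?
print(m)  # {'instructor0': 'Radev', 'semester0': 'Fall 2017'}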
This work is a part of the DARPA AIDA project. From the text, audio and video recounting the Russia-Ukraine conflict in 2014, the goal is to extract knowledge elements and generate hypotheses about real-life events. I used named entity recognition, keyword extraction and word embeddings to extract textual entities from the data and assign them categories from the given ontology.
[2] Entity and Event Extraction from Scratch Using Minimal Training Data; Laura Burdick, Steven R. Wilson, Oana Ignat, Charles F. Welch, Li Zhang, Mingzhe Wang, Jia Deng and Rada Mihalcea; in TAC 2018. Paper BibTeX Poster
@article{Burdick2018EntityAE, title={Entity and Event Extraction from Scratch Using Minimal Training Data}, author={Laura Burdick and Steven R. Wilson and Oana Ignat and Charles F Welch and Li Zhang and Mingzhe Wang and Jia Deng and Rada Mihalcea}, journal={Text Analysis Conference (TAC)}, year={2018} }
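A minimal sketch of the category-assignment step: embed the extracted mention and the ontology labels, then pick the nearest label by cosine similarity. The sentence-transformers library, the model name, and the category list are stand-ins of mine; the project used its own word embeddings and the AIDA ontology:

# Assign an entity mention to the most similar ontology category.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
categories = ["person", "organization", "geopolitical entity", "weapon", "vehicle"]

mention = "the Ukrainian parliament"
scores = util.cos_sim(model.encode(mention, convert_to_tensor=True),
                      model.encode(categories, convert_to_tensor=True))[0]
print(categories[int(scores.argmax())])  # expected: an organization/GPE-like category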
Each issue of the New Yorker magazine features a cartoon caption contest to which thousands of readers submit funny captions. The goal is to automatically divide the submissions into clusters based on their theme of humor (what they are joking about) using unsupervised learning. Work had been done years earlier, but the code was scattered and under-documented. As a freshman, I was in charge of this project, bringing the existing system up to date and optimizing it.
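The pipeline boils down to embedding the captions and clustering them. A minimal sketch with TF-IDF features and k-means; the captions and the number of clusters are made up for illustration:

# Group caption submissions by humor theme: TF-IDF vectors + k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

captions = [
    "My lawyer advised me not to comment.",
    "I object, your honor!",
    "Have you tried turning the cat off and on again?",
    "The Wi-Fi password is 'meow'.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(captions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # captions sharing a label are (ideally) joking about the same thing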
AAN encompasses our corpus of resources on NLP and related fields, along with the research projects that build upon this corpus. We have collected around 6,500 surveys, tutorials and other resources and created a search engine that allows users to easily browse them. I helped build and maintain this anthology, which covers numerous papers from top NLP venues and features paper citation, author citation, and author collaboration networks.
Being both a drummer and an AI researcher, I also study automatic drum generation using NLP techniques as a side project.
[18] Language Models are Drummers: Drum Composition with Natural Language Pre-Training; Li Zhang and Chris Callison-Burch; in AAAI 2023 Workshop on Creative AI Across Modalities. Paper BibTeX
@InProceedings{gpt3drum, author = {Li Zhang and Chris Callison-Burch}, title = {Language Models are Drummers: Drum Composition with Natural Language Pre-Training}, venue = {AAAI 2023 1st workshop on Creative AI across Modalities}, month = {February}, year = {2023}, address = {Washington, D.C., USA}, abstract = {Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.}, url = {https://arxiv.org/abs/2301.01162}, software = {https://github.com/zharry29/drums-with-llm}, }
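The crux of the setup is serializing drum performances into text so that a text-pretrained model can be fine-tuned on them. Here is a sketch of one such encoding; the token scheme is an illustrative invention of mine, and the paper's actual format is in the linked repository:

# Serialize one bar of a basic rock groove into text tokens for LM fine-tuning.
GROOVE = [  # (sixteenth-note position, drum) pairs
    (0, "kick"), (0, "hihat"), (2, "hihat"), (4, "snare"), (4, "hihat"),
    (6, "hihat"), (8, "kick"), (8, "hihat"), (10, "hihat"), (12, "snare"),
    (12, "hihat"), (14, "hihat"),
]

def to_text(groove):
    return " ".join(f"t{pos}:{drum}" for pos, drum in groove)

print(to_text(GROOVE))
# Lines like "t0:kick t0:hihat t2:hihat t4:snare ..." become the fine-tuning
# corpus; generation then runs in the text domain and is decoded back to MIDI.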