Li "Harry" Zhang

Li "Harry" Zhang 张力

About Me

I am an assistant professor at Drexel University focusing on Natural Language Processing and Artificial Intelligence. I'm interested in planning and reasoning using Large Language Models. I earned my PhD (thesis) from the University of Pennsylvania, having the honor to be mentored by Prof. Chris Callison-Burch, with a thesis committee chaired by Prof. Dan Roth. I earned my BS from the University of Michigan in 2018, mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev.

I am hiring either paid and unpaid, either remote or in-person research assistants or interns with the goal of conversion to a PhD student. Those interested should fill out this form. I cannot respond to emails on this matter.

CV

Harry.Zhang@drexel.edu

Affiliations

Drexel University

Assistant Professor;
Dec 2024 to Present

University of Pennsylvania

Ph.D.; Aug 2019 to Aug 2024

Allen Institute for Artifical Intelligence

Research Intern;
April 2023 to Dec 2023

IBM Research

Research Intern; May 2021 to Aug 2021,
April 2019 to June 2019

University of Michigan

B.S.E.; Aug 2015 to Dec 2018

Goldman Sachs

Summer Analyst;
May 2017 to Aug 2017

Shenzhen Middle School

High School Diploma;
Sept 2012 to Jun 2015

Mentorship and Teaching

HazLab at Drexel

Cassie Huang (PhD student)

Krystal Gong (intern)

Prabhu Prakash Kagitha (MS student)

Rikhil Amonkar (UG student)

Past Mentees at Penn

Tianyi Zhang

Hainiu Xu, King's College London

Zhaoyi Hou, University of Pittsburg

Young-Min Cho, University of Pennsylvania

Teaching

Instructor: CS T780-001: Applied NLP (Spring 2025 at Drexel)

TA: CIS 530: Computational Linguistics (Winter, Fall 2020 at Penn)

TA: EECS 595: Natural Language Processing (Fall 2018 at Michigan)

TA: EECS 280: Programming and Introductory Data Structures (Winter, Fall 2016 at Michigan)

Service

I have reviewed more than 50 papers of and chaired for many NLP conferences and workshops.

Area Chair of ARR Feb 2025 / ACL 2025, ARR Dec 2024, COLING 2025, ARR Aug 2024, ARR Jun 2024 / EMNLP 2024, ARR Feb 2024 / ACL 2024

Session Chair of ACL 2024, AACL-IJCNLP 2020

Program Chair of MASC-SLL 2023, MASC-SLL 2021

Reviewer of LREC-COLING 2024, EMNLP 2023, ACL 2023, ARR Mar 2022, DaSH Workshop @ EMNLP 2022, COLING 2022, LREC 2022, ARR Nov 2021, COLING 2020, Computer Speech and Language 2018

Research

Primary Work: Executable and Trustworthy Planning with LLMs

While large language models (LLM) can provide decent instructions, they are far from able to come up an executable and trustworthy plan for a particular user or agent, grounding to their specific situation and needs. To address this, I advocate for the neurosymbolic methodology of using LLM as a code generator to create a formal representation of the planning environment. In conjunction with tools in classical AI planning, a plan can be found deterministically and faithfully.

My primary efforts lie in using LLMs to generate formal language, such as PDDL that describes the planning environment.

[30] On the Limit of Language Models as Planning Formalizers; Cassie Huang^Mentored student and Li Zhang; in arxiv.Paper BibTeX Repo

@misc{huang2024limitlanguagemodelsplanning,
  title={On the Limit of Language Models as Planning Formalizers}, 
  author={Cassie Huang and Li Zhang},
  year={2024},
  eprint={2412.09879},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.09879},

[29] PDDLEGO: Iterative Planning in Textual Environments; Li Zhang, Peter Jansen, Peter Clark, Chris Callison-Burch and Niket Tandon; in *SEM 2024.Paper BibTeX Repo

@inproceedings{zhang-etal-2024-pddlego,
  title = "{PDDLEGO}: Iterative Planning in Textual Environments",
  author = "Zhang, Li  and
    Jansen, Peter  and
    Zhang, Tianyi  and
    Clark, Peter  and
    Callison-Burch, Chris  and
    Tandon, Niket",
  editor = "Bollegala, Danushka  and
    Shwartz, Vered",
  booktitle = "Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)",
  month = jun,
  year = "2024",
  address = "Mexico City, Mexico",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.starsem-1.17",
  pages = "212--221",
  abstract = "Planning in textual environments have been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially-observed environments where there is initially no sufficient information to plan for the end-goal. We propose PDDLEGO that iteratively construct a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43{\%} more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98{\%}) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4{\%}).",
}

[28] PROC2PDDL: Open-Domain Planning Representations from Texts; Tianyi Zhang*Equal contribution^Mentored student, Li Zhang*Equal contribution, Zhaoyi Hou^Mentored student, Ziyu Wang^Mentored student, Yuling Gu, Peter Clark, Chris Callison-Burch and Niket Tandon; in ACL 2024 2st Workshop on Natural Language Reasoning and Structured Explanations.Paper BibTeX Repo

@inproceedings{zhang-etal-2024-proc2pddl,
  title = "PROC2PDDL: Open-Domain Planning Representations from Texts",
  author = "Zhang, Tianyi and Zhang, Li  and Hou, Zhaoyi and Wang, Ziyu and Gu, Yuling and Clark, Peter and Callison-Burch, Chris and Tandon, Niket",
  booktitle = "Proceedings of the 2st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
  month = aug,
  year = "2024",
  address = "Bangkok, Thailand",
  publisher = "Association for Computational Linguistics",
}

[20] Faithful Chain of Thought Reasoning; Qing Lyu*Equal contribution, Shreya Havaldar*Equal contribution, Adam Stein*Equal contribution, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki and Chris Callison-Burch; in IJCNLP-AACL 2023. Won Area Chair Award.Paper BibTeX Repo

@inproceedings{lyu-etal-2023-faithful,
  title = "Faithful Chain-of-Thought Reasoning",
  author = "Lyu, Qing  and
  Havaldar, Shreya  and
  Stein, Adam  and
  Zhang, Li  and
  Rao, Delip  and
  Wong, Eric  and
  Apidianaki, Marianna  and
  Callison-Burch, Chris",
  editor = "Park, Jong C.  and
  Arase, Yuki  and
  Hu, Baotian  and
  Lu, Wei  and
  Wijaya, Derry  and
  Purwarianti, Ayu  and
  Krisnadhi, Adila Alfa",
  booktitle = "Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = nov,
  year = "2023",
  address = "Nusa Dua, Bali",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.ijcnlp-main.20",
  pages = "305--329",
}

[31] DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation; Wenhao Hu, Jinhao Duan, Chunchen Wei, Li Zhang, Yue Zhang, Kaidi Xu; in arxiv.Paper BibTeX

@misc{hu2025dynacodedynamiccomplexityawarecode,
  title={DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation}, 
  author={Wenhao Hu and Jinhao Duan and Chunchen Wei and Li Zhang and Yue Zhang and Kaidi Xu},
  year={2025},
  eprint={2503.10452},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.10452}, 
}

[27] Calibrating Large Language Models with Sample Consistency; Qing Lyu*Equal contribution, Kumar Shridhar*Equal contribution, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan and Chris Callison-Burch; in AAAI 2025.Paper BibTeX

  @misc{lyu2024calibrating,
    title={Calibrating Large Language Models with Sample Consistency}, 
    author={Qing Lyu and Kumar Shridhar and Chaitanya Malaviya and Li Zhang and Yanai Elazar and Niket Tandon and Marianna Apidianaki and Mrinmaya Sachan and Chris Callison-Burch},
    year={2024},
    eprint={2402.13904},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
  }

[26] Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization; Yash Kumar Lal, Li Zhang, Faeze Brahman, Bodhisattwa Prasad Majumder, Peter Clark, Niket Tandon; in Findings of ACL 2024.Paper BibTeX

@inproceedings{lal-etal-2024-tailoring,
  title = "Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization",
  author = "Lal, Yash Kumar  and
    Zhang, Li  and
    Brahman, Faeze  and
    Majumder, Bodhisattwa Prasad  and
    Clark, Peter  and
    Tandon, Niket",
  editor = "Ku, Lun-Wei  and
    Martins, Andre  and
    Srikumar, Vivek",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
  month = aug,
  year = "2024",
  address = "Bangkok, Thailand",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.findings-acl.921/",
  doi = "10.18653/v1/2024.findings-acl.921",
  pages = "15597--15611",
  abstract = "How-to procedures, such as how to plant a garden, are now used by millions of users, but sometimes need customizing to meet a user`s specific needs, e.g., planting a garden without pesticides. Our goal is to measure and improve an LLM`s ability to perform such customization. Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need. We find that a simple architecture with two LLM agents used sequentially performs best, one that edits a generic how-to procedure and one that verifies its executability, significantly outperforming (10.5{\%} absolute) an end-to-end prompted LLM. This suggests that LLMs can be configured reasonably effectively for procedure customization. This also suggests that multi-agent editing architectures may be worth exploring further for other customization applications (e.g. coding, creative writing) in the future."
}

[25] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization; Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark; in COLM 2024.Paper BibTeX Repo

@misc{majumder2023clin,
  title={CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization}, 
  author={Bodhisattwa Prasad Majumder and Bhavana Dalvi Mishra and Peter Jansen and Oyvind Tafjord and Niket Tandon and Li Zhang and Chris Callison-Burch and Peter Clark},
  year={2023},
  eprint={2310.10134},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

[24] Choice-75: A Dataset on Decision Branching in Script Learning; Zhaoyi Joey Hou^Mentored student, Li Zhang, Chris Callison-Burch; in LREC-COLING 2024.Paper BibTeX Repo

@inproceedings{hou-etal-2024-choice-75,
  title = "Choice-75: A Dataset on Decision Branching in Script Learning",
  author = "Hou, Zhaoyi  and
    Zhang, Li  and
    Callison-Burch, Chris",
  editor = "Calzolari, Nicoletta  and
    Kan, Min-Yen  and
    Hoste, Veronique  and
    Lenci, Alessandro  and
    Sakti, Sakriani  and
    Xue, Nianwen",
  booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
  month = may,
  year = "2024",
  address = "Torino, Italia",
  publisher = "ELRA and ICCL",
  url = "https://aclanthology.org/2024.lrec-main.285",
  pages = "3215--3223",
  abstract = "Script learning studies how daily events unfold. It enables machines to reason about narratives with implicit information. Previous works mainly consider a script as a linear sequence of events while ignoring the potential branches that arise due to people{'}s circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given descriptive scenarios, containing 75 scripts and more than 600 scenarios. We also present preliminary results with current large language models (LLM). Although they demonstrate overall decent performances, there is still notable headroom in hard scenarios.",
}

[23] OpenPI2.0: An Improved Dataset for Entity Tracking in Texts; Li Zhang, Hainiu Xu^Mentored student, Abhinav Kommula, Chris Callison-Burch and Niket Tandon; in EACL 2024.Paper BibTeX Repo

@inproceedings{zhang-etal-2024-openpi2,
  title = "{O}pen{PI}2.0: An Improved Dataset for Entity Tracking in Texts",
  author = "Zhang, Li  and
    Xu, Hainiu  and
    Kommula, Abhinav  and
    Callison-Burch, Chris  and
    Tandon, Niket",
  editor = "Graham, Yvette  and
    Purver, Matthew",
  booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = mar,
  year = "2024",
  address = "St. Julian{'}s, Malta",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.eacl-long.10",
  pages = "166--178",
  abstract = "Much texts describe a changing world (e.g., procedures, stories, newswires), and understanding them requires tracking how entities change. An earlier dataset, OpenPI, provided crowdsourced annotations of entity state changes in text. However, a major limitation was that those annotations were free-form and did not identify salient changes, hampering model evaluation. To overcome these limitations, we present an improved dataset, OpenPI2.0, where entities and attributes are fully canonicalized and additional entity salience annotations are added. On our fairer evaluation setting, we find that current state-of-the-art language models are far from competent. We also show that using state changes of salient entities as a chain-of-thought prompt, downstream performance is improved on tasks such as question answering and classical planning, outperforming the setting involving all related entities indiscriminately. We offer OpenPI2.0 for the continued development of models that can understand the dynamics of entities in text.",
}

[22] Exploring the Curious Case of Code Prompts; Li Zhang*Equal contribution, Liam Dugan*Equal contribution, Hainiu Xu^Mentored student*Equal contribution and Chris Callison-Burch; in ACL 2023 1st Workshop on Natural Language Reasoning and Structured Explanations.Paper BibTeX Repo

@inproceedings{zhang-etal-2023-exploring,
title = "Exploring the Curious Case of Code Prompts",
author = "Zhang, Li  and
Dugan, Liam  and
Xu, Hainiu  and
Callison-burch, Chris",
booktitle = "Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
month = jun,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.nlrse-1.2",
pages = "9--17",
abstract = "Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural language tasks. In our work, we seek to answer whether or not code-prompting is the preferred way of interacting with language models in general. We compare code and text prompts across three popular GPT models (davinci, code-davinci-002, and text-davinci-002) on a broader selection of tasks (e.g., QA, sentiment, summarization) and find that with few exceptions, code prompts do not consistently outperform text prompts. Furthermore, we show that the style of code prompt has a large effect on performance for some (but not all) tasks and that fine-tuning on text instructions leads to better relative performance of code prompts.",
}

[21] Human-in-the-Loop Schema Induction; Tianyi Zhang^Mentored student, Isaac Tham, Zhaoyi Hou^Mentored student, Jiaxuan Ren, Liyang Zhou^Mentored student, Hainiu Xu^Mentored student, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch; in ACL 2023 Demos.Paper BibTeX Demo

@inproceedings{zhang-etal-2023-human,
  title = "Human-in-the-loop Schema Induction",
  author = "Zhang, Tianyi  and
    Tham, Isaac  and
    Hou, Zhaoyi  and
    Ren, Jiaxuan  and
    Zhou, Leon  and
    Xu, Hainiu  and
    Zhang, Li  and
    Martin, Lara  and
    Dror, Rotem  and
    Li, Sha  and
    Ji, Heng  and
    Palmer, Martha  and
    Brown, Susan Windisch  and
    Suchocki, Reece  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.acl-demo.1",
  pages = "1--10",
  abstract = "Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface.",
}

[19] Causal Reasoning of Entities and Events in Procedural Texts; Li Zhang*Equal contribution, Hainiu Xu*Equal contribution^Mentored student, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora and Chris Callison-Burch; in Findings of EACL 2023.Paper BibTeX Repo

@inproceedings{zhang-etal-2023-causal,
  title = "Causal Reasoning of Entities and Events in Procedural Texts",
  author = "Zhang, Li  and
    Xu, Hainiu  and
    Yang, Yue  and
    Zhou, Shuyan  and
    You, Weiqiu  and
    Arora, Manni  and
    Callison-burch, Chris",
  booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-eacl.31",
  pages = "415--431",
  abstract = "Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind human at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.",
}

[17] Unsupervised Entity Linking with Guided Summarization and Multiple Choice Selection; Young Min Cho^Mentored student, Li Zhang and Chris Callison-Burch; in EMNLP 2022.Paper BibTeX Repo

@inproceedings{cho-etal-2022-unsupervised,
title = "Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection",
author = "Cho, Young Min  and
Zhang, Li  and
Callison-Burch, Chris",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.638",
pages = "9394--9401",
abstract = "Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component for language understanding. We address two challenge in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioning on the mention, and then casts the task to a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on exiting datasets.",
}

[16] GEMv2: Multilingual NLG Benchmarking in a Single Line of Code; ... Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li, ...; in EMNLP 2022.Paper

[15] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models; ... Li Zhang*Equal contribution, Qing Lyu*Equal contribution and Chris Callison-Burch; in TMLR.Paper

[13] QuakerBot: A Household Dialog System Powered by Large Language Models; Artemis Panagopoulou, Manni Arora^Mentored student, Li Zhang, Dimitri Cugini, Weiqiu You, Yue Yang, Liyang Zhou^Mentored student, Yuxuan Wang, Zhaoyi Hou^Mentored student, Alyssa Hwang, Lara Martin, Sherry Shi, Chris Callison-Burch and Mark Yatskar; in Alexa Prize Proceedings 2022.Paper BibTeX

@Inproceedings{Pennsylvania2022,
author = {Panagopoulou, Artemis and Arora, Manni and Zhang, Li and Cugini, Dimitri and You, Weiqiu and Yang, Yue and Zhou, Liyang and Wang, Yuxuan and Hou, Zhaoyi and Hwang, Alyssa and Martin, Lara and Shi, Sherry and Callison-Burch, Chris and Yatskar, Mark},
title = {QuakerBot: A household dialog system powered by large language models},
year = {2022},
url = {https://www.amazon.science/alexa-prize/proceedings/quakerbot-a-household-dialog-system-powered-by-large-language-models},
booktitle = {Alexa Prize TaskBot Challenge Proceedings},
}

[12] Is "my favorite new movie" my favorite movie? Probing the Understanding of Recursive Noun Phrases; Qing Lyu, Hua Zheng, Daoxin Li, Li Zhang, Marianna Apidianaki and Chris Callison-Burch; in NAACL 2022.Paper BibTeX Repo

@inproceedings{lyu-etal-2022-favorite,
    title = "Is {``}My Favorite New Movie{''} My Favorite Movie? Probing the Understanding of Recursive Noun Phrases",
    author = "Lyu, Qing  and
      Hua, Zheng  and
      Li, Daoxin  and
      Zhang, Li  and
      Apidianaki, Marianna  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.388",
    pages = "5286--5302",
    abstract = "Recursive noun phrases (NPs) have interesting semantic properties. For example, {``}my favorite new movie{''} is not necessarily my favorite movie, whereas {``}my new favorite movie{''} is. This is common sense to humans, yet it is unknown whether language models have such knowledge. We introduce the Recursive Noun Phrase Challenge (RNPC), a dataset of three textual inference tasks involving textual entailment and event plausibility comparison, precisely targeting the understanding of recursive NPs. When evaluated on RNPC, state-of-the-art Transformer models only perform around chance. Still, we show that such knowledge is learnable with appropriate data. We further probe the models for relevant linguistic features that can be learned from our tasks, including modifier semantic category and modifier scope. Finally, models trained on RNPC achieve strong zero-shot performance on an extrinsic Harm Detection evaluation task, showing the usefulness of the understanding of recursive NPs in downstream applications.",
}

[11] Label Definitions Improve Semantic Role Labeling; Li Zhang, Ishan Jindal, Yunyao Li; in NAACL 2022.Paper BibTeX Repo

@inproceedings{zhang-etal-2022-label-definitions,
    title = "Label Definitions Improve Semantic Role Labeling",
    author = "Zhang, Li  and
      Jindal, Ishan  and
      Li, Yunyao",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.411",
    pages = "5613--5620",
    abstract = "Argument classification is at the core of Semantic Role Labeling. Given a sentence and the predicate, a semantic role label is assigned to each argument of the predicate. While semantic roles come with meaningful definitions, existing work has treated them as symbolic. Learning symbolic labels usually requires ample training data, which is frequently unavailable due to the cost of annotation. We instead propose to retrieve and leverage the definitions of these labels from the annotation guidelines. For example, the verb predicate {``}work{''} has arguments defined as {``}worker{''}, {``}job{''}, {``}employer{''}, etc. Our model achieves state-of-the-art performance on the CoNLL09 dataset injected with label definitions given the predicate senses. The performance improvement is even more pronounced in low-resource settings when training data is scarce.",
}

[10] Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data; Shuyan Zhou*Equal contribution, Li Zhang*Equal contribution, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch and Graham Neubig; in ACL 2022.Paper BibTeX Demo Repo

@inproceedings{zhou-etal-2022-show,
title = "Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data",
author = "Zhou, Shuyan  and
  Zhang, Li  and
  Yang, Yue  and
  Lyu, Qing  and
  Yin, Pengcheng  and
  Callison-Burch, Chris  and
  Neubig, Graham",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.214",
pages = "2998--3012",
abstract = "Procedures are inherently hierarchical. To {``}make videos{''}, one may need to {``}purchase a camera{''}, which in turn may require one to {``}set a budget{''}. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., {``}purchase a camera{''}) in an article to other articles with similar goals (e.g., {``}how to choose a camera{''}), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.",
}

[9] Visual Goal-Step Inference using wikiHow; Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar and Chris Callison-Burch; In EMNLP 2021.Paper BibTeX

@inproceedings{yang-etal-2021-visual,
title = "Visual Goal-Step Inference using wiki{H}ow",
author = "Yang, Yue  and
  Panagopoulou, Artemis  and
  Lyu, Qing  and
  Zhang, Li  and
  Yatskar, Mark  and
  Callison-Burch, Chris",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.165",
pages = "2167--2179",
abstract = "Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15 - 20{\%}. Our task will facilitate multimodal reasoning about procedural events.",
}

[8] Goal-Oriented Script Construction; Qing Lyu*Equal contribution, Li Zhang*Equal contribution and Chris Callison-Burch; in INLG 2021.Paper BibTeX Repo

@inproceedings{lyu-etal-2021-goal,
title = "Goal-Oriented Script Construction",
author = "Lyu, Qing  and
Zhang, Li  and
Callison-Burch, Chris",
booktitle = "Proceedings of the 14th International Conference on Natural Language Generation",
month = aug,
year = "2021",
address = "Aberdeen, Scotland, UK",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.inlg-1.19",
pages = "184--200",
abstract = "The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach by first retrieving the relevant steps from a large candidate pool and then ordering them. We show that our task is practical, feasible but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.",
}

[7] Intent Detection with WikiHow; Li Zhang, Qing Lyu, Chris Callison-Burch; in AACL-IJCNLP 2020.Paper BibTeX Repo

@inproceedings{zhang-etal-2020-intent,
  title = "Intent Detection with {W}iki{H}ow",
  author = "Zhang, Li  and
    Lyu, Qing  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
  month = dec,
  year = "2020",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.aacl-main.35",
  pages = "328--333",
  abstract = "Modern task-oriented dialog systems need to reliably understand users{'} intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75{\%} accuracy using only 100 training examples in all datasets.",
}

[6] Reasoning about Goals, Steps, and Temporal Ordering with WikiHow; Li Zhang*Equal contribution, Qing Lyu*Equal contribution and Chris Callison-Burch; in EMNLP 2020.Paper BibTeX Repo

@inproceedings{zhang-etal-2020-reasoning,
  title = "Reasoning about Goals, Steps, and Temporal Ordering with {W}iki{H}ow",
  author = "Zhang, Li  and
    Lyu, Qing  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.emnlp-main.374",
  pages = "4630--4639",
}

[5] Small but Mighty: New Benchmarks for Split and Rephrase; Li Zhang, Huaiyu Zhu, Siddhartha Brahma and Yunyao Li; in EMNLP 2020; a part of the GEM Benchmark.Paper BibTeX Repo

@inproceedings{zhang-etal-2020-small,
title = "Small but Mighty: New Benchmarks for Split and Rephrase",
author = "Zhang, Li  and
  Zhu, Huaiyu  and
  Brahma, Siddhartha  and
  Li, Yunyao",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.91",
pages = "1198--1205",
}

[4] Multi-Label Transfer Learning for Multi-Relational Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; In *SEM 2019. Paper BibTeX Slides

@inproceedings{zhang-etal-2019-multi,
title = "Multi-Label Transfer Learning for Multi-Relational Semantic Similarity",
author = "Zhang, Li  and
  Wilson, Steven  and
  Mihalcea, Rada",
booktitle = "Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*{SEM} 2019)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S19-1005",
pages = "44--50",
abstract = "Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.",
}

[3] Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; in arXiv pre-print; presented at IC2S2 2018.Paper BibTeX Poster

@misc{zhang2018direct,
title={Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity},
author={Li Zhang and Steven R. Wilson and Rada Mihalcea},
year={2018},
eprint={1804.07835},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

[2] Entity and Event Extraction from Scratch Using Minimal Training Data; Laura Burdick, Steven R. Wilson, Oana Ignat, Charles F. Welch, Li Zhang, Mingzhe Wang, Jia Deng and Rada Mihalcea; in TAC 2018.Paper BibTeX Poster

@article{Burdick2018EntityAE,
title={Entity and Event Extraction from Scratch Using Minimal Training Data},
author={Laura Burdick and Steven R. Wilson and Oana Ignat and Charles F Welch and Li Zhang and Mingzhe Wang and Jia Deng and Rada Mihalcea},
journal={Theory and Applications of Categories},
year={2018}
}

[1] Improving Text-to-SQL Evaluation Methodology; Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan Dhanalakshmi Ramanathan, Sesh Sadasivam, Rui Zhang and Dragomir Radev; in ACL 2018.Paper BibTeX Repo Poster

@InProceedings{acl18sql,
  author    = {Catherine Finegan-Dollak\*  and  Jonathan K. Kummerfeld\*  and  Li Zhang  and  Karthik Ramanathan  and  Sesh Sadasivam  and  Rui Zhang  and  Dragomir Radev},
  title     = {Improving Text-to-SQL Evaluation Methodology},
  booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  shortvenue = {ACL},
  month     = {July},
  year      = {2018},
  address   = {Melbourne, Victoria, Australia},
  pages     = {351--360},
  abstract  = {To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.},
  url       = {http://aclweb.org/anthology/P18-1033},
  software  = {https://github.com/jkkummerfeld/text2sql-data},
  data      = {https://github.com/jkkummerfeld/text2sql-data},
}

Activities

May 7, 2025: Talk at Brown University
Apr 3, 2025: Talk at Temple University
Mar 19, 2025: Talk at Michigan State University
Mar 18, 2025: Talk at the University of Michigan
Mar 4, 2025: Talk at AAAI 2025 workshop: Towards Knowledgeable Foundation Models
Feb 27, 2025: Attending AAAI 2025
Feb 21, 2025: Talk at Allen Institute for AI
Feb 17, 2025: Talk at the University of Pennsylvania
Dec 1, 2024: Join Drexel University as an assistant professor
Oct 1, 2024: Talk at Adobe World Headquarters
Sept 24, 2024: Talk at Stony Brook University
Aug 22, 2024: Talk at Stanford University
June 21, 2024: Presenting at *SEM 2024
June 18, 2024: Attending NAACL 2024
Apr 18, 2024: Talk at the University of Pittsburgh
March 17, 2024: Presenting at EACL 2024
March 13, 2024: Talk at Drexel University
March 8, 2024: Talk at Brandeis University
March 5, 2024: Talk at University of Maryland, Baltimore County
Feb 1, 2024: Guest lecture at the University of Michigan
Jan 31, 2024: Talk at King's College London
Jan 30, 2024: Talk at the University of Michigan

Music

I am a drummer, producer and video content creator. I run a video channel with over 50,000 subscribers on Bilibili and YouTube, primarily making cover songs from video game and anime soundtracks, in a variety of styles ranging from metal to jazz. I am proudly sponsored by Vater, Tama, Mackie, Alesis, NUX, Xvive and have collaborated with manufacturers of major video games such as Genshin Impact and Azur Lane.

My videos are primarily either collaboration-work music videos or solo-work drum covers.

My new album, Megidalon, a collection of reimagined Persona soundracks, is available for streaming on all platforms! My two previous albums Dazzling Tales (reimagined Genshin Impact soundtracks) and A Doll's Lament (reimagined NieR soundtracks) are also available for listening on all major streaming platforms.

I also engage in research of AI music generation, having published a paper on automatic drum composition in an AAAI 2023 workshop.

[18] Language Models are Drummers: Drum Composition with Natural Language Pre-Training; Li Zhang and Chris Callison-Burch; in AAAI 2023 Workshop on Creative AI Across Modalities.Paper Repo BibTeX

  @InProceedings{gpt3drum,
    author    = {Li Zhang  and  Chris Callison-Burch},
    title     = {Language Models are Drummers: Drum Composition with Natural Language Pre-Training},
    venue = {AAAI 2023 1st workshop on Creative AI across Modalities},
    month     = {February},
    year      = {2023},
    address   = {Washington, D.C., USA},
    abstract  = {Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.},
    url       = {https://arxiv.org/abs/2301.01162},
    software  = {https://github.com/zharry29/drums-with-llm},
  }