Li "Harry" Zhang

Li "Harry" Zhang 张力

About Me

I am an assistant professor at Drexel University focusing on Natural Language Processing and Artificial Intelligence. I am interested in planning and reasoning using Large Language Models. I earned my PhD (thesis) from the University of Pennsylvania, where I had the honor of being mentored by Prof. Chris Callison-Burch, with a thesis committee chaired by Prof. Dan Roth. I earned my BS from the University of Michigan in 2018, mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev.

I offer systematic research mentorship (remote and in-person, undergraduate and graduate, paid and unpaid) to selected interns and volunteers. Past participants implemented existing ideas and published within a few months. Demonstrated prior experience leading a complete research project (e.g., a publication in NLP or another field) is required. Motivation to apply for a PhD program at the end of 2026 is strongly preferred. Those interested should fill out this form. I cannot respond to emails on this matter. I am currently not hiring PhD students.

CV

Harry.Zhang@drexel.edu

Affiliations

Drexel University

Assistant Professor; Dec 2024 to Present

University of Pennsylvania

Ph.D.; Aug 2019 to Aug 2024

Allen Institute for Artificial Intelligence

Research Intern; April 2023 to Dec 2023

IBM Research

Research Intern; May 2021 to Aug 2021, April 2019 to June 2019

University of Michigan

B.S.E.; Aug 2015 to Dec 2018

Mentorship and Teaching

Haz Lab at Drexel

Teaching

Instructor: CS T780-001: Applied NLP (Spring 2025, Spring 2026 at Drexel)

Instructor: CS 614: Applied AI (Spring 2026 at Drexel)

TA: CIS 530: Computational Linguistics (Winter, Fall 2020 at Penn)

TA: EECS 595: Natural Language Processing (Fall 2018 at Michigan)

TA: EECS 280: Programming and Introductory Data Structures (Winter, Fall 2016 at Michigan)

Service

I have reviewed more than 50 papers for, and served as a chair at, many NLP conferences and workshops.

Area Chair of ARR Feb 2025 / ACL 2025, ARR Dec 2024, COLING 2025, ARR Aug 2024, ARR Jun 2024 / EMNLP 2024, ARR Feb 2024 / ACL 2024

Session Chair of ACL 2024, AACL-IJCNLP 2020

Program Chair of MASC-SLL 2023, MASC-SLL 2021

Reviewer of LREC-COLING 2024, EMNLP 2023, ACL 2023, ARR Mar 2022, DaSH Workshop @ EMNLP 2022, COLING 2022, LREC 2022, ARR Nov 2021, COLING 2020, Computer Speech and Language 2018

Research

LLM-as-Formalizer: Executable and Trustworthy Planning and Problem-Solving

Despite recent efforts in using large language models (LLMs) to plan and solve problems as agents, their hallucinations and lack of verifiability undermine executability and trust, preventing real-world deployment. My work advances an alternative paradigm: LLM-as-formalizer. Instead of relying on LLMs to generate plans directly, we use them as code generators to translate a user’s environment and goal into formal languages that can be deterministically solved by off-the-shelf solvers.
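As a rough illustration of this paradigm, the toy Python sketch below separates the two stages. It is not code from any of the papers listed here: `formalize` is a hypothetical stand-in for the LLM call and simply returns a hand-written STRIPS-style formalization of a two-block world, and `solve` is a small breadth-first search standing in for an off-the-shelf planner. All names and facts are assumptions made for the example.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def formalize(description: str):
    """Hypothetical stand-in for the LLM formalizer.

    A real system would prompt a model to emit a formal language such as
    PDDL; here the formalization is hard-coded and the description ignored.
    """
    init = frozenset({"on(a,b)", "ontable(b)", "clear(a)", "handempty"})
    goal = frozenset({"ontable(a)", "clear(b)"})
    actions = [
        Action("unstack(a,b)",
               frozenset({"on(a,b)", "clear(a)", "handempty"}),
               frozenset({"holding(a)", "clear(b)"}),
               frozenset({"on(a,b)", "clear(a)", "handempty"})),
        Action("putdown(a)",
               frozenset({"holding(a)"}),
               frozenset({"ontable(a)", "clear(a)", "handempty"}),
               frozenset({"holding(a)"})),
    ]
    return init, goal, actions


def solve(init, goal, actions):
    """Deterministic breadth-first search; returns a shortest plan or None."""
    queue = deque([(init, [])])
    seen = {init}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:  # every goal fact holds in this state
            return plan
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, plan + [act.name]))
    return None


plan = solve(*formalize("Block a is on block b. Put a on the table."))
print(plan)
```

The point of the split is that the plan comes from the search rather than from the model, so any returned plan is executable with respect to the formalization; whether the formalization faithfully reflects the description remains the model's responsibility.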

My primary efforts lie in using LLMs to generate formal languages, such as PDDL, that describe the planning environment.

[39] Language Model as Planner and Formalizer under Constraints; Cassie Huang, Stuti Mohan, Ziyi Yang, Stefanie Tellex and Li Zhang; preprint.Paper BibTeX Code

@misc{huang2025languagemodelplannerformalizer,
      title={Language Model as Planner and Formalizer under Constraints}, 
      author={Cassie Huang and Stuti Mohan and Ziyi Yang and Stefanie Tellex and Li Zhang},
      year={2025},
      eprint={2510.05486},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.05486}, 
}

[37] Vision Language Models Cannot Plan, but Can They Formalize? Muyu He, Yuxi Zheng, Yuchen Liu, Zijian An, Bill Cai, Jiani Huang, Lifeng Zhou, Feng Liu, Ziyang Li and Li Zhang; preprint.Paper BibTeX Code

@misc{he2025visionlanguagemodelsplan,
      title={Vision Language Models Cannot Plan, but Can They Formalize?}, 
      author={Muyu He and Yuxi Zheng and Yuchen Liu and Zijian An and Bill Cai and Jiani Huang and Lifeng Zhou and Feng Liu and Ziyang Li and Li Zhang},
      year={2025},
      eprint={2509.21576},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.21576}, 
}

[36] Documentation Retrieval Improves Planning Language Generation; Renxiang Wang and Li Zhang; in AACL 2025.Paper BibTeX Code

@inproceedings{wang-zhang-2025-documentation,
    title = "Documentation Retrieval Improves Planning Language Generation",
    author = "Wang, Renxiang  and
      Zhang, Li",
    editor = "Inui, Kentaro  and
      Sakti, Sakriani  and
      Wang, Haofen  and
      Wong, Derek F.  and
      Bhattacharyya, Pushpak  and
      Banerjee, Biplab  and
      Ekbal, Asif  and
      Chakraborty, Tanmoy  and
      Singh, Dhirendra Pratap",
    booktitle = "Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics",
    month = dec,
    year = "2025",
    address = "Mumbai, India",
    publisher = "The Asian Federation of Natural Language Processing and The Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.ijcnlp-short.14/",
    pages = "147--158",
    ISBN = "979-8-89176-299-2",
    abstract = "Certain strong LLMs have shown promise for zero-shot formal planning by generating planning languages like PDDL. Yet, performance of most open-source models under 50B parameters has been reported to be close to zero due to the low-resource nature of these languages. We significantly improve their performance via a series of lightweight pipelines that integrates documentation retrieval with modular code generation and error refinement. With models like Llama-4-Maverick, our best pipeline improves plan correctness from 0{\%} to over 80{\%} on the common BlocksWorld domain. However, while syntactic errors are substantially reduced, semantic errors persist in more challenging domains, revealing fundamental limitations in current models' reasoning capabilities."
}

[35] Zero-Shot Iterative Formalization and Planning in Partially Observable Environments; Liancheng Gong, Wang Zhu, Jesse Thomason and Li Zhang; preprint.Paper BibTeX Code

@misc{gong2025zeroshotiterativeformalizationplanning,
  title={Zero-Shot Iterative Formalization and Planning in Partially Observable Environments}, 
  author={Liancheng Gong and Wang Zhu and Jesse Thomason and Li Zhang},
  year={2025},
  eprint={2505.13126},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2505.13126}, 
}

[34] Unifying Inference-Time Planning Language Generation; Prabhu Prakash Kagitha, Bo Sun, Ishan Desai, Andrew Zhu, Cassie Huang, Manling Li, Ziyang Li and Li Zhang; preprint.Paper BibTeX Code

@misc{kagitha2025unifyinginferencetimeplanninglanguage,
      title={Unifying Inference-Time Planning Language Generation}, 
      author={Prabhu Prakash Kagitha and Bo Sun and Ishan Desai and Andrew Zhu and Cassie Huang and Manling Li and Ziyang Li and Li Zhang},
      year={2025},
      eprint={2505.14763},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14763}, 
}

[33] Large Language Models as Formalizers on Constraint Satisfaction Problems; Rikhil Amonkar*Equal contribution, Ceyhun Efe Kayan*Equal contribution, May Lai, Ronan Le Bras and Li Zhang; preprint.
Paper BibTeX Code

@misc{amonkar2025naturallanguageplanningcoding,
  title={Large Language Models as Formalizers on Constraint Satisfaction Problems}, 
  author={Rikhil Amonkar and Ceyhun Efe Kayan and May Lai and Ronan Le Bras and Li Zhang},
  year={2025},
  eprint={2505.13252},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.13252}, 
}

[30] On the Limit of Language Models as Planning Formalizers; Cassie Huang and Li Zhang; in ACL 2025.
Paper BibTeX Code

@inproceedings{huang-zhang-2025-limit,
    title = "On the Limit of Language Models as Planning Formalizers",
    author = "Huang, Cassie  and
      Zhang, Li",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.242/",
    pages = "4880--4904",
    ISBN = "979-8-89176-251-0",
    abstract = "Large Language Models have been found to create plans that are neither executable nor verifiable in grounded environments. An emerging line of work demonstrates success in using the LLM as a formalizer to generate a formal representation of the planning domain in some language, such as Planning Domain Definition Language (PDDL). This formal representation can be deterministically solved to find a plan. We systematically evaluate this methodology while bridging some major gaps. While previous work only generates a partial PDDL representation, given templated, and therefore unrealistic environment descriptions, we generate the complete representation given descriptions of various naturalness levels. Among an array of observations critical to improve LLMs' formal planning abilities, we note that most large enough models can effectively formalize descriptions as PDDL, outperforming those directly generating plans, while being robust to lexical perturbation. As the descriptions become more natural-sounding, we observe a decrease in performance and provide detailed error analysis."
}

[29] PDDLEGO: Iterative Planning in Textual Environments; Li Zhang, Peter Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch and Niket Tandon; in *SEM 2024.Paper BibTeX Code

@inproceedings{zhang-etal-2024-pddlego,
  title = "{PDDLEGO}: Iterative Planning in Textual Environments",
  author = "Zhang, Li  and
    Jansen, Peter  and
    Zhang, Tianyi  and
    Clark, Peter  and
    Callison-Burch, Chris  and
    Tandon, Niket",
  editor = "Bollegala, Danushka  and
    Shwartz, Vered",
  booktitle = "Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)",
  month = jun,
  year = "2024",
  address = "Mexico City, Mexico",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.starsem-1.17",
  pages = "212--221",
  abstract = "Planning in textual environments have been shown to be a long-standing challenge even for current models. A recent, promising line of work uses LLMs to generate a formal representation of the environment that can be solved by a symbolic planner. However, existing methods rely on a fully-observed environment where all entity states are initially known, so a one-off representation can be constructed, leading to a complete plan. In contrast, we tackle partially-observed environments where there is initially no sufficient information to plan for the end-goal. We propose PDDLEGO that iteratively construct a planning representation that can lead to a partial plan for a given sub-goal. By accomplishing the sub-goal, more information is acquired to augment the representation, eventually achieving the end-goal. We show that plans produced by few-shot PDDLEGO are 43{\%} more efficient than generating plans end-to-end on the Coin Collector simulation, with strong performance (98{\%}) on the more complex Cooking World simulation where end-to-end LLMs fail to generate coherent plans (4{\%}).",
}

[28] PROC2PDDL: Open-Domain Planning Representations from Texts; Tianyi Zhang*Equal contribution^Mentored student, Li Zhang*Equal contribution, Zhaoyi Hou^Mentored student, Ziyu Wang^Mentored student, Yuling Gu, Peter Clark, Chris Callison-Burch and Niket Tandon; in ACL 2024 2nd Workshop on Natural Language Reasoning and Structured Explanations.Paper BibTeX Code

@inproceedings{zhang-etal-2024-proc2pddl,
  title = "PROC2PDDL: Open-Domain Planning Representations from Texts",
  author = "Zhang, Tianyi and Zhang, Li  and Hou, Zhaoyi and Wang, Ziyu and Gu, Yuling and Clark, Peter and Callison-Burch, Chris and Tandon, Niket",
  booktitle = "Proceedings of the 2nd Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
  month = aug,
  year = "2024",
  address = "Bangkok, Thailand",
  publisher = "Association for Computational Linguistics",
}

[20] Faithful Chain-of-Thought Reasoning; Qing Lyu*Equal contribution, Shreya Havaldar*Equal contribution, Adam Stein*Equal contribution, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki and Chris Callison-Burch; in AACL 2023. Won Area Chair Award.Paper BibTeX Code

@inproceedings{lyu-etal-2023-faithful,
  title = "Faithful Chain-of-Thought Reasoning",
  author = "Lyu, Qing  and
  Havaldar, Shreya  and
  Stein, Adam  and
  Zhang, Li  and
  Rao, Delip  and
  Wong, Eric  and
  Apidianaki, Marianna  and
  Callison-Burch, Chris",
  editor = "Park, Jong C.  and
  Arase, Yuki  and
  Hu, Baotian  and
  Lu, Wei  and
  Wijaya, Derry  and
  Purwarianti, Ayu  and
  Krisnadhi, Adila Alfa",
  booktitle = "Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = nov,
  year = "2023",
  address = "Nusa Dua, Bali",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.ijcnlp-main.20",
  pages = "305--329",
}

My secondary efforts lie in using LLMs to generate solutions directly, with techniques such as agents, chain-of-thought prompting, and steering.

[38] Prototype-Based Dynamic Steering for Large Language Models; Ceyhun Efe Kayan and Li Zhang; preprint.Paper BibTeX Code

@misc{kayan2025prototypebaseddynamicsteeringlarge,
      title={Prototype-Based Dynamic Steering for Large Language Models}, 
      author={Ceyhun Efe Kayan and Li Zhang},
      year={2025},
      eprint={2510.05498},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.05498}, 
}

[36] TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games; Yuan Yuan*Equal contribution, Muyu He*Equal contribution, Adil Shahid, Jiani Huang, Ziyang Li and Li Zhang; in EMNLP 2025.Paper BibTeX Code

@inproceedings{yuan-etal-2025-turnaboutllm,
    title = "{T}urnabout{LLM}: A Deductive Reasoning Benchmark from Detective Games",
    author = "Yuan, Yuan  and
      He, Muyu  and
      Shahid, Muhammad Adil  and
      Li, Ziyang  and
      Huang, Jiani  and
      Zhang, Li",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.101/",
    pages = "1951--1965",
    ISBN = "979-8-89176-332-6",
    abstract = "This paper introduces TurnaboutLLM, a novel framework and dataset for evaluating the deductive reasoning abilities of Large Language Models (LLMs) by leveraging the interactive gameplay of detective games Ace Attorney and Danganronpa. The framework tasks LLMs with identifying contradictions between testimonies and evidences within long narrative contexts, a challenging task due to the large answer space and diverse reasoning types presented by its questions. We evaluate twelve state-of-the-art LLMs on the dataset, hinting at limitations of popular strategies for enhancing deductive reasoning such as extensive thinking and Chain-of-Thought prompting. The results also suggest varying effects of context size, reasoning steps and answer space size on model performance. Overall, TurnaboutLLM presents a substantial challenge for LLMs' deductive reasoning abilities in complex, narrative-rich environments."
}

My other efforts touch on all aspects of NLP and AI.

[40] Evaluating the Impact of LLM-guided Reflection on Learning Outcomes with Interactive AI-Generated Educational Podcasts; Vishnu Menon, Andy Cherney, Elizabeth B. Cloude, Li Zhang, Tiffany Diem Do; in AIME-Con 2025.Paper BibTeX

@inproceedings{menon-etal-2025-evaluating,
    title = "Evaluating the Impact of {LLM}-guided Reflection on Learning Outcomes with Interactive {AI}-Generated Educational Podcasts",
    author = "Menon, Vishnu  and
      Cherney, Andy  and
      Cloude, Elizabeth B.  and
      Zhang, Li  and
      Do, Tiffany Diem",
    editor = "Wilson, Joshua  and
      Ormerod, Christopher  and
      Beiting Parrish, Magdalen",
    booktitle = "Proceedings of the Artificial Intelligence in Measurement and Education Conference (AIME-Con): Full Papers",
    month = oct,
    year = "2025",
    address = "Wyndham Grand Pittsburgh, Downtown, Pittsburgh, Pennsylvania, United States",
    publisher = "National Council on Measurement in Education (NCME)",
    url = "https://aclanthology.org/2025.aimecon-main.11/",
    pages = "99--106",
    ISBN = "979-8-218-84228-4",
    abstract = "This study examined whether embedding LLM-guided reflection prompts in an interactive AI-generated podcast improved learning and user experience compared to a version without prompts. Thirty-six undergraduates participated, and while learning outcomes were similar across conditions, reflection prompts reduced perceived attractiveness, highlighting a call for more research on reflective interactivity design."
}

[31] DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation; Wenhao Hu, Jinhao Duan, Chunchen Wei, Li Zhang, Yue Zhang, Kaidi Xu; in Findings of ACL 2025.Paper BibTeX

@misc{hu2025dynacodedynamiccomplexityawarecode,
  title={DynaCode: A Dynamic Complexity-Aware Code Benchmark for Evaluating Large Language Models in Code Generation}, 
  author={Wenhao Hu and Jinhao Duan and Chunchen Wei and Li Zhang and Yue Zhang and Kaidi Xu},
  year={2025},
  eprint={2503.10452},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.10452}, 
}

[27] Calibrating Large Language Models with Sample Consistency; Qing Lyu*Equal contribution, Kumar Shridhar*Equal contribution, Chaitanya Malaviya, Li Zhang, Yanai Elazar, Niket Tandon, Marianna Apidianaki, Mrinmaya Sachan and Chris Callison-Burch; in AAAI 2025.Paper BibTeX

@inproceedings{lyu2025calibrating,
  title={Calibrating large language models with sample consistency},
  author={Lyu, Qing and Shridhar, Kumar and Malaviya, Chaitanya and Zhang, Li and Elazar, Yanai and Tandon, Niket and Apidianaki, Marianna and Sachan, Mrinmaya and Callison-Burch, Chris},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={18},
  pages={19260--19268},
  year={2025}
}

[26] Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization; Yash Kumar Lal, Li Zhang, Faeze Brahman, Bodhisattwa Prasad Majumder, Peter Clark, Niket Tandon; in Findings of ACL 2024.Paper BibTeX

@inproceedings{lal-etal-2024-tailoring,
  title = "Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization",
  author = "Lal, Yash Kumar  and
    Zhang, Li  and
    Brahman, Faeze  and
    Majumder, Bodhisattwa Prasad  and
    Clark, Peter  and
    Tandon, Niket",
  editor = "Ku, Lun-Wei  and
    Martins, Andre  and
    Srikumar, Vivek",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
  month = aug,
  year = "2024",
  address = "Bangkok, Thailand",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.findings-acl.921/",
  doi = "10.18653/v1/2024.findings-acl.921",
  pages = "15597--15611",
  abstract = "How-to procedures, such as how to plant a garden, are now used by millions of users, but sometimes need customizing to meet a user{'}s specific needs, e.g., planting a garden without pesticides. Our goal is to measure and improve an LLM{'}s ability to perform such customization. Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need. We find that a simple architecture with two LLM agents used sequentially performs best, one that edits a generic how-to procedure and one that verifies its executability, significantly outperforming (10.5{\%} absolute) an end-to-end prompted LLM. This suggests that LLMs can be configured reasonably effectively for procedure customization. This also suggests that multi-agent editing architectures may be worth exploring further for other customization applications (e.g. coding, creative writing) in the future."
}

[25] CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization; Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark; in COLM 2024.Paper BibTeX Code

@misc{majumder2023clin,
  title={CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization}, 
  author={Bodhisattwa Prasad Majumder and Bhavana Dalvi Mishra and Peter Jansen and Oyvind Tafjord and Niket Tandon and Li Zhang and Chris Callison-Burch and Peter Clark},
  year={2023},
  eprint={2310.10134},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

[24] Choice-75: A Dataset on Decision Branching in Script Learning; Zhaoyi Joey Hou^Mentored student, Li Zhang, Chris Callison-Burch; in LREC-COLING 2024.Paper BibTeX Code

@inproceedings{hou-etal-2024-choice-75,
  title = "Choice-75: A Dataset on Decision Branching in Script Learning",
  author = "Hou, Zhaoyi  and
    Zhang, Li  and
    Callison-Burch, Chris",
  editor = "Calzolari, Nicoletta  and
    Kan, Min-Yen  and
    Hoste, Veronique  and
    Lenci, Alessandro  and
    Sakti, Sakriani  and
    Xue, Nianwen",
  booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
  month = may,
  year = "2024",
  address = "Torino, Italia",
  publisher = "ELRA and ICCL",
  url = "https://aclanthology.org/2024.lrec-main.285",
  pages = "3215--3223",
  abstract = "Script learning studies how daily events unfold. It enables machines to reason about narratives with implicit information. Previous works mainly consider a script as a linear sequence of events while ignoring the potential branches that arise due to people{'}s circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given descriptive scenarios, containing 75 scripts and more than 600 scenarios. We also present preliminary results with current large language models (LLM). Although they demonstrate overall decent performances, there is still notable headroom in hard scenarios.",
}

[23] OpenPI2.0: An Improved Dataset for Entity Tracking in Texts; Li Zhang, Hainiu Xu^Mentored student, Abhinav Kommula, Chris Callison-Burch and Niket Tandon; in EACL 2024.Paper BibTeX Code

@inproceedings{zhang-etal-2024-openpi2,
  title = "{O}pen{PI}2.0: An Improved Dataset for Entity Tracking in Texts",
  author = "Zhang, Li  and
    Xu, Hainiu  and
    Kommula, Abhinav  and
    Callison-Burch, Chris  and
    Tandon, Niket",
  editor = "Graham, Yvette  and
    Purver, Matthew",
  booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = mar,
  year = "2024",
  address = "St. Julian{'}s, Malta",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2024.eacl-long.10",
  pages = "166--178",
  abstract = "Much texts describe a changing world (e.g., procedures, stories, newswires), and understanding them requires tracking how entities change. An earlier dataset, OpenPI, provided crowdsourced annotations of entity state changes in text. However, a major limitation was that those annotations were free-form and did not identify salient changes, hampering model evaluation. To overcome these limitations, we present an improved dataset, OpenPI2.0, where entities and attributes are fully canonicalized and additional entity salience annotations are added. On our fairer evaluation setting, we find that current state-of-the-art language models are far from competent. We also show that using state changes of salient entities as a chain-of-thought prompt, downstream performance is improved on tasks such as question answering and classical planning, outperforming the setting involving all related entities indiscriminately. We offer OpenPI2.0 for the continued development of models that can understand the dynamics of entities in text.",
}

[22] Exploring the Curious Case of Code Prompts; Li Zhang*Equal contribution, Liam Dugan*Equal contribution, Hainiu Xu^Mentored student*Equal contribution and Chris Callison-Burch; in ACL 2023 1st Workshop on Natural Language Reasoning and Structured Explanations.Paper BibTeX Code

@inproceedings{zhang-etal-2023-exploring,
title = "Exploring the Curious Case of Code Prompts",
author = "Zhang, Li  and
Dugan, Liam  and
Xu, Hainiu  and
Callison-Burch, Chris",
booktitle = "Proceedings of the 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE)",
month = jun,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.nlrse-1.2",
pages = "9--17",
abstract = "Recent work has shown that prompting language models with code-like representations of natural language leads to performance improvements on structured reasoning tasks. However, such tasks comprise only a small subset of all natural language tasks. In our work, we seek to answer whether or not code-prompting is the preferred way of interacting with language models in general. We compare code and text prompts across three popular GPT models (davinci, code-davinci-002, and text-davinci-002) on a broader selection of tasks (e.g., QA, sentiment, summarization) and find that with few exceptions, code prompts do not consistently outperform text prompts. Furthermore, we show that the style of code prompt has a large effect on performance for some (but not all) tasks and that fine-tuning on text instructions leads to better relative performance of code prompts.",
}

[21] Human-in-the-Loop Schema Induction; Tianyi Zhang^Mentored student, Isaac Tham, Zhaoyi Hou^Mentored student, Jiaxuan Ren, Liyang Zhou^Mentored student, Hainiu Xu^Mentored student, Li Zhang, Lara J. Martin, Rotem Dror, Sha Li, Heng Ji, Martha Palmer, Susan Brown, Reece Suchocki, Chris Callison-Burch; in ACL 2023 Demos.Paper BibTeX Demo

@inproceedings{zhang-etal-2023-human,
  title = "Human-in-the-loop Schema Induction",
  author = "Zhang, Tianyi  and
    Tham, Isaac  and
    Hou, Zhaoyi  and
    Ren, Jiaxuan  and
    Zhou, Leon  and
    Xu, Hainiu  and
    Zhang, Li  and
    Martin, Lara  and
    Dror, Rotem  and
    Li, Sha  and
    Ji, Heng  and
    Palmer, Martha  and
    Brown, Susan Windisch  and
    Suchocki, Reece  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.acl-demo.1",
  pages = "1--10",
  abstract = "Schema induction builds a graph representation explaining how events unfold in a scenario. Existing approaches have been based on information retrieval (IR) and information extraction (IE), often with limited human curation. We demonstrate a human-in-the-loop schema induction system powered by GPT-3. We first describe the different modules of our system, including prompting to generate schematic elements, manual edit of those elements, and conversion of those into a schema graph. By qualitatively comparing our system to previous ones, we show that our system not only transfers to new domains more easily than previous approaches, but also reduces efforts of human curation thanks to our interactive interface.",
}

[19] Causal Reasoning of Entities and Events in Procedural Texts; Li Zhang*Equal contribution, Hainiu Xu*Equal contribution^Mentored student, Yue Yang, Shuyan Zhou, Weiqiu You, Manni Arora and Chris Callison-Burch; in Findings of EACL 2023.Paper BibTeX Code

@inproceedings{zhang-etal-2023-causal,
  title = "Causal Reasoning of Entities and Events in Procedural Texts",
  author = "Zhang, Li  and
    Xu, Hainiu  and
    Yang, Yue  and
    Zhou, Shuyan  and
    You, Weiqiu  and
    Arora, Manni  and
    Callison-Burch, Chris",
  booktitle = "Findings of the Association for Computational Linguistics: EACL 2023",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.findings-eacl.31",
  pages = "415--431",
  abstract = "Entities and events are crucial to natural language reasoning and common in procedural texts. Existing work has focused either exclusively on entity state tracking (e.g., whether a pan is hot) or on event reasoning (e.g., whether one would burn themselves by touching the pan), while these two tasks are often causally related. We propose CREPE, the first benchmark on causal reasoning of event plausibility and entity states. We show that most language models, including GPT-3, perform close to chance at .35 F1, lagging far behind human at .87 F1. We boost model performance to .59 F1 by creatively representing events as programming languages while prompting language models pretrained on code. By injecting the causal relations between entities and events as intermediate reasoning steps in our representation, we further boost the performance to .67 F1. Our findings indicate not only the challenge that CREPE brings for language models, but also the efficacy of code-like prompting combined with chain-of-thought prompting for multihop event reasoning.",
}

[17] Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection; Young Min Cho^Mentored student, Li Zhang and Chris Callison-Burch; in EMNLP 2022.Paper BibTeX Code

@inproceedings{cho-etal-2022-unsupervised,
title = "Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection",
author = "Cho, Young Min  and
Zhang, Li  and
Callison-Burch, Chris",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.638",
pages = "9394--9401",
abstract = "Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component for language understanding. We address two challenge in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioning on the mention, and then casts the task to a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on exiting datasets.",
}

[16] GEMv2: Multilingual NLG Benchmarking in a Single Line of Code; ... Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li, ...; in EMNLP 2022 Demos.Paper

[15] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models; ... Li Zhang*Equal contribution, Qing Lyu*Equal contribution and Chris Callison-Burch; in TMLR. Paper

[13] QuakerBot: A Household Dialog System Powered by Large Language Models; Artemis Panagopoulou, Manni Arora^Mentored student, Li Zhang, Dimitri Cugini, Weiqiu You, Yue Yang, Liyang Zhou^Mentored student, Yuxuan Wang, Zhaoyi Hou^Mentored student, Alyssa Hwang, Lara Martin, Sherry Shi, Chris Callison-Burch and Mark Yatskar; in Alexa Prize Proceedings 2022. Paper BibTeX

@Inproceedings{Pennsylvania2022,
author = {Panagopoulou, Artemis and Arora, Manni and Zhang, Li and Cugini, Dimitri and You, Weiqiu and Yang, Yue and Zhou, Liyang and Wang, Yuxuan and Hou, Zhaoyi and Hwang, Alyssa and Martin, Lara and Shi, Sherry and Callison-Burch, Chris and Yatskar, Mark},
title = {QuakerBot: A household dialog system powered by large language models},
year = {2022},
url = {https://www.amazon.science/alexa-prize/proceedings/quakerbot-a-household-dialog-system-powered-by-large-language-models},
booktitle = {Alexa Prize TaskBot Challenge Proceedings},
}

[12] Is "my favorite new movie" my favorite movie? Probing the Understanding of Recursive Noun Phrases; Qing Lyu, Hua Zheng, Daoxin Li, Li Zhang, Marianna Apidianaki and Chris Callison-Burch; in NAACL 2022. Paper BibTeX Code

@inproceedings{lyu-etal-2022-favorite,
    title = "Is {``}My Favorite New Movie{''} My Favorite Movie? Probing the Understanding of Recursive Noun Phrases",
    author = "Lyu, Qing  and
      Hua, Zheng  and
      Li, Daoxin  and
      Zhang, Li  and
      Apidianaki, Marianna  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.388",
    pages = "5286--5302",
    abstract = "Recursive noun phrases (NPs) have interesting semantic properties. For example, {``}my favorite new movie{''} is not necessarily my favorite movie, whereas {``}my new favorite movie{''} is. This is common sense to humans, yet it is unknown whether language models have such knowledge. We introduce the Recursive Noun Phrase Challenge (RNPC), a dataset of three textual inference tasks involving textual entailment and event plausibility comparison, precisely targeting the understanding of recursive NPs. When evaluated on RNPC, state-of-the-art Transformer models only perform around chance. Still, we show that such knowledge is learnable with appropriate data. We further probe the models for relevant linguistic features that can be learned from our tasks, including modifier semantic category and modifier scope. Finally, models trained on RNPC achieve strong zero-shot performance on an extrinsic Harm Detection evaluation task, showing the usefulness of the understanding of recursive NPs in downstream applications.",
}

[11] Label Definitions Improve Semantic Role Labeling; Li Zhang, Ishan Jindal, Yunyao Li; in NAACL 2022. Paper BibTeX Code

@inproceedings{zhang-etal-2022-label-definitions,
    title = "Label Definitions Improve Semantic Role Labeling",
    author = "Zhang, Li  and
      Jindal, Ishan  and
      Li, Yunyao",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.411",
    pages = "5613--5620",
    abstract = "Argument classification is at the core of Semantic Role Labeling. Given a sentence and the predicate, a semantic role label is assigned to each argument of the predicate. While semantic roles come with meaningful definitions, existing work has treated them as symbolic. Learning symbolic labels usually requires ample training data, which is frequently unavailable due to the cost of annotation. We instead propose to retrieve and leverage the definitions of these labels from the annotation guidelines. For example, the verb predicate {``}work{''} has arguments defined as {``}worker{''}, {``}job{''}, {``}employer{''}, etc. Our model achieves state-of-the-art performance on the CoNLL09 dataset injected with label definitions given the predicate senses. The performance improvement is even more pronounced in low-resource settings when training data is scarce.",
}

[10] Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data; Shuyan Zhou*Equal contribution, Li Zhang*Equal contribution, Yue Yang, Qing Lyu, Pengcheng Yin, Chris Callison-Burch and Graham Neubig; in ACL 2022. Paper BibTeX Demo Code

@inproceedings{zhou-etal-2022-show,
title = "Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data",
author = "Zhou, Shuyan  and
  Zhang, Li  and
  Yang, Yue  and
  Lyu, Qing  and
  Yin, Pengcheng  and
  Callison-Burch, Chris  and
  Neubig, Graham",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.214",
pages = "2998--3012",
abstract = "Procedures are inherently hierarchical. To {``}make videos{''}, one may need to {``}purchase a camera{''}, which in turn may require one to {``}set a budget{''}. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., {``}purchase a camera{''}) in an article to other articles with similar goals (e.g., {``}how to choose a camera{''}), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.",
}

[9] Visual Goal-Step Inference using wikiHow; Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar and Chris Callison-Burch; in EMNLP 2021. Paper BibTeX

@inproceedings{yang-etal-2021-visual,
title = "Visual Goal-Step Inference using wiki{H}ow",
author = "Yang, Yue  and
  Panagopoulou, Artemis  and
  Lyu, Qing  and
  Zhang, Li  and
  Yatskar, Mark  and
  Callison-Burch, Chris",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.165",
pages = "2167--2179",
abstract = "Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15 - 20{\%}. Our task will facilitate multimodal reasoning about procedural events.",
}

[8] Goal-Oriented Script Construction; Qing Lyu*Equal contribution, Li Zhang*Equal contribution and Chris Callison-Burch; in INLG 2021. Paper BibTeX Code

@inproceedings{lyu-etal-2021-goal,
title = "Goal-Oriented Script Construction",
author = "Lyu, Qing  and
Zhang, Li  and
Callison-Burch, Chris",
booktitle = "Proceedings of the 14th International Conference on Natural Language Generation",
month = aug,
year = "2021",
address = "Aberdeen, Scotland, UK",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.inlg-1.19",
pages = "184--200",
abstract = "The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach by first retrieving the relevant steps from a large candidate pool and then ordering them. We show that our task is practical, feasible but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.",
}

[7] Intent Detection with WikiHow; Li Zhang, Qing Lyu, Chris Callison-Burch; in AACL-IJCNLP 2020. Paper BibTeX Code

@inproceedings{zhang-etal-2020-intent,
  title = "Intent Detection with {W}iki{H}ow",
  author = "Zhang, Li  and
    Lyu, Qing  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
  month = dec,
  year = "2020",
  address = "Suzhou, China",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.aacl-main.35",
  pages = "328--333",
  abstract = "Modern task-oriented dialog systems need to reliably understand users{'} intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75{\%} accuracy using only 100 training examples in all datasets.",
}

[6] Reasoning about Goals, Steps, and Temporal Ordering with WikiHow; Li Zhang*Equal contribution, Qing Lyu*Equal contribution and Chris Callison-Burch; in EMNLP 2020. Paper BibTeX Code

@inproceedings{zhang-etal-2020-reasoning,
  title = "Reasoning about Goals, Steps, and Temporal Ordering with {W}iki{H}ow",
  author = "Zhang, Li  and
    Lyu, Qing  and
    Callison-Burch, Chris",
  booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.emnlp-main.374",
  pages = "4630--4639",
}

[5] Small but Mighty: New Benchmarks for Split and Rephrase; Li Zhang, Huaiyu Zhu, Siddhartha Brahma and Yunyao Li; in EMNLP 2020; a part of the GEM Benchmark. Paper BibTeX Code

@inproceedings{zhang-etal-2020-small,
title = "Small but Mighty: New Benchmarks for Split and Rephrase",
author = "Zhang, Li  and
  Zhu, Huaiyu  and
  Brahma, Siddhartha  and
  Li, Yunyao",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.91",
pages = "1198--1205",
}

[4] Multi-Label Transfer Learning for Multi-Relational Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; in *SEM 2019. Paper BibTeX Slides

@inproceedings{zhang-etal-2019-multi,
title = "Multi-Label Transfer Learning for Multi-Relational Semantic Similarity",
author = "Zhang, Li  and
  Wilson, Steven  and
  Mihalcea, Rada",
booktitle = "Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*{SEM} 2019)",
month = jun,
year = "2019",
address = "Minneapolis, Minnesota",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/S19-1005",
pages = "44--50",
abstract = "Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.",
}

[3] Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity; Li Zhang, Steven R. Wilson and Rada Mihalcea; preprint; presented at IC2S2 2018. Paper BibTeX Poster

@misc{zhang2018direct,
title={Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity},
author={Li Zhang and Steven R. Wilson and Rada Mihalcea},
year={2018},
eprint={1804.07835},
archivePrefix={arXiv},
primaryClass={cs.CL}
}

[2] Entity and Event Extraction from Scratch Using Minimal Training Data; Laura Burdick, Steven R. Wilson, Oana Ignat, Charles F. Welch, Li Zhang, Mingzhe Wang, Jia Deng and Rada Mihalcea; in TAC 2018. Paper BibTeX Poster

@inproceedings{Burdick2018EntityAE,
title={Entity and Event Extraction from Scratch Using Minimal Training Data},
author={Laura Burdick and Steven R. Wilson and Oana Ignat and Charles F. Welch and Li Zhang and Mingzhe Wang and Jia Deng and Rada Mihalcea},
booktitle={Text Analysis Conference (TAC)},
year={2018}
}

[1] Improving Text-to-SQL Evaluation Methodology; Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan Dhanalakshmi Ramanathan, Sesh Sadasivam, Rui Zhang and Dragomir Radev; in ACL 2018. Paper BibTeX Code Poster

@InProceedings{acl18sql,
  author    = {Catherine Finegan-Dollak\*  and  Jonathan K. Kummerfeld\*  and  Li Zhang  and  Karthik Ramanathan  and  Sesh Sadasivam  and  Rui Zhang  and  Dragomir Radev},
  title     = {Improving Text-to-SQL Evaluation Methodology},
  booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  shortvenue = {ACL},
  month     = {July},
  year      = {2018},
  address   = {Melbourne, Victoria, Australia},
  pages     = {351--360},
  abstract  = {To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.},
  url       = {http://aclweb.org/anthology/P18-1033},
  software  = {https://github.com/jkkummerfeld/text2sql-data},
  data      = {https://github.com/jkkummerfeld/text2sql-data},
}

Activities

Mar 11, 2026: Talk at Northwestern University
Feb 13, 2026: Talk at Allen Institute for AI
Nov 3, 2025: Accepted to AAAI 2026 New Faculty Highlights
May 7, 2025: Talk at Brown University
Apr 3, 2025: Talk at Temple University
Mar 19, 2025: Talk at Michigan State University
Mar 18, 2025: Talk at the University of Michigan
Mar 4, 2025: Talk at AAAI 2025 workshop: Towards Knowledgeable Foundation Models
Feb 27, 2025: Attending AAAI 2025
Feb 21, 2025: Talk at Allen Institute for AI
Feb 17, 2025: Talk at the University of Pennsylvania
Dec 1, 2024: Join Drexel University as an assistant professor

Music

I am a drummer, producer, content creator, and band leader. I run a video channel with over 60,000 subscribers on Bilibili and YouTube, primarily making cover songs from video game and anime soundtracks, in a variety of styles ranging from metal to jazz. I am proudly sponsored officially by Mackie, NUX, and Xvive, as well as by Chinese retailers of Roland, Alesis, Vater, Tama, etc. I have collaborated with major video games such as Genshin Impact and Azur Lane.

Below are some music videos and drum covers that I played in and produced.

Below are recorded live performances of the bands I run locally in Philadelphia.

My new album, Megidalon, a collection of reimagined Persona soundtracks, is available for streaming on all platforms! My two previous albums, Dazzling Tales (reimagined Genshin Impact soundtracks) and A Doll's Lament (reimagined NieR soundtracks), are also available for listening on all major streaming platforms.

I also engage in research of AI music generation, having published a paper on automatic drum composition in an AAAI 2023 workshop.

[32] Not that Groove: Zero-Shot Symbolic Music Editing; Li Zhang; preprint. Paper Code BibTeX

@misc{zhang2025groovezeroshotsymbolicmusic,
      title={Not that Groove: Zero-Shot Symbolic Music Editing}, 
      author={Li Zhang},
      year={2025},
      eprint={2505.08203},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2505.08203}, 
}

[18] Language Models are Drummers: Drum Composition with Natural Language Pre-Training; Li Zhang and Chris Callison-Burch; in AAAI 2023 Workshop on Creative AI Across Modalities. Paper Code BibTeX

  @InProceedings{gpt3drum,
    author    = {Li Zhang  and  Chris Callison-Burch},
    title     = {Language Models are Drummers: Drum Composition with Natural Language Pre-Training},
    booktitle = {AAAI 2023 1st Workshop on Creative AI Across Modalities},
    month     = {February},
    year      = {2023},
    address   = {Washington, D.C., USA},
    abstract  = {Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedent in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.},
    url       = {https://arxiv.org/abs/2301.01162},
    software  = {https://github.com/zharry29/drums-with-llm},
  }