About Me
I am a second-year PhD student focusing on Natural Language Processing, working with Prof. Chris Callison-Burch at the University of Pennsylvania. I graduated from the University of Michigan in 2018, where I was mentored by Prof. Rada Mihalcea and Prof. Dragomir Radev. I'm also a competitive pool player and a passionate amateur multi-instrumentalist! I love playing metal, rock, funk, and fusion.
CV Publications
University of Pennsylvania
Shenzhen, China
zharry@seas.upenn.edu
Music
I play, record and produce music. Sometimes I play multiple instruments myself; sometimes I play with my talented friends. More of my work can be found on Bilibili.
I'm an intermediate-to-advanced drummer. I play a Roland TD-30K electronic set with Tama Speed Cobra 910 pedals.
I'm also a beginner-to-intermediate guitarist and bassist. I play a Schecter C-7 FR-S Apocalypse electric guitar and an Ibanez GSRM20 Mikro electric bass.
Pool
I'm a decent pool player. I used to play competitively on the university team and regularly competed in intercollegiate tournaments. I play with a Predator SP2 REVO play cue and a Mezz Dual Force break & jump cue.
This work is part of the DARPA AIDA project. From text, audio, and video recounting the 2014 Russia-Ukraine conflict, the goal is to extract knowledge elements and generate hypotheses about real-life events. I used named entity recognition, keyword extraction, and word embeddings to extract textual entities from the data and assign them categories from the given ontology.
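As a tiny, illustrative sketch of that idea (not the actual AIDA pipeline): run an off-the-shelf NER model and assign each mention the category whose label embedding is most similar. The spaCy model and the category names below are my own assumptions for the example, not the real ontology.

# Illustrative sketch only: off-the-shelf NER plus embedding similarity
# to a toy set of category labels (not the real AIDA ontology).
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships with word vectors

CATEGORIES = ["Person", "Organization", "Location", "Weapon", "Vehicle"]
category_vecs = {c: nlp(c).vector for c in CATEGORIES}

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def extract_entities(text):
    doc = nlp(text)
    typed = []
    for ent in doc.ents:  # named entity recognition
        # pick the category whose label vector is closest to the mention vector
        best = max(CATEGORIES, key=lambda c: cosine(ent.vector, category_vecs[c]))
        typed.append((ent.text, ent.label_, best))
    return typed

print(extract_entities("Separatist forces seized the airport near Donetsk in May 2014."))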
In each issue of the New Yorker magazine, there is a cartoon caption contest to which thousands of readers submit funny captions. The goal is to automatically divide the captions into clusters based on their theme of humor (what they are joking about) using unsupervised learning. Work on this had been done years earlier, but the code was scattered and under-documented. As a freshman, I was in charge of this project: bringing the existing system up to date and optimizing it.
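As a toy example of what such clustering can look like (not the actual system, whose features and algorithm were more involved), captions can be embedded with TF-IDF and grouped with k-means:

# Toy example: cluster captions by TF-IDF similarity with k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

captions = [
    "My lawyer says the antlers stay.",
    "The antlers were here when I moved in.",
    "I said the meeting was business casual.",
    "Does this tie make the meeting look longer?",
]

X = TfidfVectorizer(stop_words="english").fit_transform(captions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for label, caption in zip(labels, captions):
    print(label, caption)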
AAN encompasses our corpus of resources on NLP and related fields, as well as the research projects that build upon this corpus. We have collected around 6,500 surveys, tutorials, and other resources, and created a search engine that allows users to easily browse them. I helped build and maintain this paper anthology, which contains information about numerous papers from top NLP venues. It features paper citation, author citation, and author collaboration networks, among other things.
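For illustration only, an author-collaboration network of the kind AAN exposes can be built from paper metadata roughly like this (the papers and field names here are made up for the example):

# Illustrative sketch: build an author-collaboration graph from paper metadata.
import itertools
import networkx as nx

papers = [
    {"title": "Paper A", "authors": ["Alice", "Bob"]},
    {"title": "Paper B", "authors": ["Alice", "Carol", "Dave"]},
    {"title": "Paper C", "authors": ["Alice", "Bob", "Carol"]},
]

G = nx.Graph()
for paper in papers:
    # connect every pair of co-authors; edge weights count shared papers
    for a, b in itertools.combinations(paper["authors"], 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

print(sorted(G.edges(data="weight")))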
Intent Detection with WikiHow. Paper BibTeX Repo. In AACL-IJCNLP 2020.
@inproceedings{zhang-etal-2020-intent,
    title = "Intent Detection with {W}iki{H}ow",
    author = "Zhang, Li and Lyu, Qing and Callison-Burch, Chris",
    booktitle = "Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.aacl-main.35",
    pages = "328--333",
    abstract = "Modern task-oriented dialog systems need to reliably understand users{'} intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75{\%} accuracy using only 100 training examples in all datasets.",
}
Reasoning about Goals, Steps, and Temporal Ordering with WikiHow. Paper BibTeX Repo. In EMNLP 2020.
@inproceedings{zhang-etal-2020-reasoning,
    title = "Reasoning about Goals, Steps, and Temporal Ordering with {W}iki{H}ow",
    author = "Zhang, Li and Lyu, Qing and Callison-Burch, Chris",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.374",
    pages = "4630--4639",
}
Small but Mighty: New Benchmarks for Split and Rephrase. Paper BibTeX. In EMNLP 2020.
@inproceedings{zhang-etal-2020-small,
    title = "Small but Mighty: New Benchmarks for Split and Rephrase",
    author = "Zhang, Li and Zhu, Huaiyu and Brahma, Siddhartha and Li, Yunyao",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.91",
    pages = "1198--1205",
}
Multi-Label Transfer Learning for Multi-Relational Semantic Similarity. Paper BibTeX Slides. In *SEM 2019.
@inproceedings{zhang-etal-2019-multi,
    title = "Multi-Label Transfer Learning for Multi-Relational Semantic Similarity",
    author = "Zhang, Li and Wilson, Steven and Mihalcea, Rada",
    booktitle = "Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*{SEM} 2019)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/S19-1005",
    pages = "44--50",
    abstract = "Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.",
}
Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity. Paper BibTeX Poster. arXiv preprint; presented at IC2S2 2018.
@misc{zhang2018direct,
    title = {Direct Network Transfer: Transfer Learning of Sentence Embeddings for Semantic Similarity},
    author = {Li Zhang and Steven R. Wilson and Rada Mihalcea},
    year = {2018},
    eprint = {1804.07835},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}
Improving Text-to-SQL Evaluation Methodology. Paper BibTeX Repo Poster. In ACL 2018.
@InProceedings{acl18sql,
    author = {Catherine Finegan-Dollak\* and Jonathan K. Kummerfeld\* and Li Zhang and Karthik Ramanathan and Sesh Sadasivam and Rui Zhang and Dragomir Radev},
    title = {Improving Text-to-SQL Evaluation Methodology},
    booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
    shortvenue = {ACL},
    month = {July},
    year = {2018},
    address = {Melbourne, Victoria, Australia},
    pages = {351--360},
    abstract = {To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.},
    url = {http://aclweb.org/anthology/P18-1033},
    software = {https://github.com/jkkummerfeld/text2sql-data},
    data = {https://github.com/jkkummerfeld/text2sql-data},
}
I did NLP research and software development on text simplification.
At Penn, I instructed CIS 530: Computational Linguistics (Winter, Fall 2020). At Michigan, I instructed EECS 595: Natural Language Processing (Fall 2018) and EECS 280: Programming and Introductory Data Structures (Winter, Fall 2016).
I performed software engineering, data analytics and machine learning.
GPA: 3.89/4.00
GPA: 3.82/4.00 summa cum laude
GPA: 4.23/4.30