
Research Repository


Leveraging graph algorithms to speed up the annotation of large rhymed corpora

Baley, Julien

Abstract

Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. In recent years, graphs have emerged as a way to model rhyming patterns, notably through List's (2016) proposal to use graph community detection to go beyond the limits of the link-and-bind method and to test new hypotheses about phonological reconstruction. List's approach, however, requires a rhyme-annotated corpus, and such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among these, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no prior annotation and no knowledge of phonology, and it adapts automatically to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate that the approach can quickly annotate hundreds of thousands of poems with high accuracy.
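To make the idea of "graph community detection" over rhymes concrete, here is a minimal illustrative sketch; it is not the paper's method, and the paper's actual algorithm and preprocessing may differ. It assumes each poem has already been split into rhyme groups (lists of characters that rhyme together), links every pair of characters that rhyme in the same group, weights each edge by how often the pair co-occurs, and then uses weighted label propagation as a stand-in community-detection algorithm. Each resulting community plays the role of a learned rhyme category. The function names `build_rhyme_graph` and `label_propagation` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_graph(rhyme_groups):
    """Hypothetical helper: build a weighted co-rhyme graph.

    Two characters are linked each time they appear in the same rhyme group;
    the edge weight counts how many groups they share.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for group in rhyme_groups:
        # Deduplicate and sort so each unordered pair is counted once per group.
        for u, v in combinations(sorted(set(group)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Hypothetical stand-in for community detection: weighted label propagation.

    Every node starts in its own community, then repeatedly adopts the label
    with the greatest total edge weight among its neighbours. Nodes are visited
    in sorted order and ties go to the smallest label, so results are
    deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no node switched community
            break
    return labels


if __name__ == "__main__":
    # Toy input (made up for illustration): two separate sets of rhyming characters.
    groups = [["東", "同", "公"], ["東", "同"], ["同", "公"],
              ["陽", "光", "長"], ["陽", "光"], ["光", "長"]]
    print(label_propagation(build_rhyme_graph(groups)))
```

On this toy input, 東, 同 and 公 end up sharing one label and 陽, 光 and 長 another. Label propagation was chosen here only because it fits in a few lines without dependencies; on a real corpus, a more robust algorithm (such as Infomap, Louvain or Leiden) would be the more usual choice.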

Citation

Baley, J. (2022). Leveraging graph algorithms to speed up the annotation of large rhymed corpora. Cahiers de Linguistique Asie Orientale, 51(1), 46-80. https://doi.org/10.1163/19606028-bja10019

Journal Article Type: Article
Publication Date: Mar 17, 2022
Deposit Date: Mar 30, 2022
Publicly Available Date: Mar 30, 2022
Journal: Cahiers de Linguistique Asie Orientale
Print ISSN: 0153-3320
Electronic ISSN: 1960-6028
Publisher: Brill Academic Publishers
Peer Reviewed: Peer Reviewed
Volume: 51
Issue: 1
Pages: 46-80
DOI: https://doi.org/10.1163/19606028-bja10019
Keywords: Linguistics and Language, Language and Linguistics
