Leveraging graph algorithms to speed up the annotation of large rhymed corpora
Authors
Mr Julien Baley (jb130@soas.ac.uk), Postdoctoral Researcher
Abstract
Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. The past few years have seen the emergence of the use of graphs to model rhyming patterns, notably with List’s (2016) proposal to use graph community detection as a way to go beyond the limits of the link-and-bind method and test new hypotheses regarding phonological reconstruction. List’s approach requires the existence of a rhyme-annotated corpus; such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among others, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no previous annotation, no knowledge of phonology, and automatically adapts to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate the viability of the approach in quickly annotating hundreds of thousands of poems with high accuracy.
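The general idea of annotating rhymes through graph community detection can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the community detection algorithm, so a simple deterministic label propagation is swapped in as a stand-in, and the function names (`build_rhyme_graph`, `label_propagation`, `annotate`) and the toy tokens (`a1`, `b1`, ...) are hypothetical placeholders rather than real corpus data. The sketch assumes that line-final words co-occurring in the same poem are candidate rhyme partners, and that words ending up in the same community belong to the same rhyme category.

```python
from collections import defaultdict
from itertools import combinations


def build_rhyme_graph(poems):
    """Weighted graph: nodes are line-final words; an edge weight counts
    how many poems contain both words in line-final position."""
    graph = defaultdict(lambda: defaultdict(int))
    for finals in poems:
        for u, v in combinations(sorted(set(finals)), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_rounds=100):
    """Stand-in community detection: each node repeatedly adopts the label
    carrying the largest total edge weight among its neighbours."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Ties are broken by the smallest label so runs are reproducible.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def annotate(finals, labels):
    """Assign a rhyme letter (A, B, ...) to each line-final word, reusing
    a letter when two words fall in the same community."""
    letters = {}
    scheme = []
    for word in finals:
        community = labels.get(word, word)  # unseen words form their own group
        if community not in letters:
            letters[community] = chr(ord("A") + len(letters))
        scheme.append(letters[community])
    return "".join(scheme)


# Toy corpus (hypothetical tokens): each inner list holds one poem's
# line-final words; the last poem alternates between two rhyme groups.
toy_poems = [
    ["a1", "a2", "a3"],
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["b1", "b2", "b3"],
    ["a1", "b1", "a2", "b2"],
]
toy_labels = label_propagation(build_rhyme_graph(toy_poems))
print(annotate(["a1", "b1", "a2", "b2"], toy_labels))
```

In this toy setting the weak cross-links introduced by the alternating poem are outweighed by the repeated within-group links, so the two groups stay separate. No annotation or phonological knowledge is supplied; the categories come only from co-occurrence, which is the property the abstract emphasises.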
Citation
Baley, J. (2022). Leveraging graph algorithms to speed up the annotation of large rhymed corpora. Cahiers de Linguistique Asie Orientale, 51(1), 46-80. https://doi.org/10.1163/19606028-bja10019
Journal Article Type | Article |
---|---|
Publication Date | Mar 17, 2022 |
Deposit Date | Mar 30, 2022 |
Publicly Available Date | Mar 30, 2022 |
Journal | Cahiers de Linguistique Asie Orientale |
Print ISSN | 0153-3320 |
Electronic ISSN | 1960-6028 |
Publisher | Brill Academic Publishers |
Peer Reviewed | Peer Reviewed |
Volume | 51 |
Issue | 1 |
Pages | 46-80 |
DOI | https://doi.org/10.1163/19606028-bja10019 |
Keywords | Linguistics and Language, Language and Linguistics |
Files
[19606028 - Cahiers de Linguistique Asie Orientale] Leveraging graph algorithms to speed up the annotation of large rhymed corpora.pdf (PDF, 2.8 Mb)
Licence
http://creativecommons.org/licenses/by/4.0/