Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing

Hill, Nathan W.; Li, Shihua

Authors

Nathan W. Hill

Shihua Li



Abstract

This study presents a dataset that serves as a benchmark for recognizing printed text in lexical lists with Chinese-IPA glossing. The paper gives an overview of the baseline model, the transcription model, and the PyLaia engines used in the research. It explains why these lexical lists need to be digitized, outlines how the baseline model was trained for layout analysis, and describes how the transcription model was trained on ground truth data generated in Transkribus. Training takes both the images of the lexical list content and their corresponding transcriptions as input. The study also discusses the limitations of the model and identifies avenues for future development. Because the dataset is openly accessible, researchers seeking to digitize lexical lists with Chinese-IPA glossing can use it. Moreover, since the model recognizes both Chinese characters and IPA symbols, it has the potential to contribute to linguistic analysis of languages documented in Chinese-IPA glossing.

Citation

Hill, N. W., & Li, S. (2023). Printed Text Recognition for Lexical Lists in Chinese-International Phonetic Alphabet (IPA) Glossing. Journal of Open Humanities Data, 9(15), 1-8. https://doi.org/10.5334/johd.119

Journal Article Type: Article
Acceptance Date: Oct 1, 2023
Publication Date: Jul 21, 2023
Deposit Date: Oct 25, 2023
Publicly Available Date: Oct 25, 2023
Journal: Journal of Open Humanities Data
Electronic ISSN: 2059-481X
Publisher: Ubiquity Press
Peer Reviewed: Peer Reviewed
Volume: 9
Issue: 15
Pages: 1-8
DOI: https://doi.org/10.5334/johd.119
Keywords: printed text recognition, Chinese, IPA, Burmish and Tujia languages, lexical lists, baseline model, transcription model, Transkribus
Publisher URL: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.119
