Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan

Lugli, Ligeia


Authors

Ligeia Lugli



Contributors

Iztok Kosem
Editor

Tanara Zingano Kuhn
Editor

Margarita Correia
Editor

José Pedro Ferreira
Editor

Maarten Jansen
Editor

Isabel Pereira
Editor

Jelena Kallas
Editor

Miloš Jakubíček
Editor

Simon Krek
Editor

Carole Tiberius
Editor

Abstract

Traditional lexicography requires titanic efforts and enormous resources. For many languages, such resources have never been available. As a result, they have received only limited lexicographic coverage. Today, these languages can take advantage of many of the same digital tools and strategies that have simplified and expedited dictionary-making for mainstream languages. However, the resource gap remains evident even in the digital era, with basic corpus processing tasks that lie at the foundation of contemporary ‘smart lexicography’ still constituting a challenge for many under-resourced languages. Drawing on my own experience in Sanskrit and Tibetan lexicography, this paper aims to offer some guidance as to the advantages and limitations of the application of smart lexicography to under-resourced languages. In particular, this paper suggests that in order to optimize resources, it may be advisable to prioritize high-quality lexical annotation of the corpus over highly curated dictionary entries, and to let digital tools take care of the lexicographic representation of the annotated linguistic information.

Citation

Lugli, L. (2019). Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan. In I. Kosem, T. Zingano Kuhn, M. Correia, J. P. Ferreira, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek, & C. Tiberius (Eds.), Electronic lexicography in the 21st century: Smart lexicography (pp. 198–212). Lexical Computing CZ.

Online Publication Date: Sep 17, 2019
Deposit Date: Nov 4, 2019
Publicly Available Date: Nov 4, 2019
Pages: 198–212
Book Title: Electronic lexicography in the 21st century: Smart lexicography
ISSN: 2533-5626
Keywords: automated lexicography; GDEX; Buddhist Hybrid Sanskrit; Tibetan
Publisher URL: https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf
Additional Information: Proceedings of the eLex 2019 conference (Sintra, Portugal, 1–3 October 2019)
