Ligeia Lugli
Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan
Lugli, Ligeia
Authors
Contributors
Iztok Kosem
Editor
Tanara Zingano Kuhn
Editor
Margarita Correia
Editor
José Pedro Ferreira
Editor
Maarten Jansen
Editor
Isabel Pereira
Editor
Jelena Kallas
Editor
Miloš Jakubíček
Editor
Simon Krek
Editor
Carole Tiberius
Editor
Abstract
Traditional lexicography requires titanic efforts and enormous resources. For many languages, such resources have never been available. As a result, they have received only limited lexicographic coverage. Today, these languages can take advantage of many of the same digital tools and strategies that have simplified and expedited dictionary-making for mainstream languages. However, the resource gap remains evident even in the digital era, with basic corpus processing tasks that lie at the foundation of contemporary ‘smart lexicography’ still constituting a challenge for many under-resourced languages. Drawing on my own experience in Sanskrit and Tibetan lexicography, this paper aims to offer some guidance as to the advantages and limitations of the application of smart lexicography to under-resourced languages. In particular, this paper suggests that in order to optimize resources, it may be advisable to prioritize high-quality lexical annotation of the corpus over highly curated dictionary entries, and to let digital tools take care of the lexicographic representation of the annotated linguistic information.
Citation
Lugli, L. (2019). Smart lexicography for low-resource languages: lessons learned from Sanskrit and Tibetan. In I. Kosem, T. Zingano Kuhn, M. Correia, J. P. Ferreira, M. Jansen, I. Pereira, J. Kallas, M. Jakubíček, S. Krek, & C. Tiberius (Eds.), Electronic lexicography in the 21st century: Smart lexicography (pp. 198-212). Lexical Computing CZ.
Online Publication Date | Sep 17, 2019 |
---|---|
Deposit Date | Nov 4, 2019 |
Publicly Available Date | Nov 4, 2019 |
Pages | 198-212 |
Book Title | Electronic lexicography in the 21st century : Smart lexicography |
ISSN | 2533-5626 |
Keywords | automated lexicography; GDEX; Buddhist Hybrid Sanskrit; Tibetan |
Publisher URL | https://elex.link/elex2019/wp-content/uploads/2019/10/eLex-2019_Proceedings.pdf |
Additional Information | Proceedings of the eLex 2019 conference (Sintra, Portugal, 1–3 October 2019) |
Files
eLex_2019_Lugli_SmartLexicographyForLowResourceLanguages.pdf
(306 Kb)
PDF
Licence
https://creativecommons.org/licenses/by-sa/4.0/
Copyright Statement
This work is licensed under the Creative Commons Attribution ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).