Prof. Nathan Hill (nh36@soas.ac.uk)
Professor of Tibetan and Historical Linguistics
Introduction (to Special Issue on Tibetan Natural Language Processing)
Hill, Nathan W.; Di, Jiang
Authors
Jiang Di
Abstract
This introduction surveys research on Tibetan NLP, both in China and in the West, and contextualizes the articles contained in the special issue.
Citation
Hill, N. W., & Di, J. (2016). Introduction (to Special Issue on Tibetan Natural Language Processing). Himalayan Linguistics, 15(1), 1-11. https://doi.org/10.5070/H915131516
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Jul 15, 2015 |
Publication Date | Aug 1, 2016 |
Deposit Date | Aug 18, 2016 |
Publicly Available Date | Aug 18, 2016 |
Journal | Himalayan Linguistics |
Electronic ISSN | 1544-7502 |
Peer Reviewed | Peer Reviewed |
Volume | 15 |
Issue | 1 |
Pages | 1-11 |
DOI | https://doi.org/10.5070/H915131516 |
Keywords | Tibetan, Natural Language Processing, OCR, Corpus Linguistics |
Publisher URL | https://escholarship.org/uc/item/3nm7k9xq# |
Files
Hill and Di 2016 Introduction.pdf (PDF, 323 KB)
Licence
http://creativecommons.org/licenses/by-nc-nd/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-nd/4.0/