How to Improve the Performance of Chinese Text Tokenization in Python and Jieba

Ng Wai Foong
Published in Level Up Coding
4 min read · Jul 2, 2020

Adding custom words and modifying the dictionary dynamically in your Python application

Photo by Alvan Nee on Unsplash

By reading this piece, you will learn how to add your own custom words to Jieba in order to improve the performance of its tokenization. When dealing with domain-specific Natural Language Processing (NLP) tasks, it is essential to have control…
