Stars: 2620
Forks: 306
License: MIT
Language: Go
Last Update: 2025-11-27
Open Issues: 9
Related in Tokenizers
- gotokenizer - A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)
- gse - Efficient text segmentation in Go; supports English, Chinese, Japanese, and other languages.
- MMSEGO - A Go implementation of MMSEG, a Chinese word-splitting algorithm.
- segment - Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29.
- sentences - Sentence tokenizer: converts text into a list of sentences.
- shamoji - A word-filtering package written in Go.
- stemmer - Stemmer packages for the Go programming language. Includes English and German stemmers.
- textcat - Go package for n-gram-based text categorization, with support for UTF-8 and raw text.
- ctxi18n - Context-aware i18n with a short and concise API, pluralization, interpolation, and fs.FS support. YAML locale definitions are based on Rails i18n.
- go-i18n - Package and an accompanying tool to work with localized text.