Stars: 144
Forks: 19
License: MIT
Language: Go
Last Update: 2023-05-14
Open Issues: 5
go
golang
unidecode
Related in Tokenizers
- gojieba - A Go implementation of jieba, which is a Chinese word splitting algorithm.
- gotokenizer - A tokenizer for Golang based on a dictionary and Bigram language models. (Currently supports only Chinese segmentation.)
- gse - Go efficient text segmentation; supports English, Chinese, Japanese, and other languages.
- MMSEGO - A Go implementation of MMSEG, which is a Chinese word splitting algorithm.
- segment - Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29
- sentences - Sentence tokenizer: converts text into a list of sentences.
- shamoji - A word filtering package written in Go.
- stemmer - Stemmer packages for the Go programming language. Includes English and German stemmers.
- textcat - Go package for n-gram based text categorization, with support for utf-8 and raw text.
- ctxi18n - Context-aware i18n with a short and concise API, pluralization, interpolation, and fs.FS support. YAML locale definitions are based on Rails i18n.