Text Processing - Awesome Go
Libraries for parsing and manipulating text.
- address - Handles address representation, validation and formatting.
- align - A general purpose application that aligns text.
- bytes - Formats and parses numeric byte values (10K, 2M, 3G, etc.).
- go-fixedwidth - Fixed-width text formatting (encoder/decoder with reflection).
- go-humanize - Formatters that render times, numbers, and memory sizes in human-readable form.
- gotabulate - Easily pretty-print your tabular data with Go.
- textwrap - Wraps text at line ends. An implementation of Python's textwrap module.
- bafi - Universal JSON, BSON, YAML, XML translator to ANY format using templates.
- bbConvert - Converts bbCode to HTML and allows you to add support for custom bbCode tags.
- blackfriday - Markdown processor in Go.
- go-output-format - Output Go structures in multiple formats (YAML, JSON, etc.) from your command-line app.
- go-toml - Go library for the TOML format with query support and handy CLI tools.
- goldmark - A Markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured.
- goq - Declarative unmarshalling of HTML using struct tags with jQuery syntax (uses GoQuery).
- html-to-markdown - Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
- htmlquery - An XPath query package for HTML that lets you extract data from, or evaluate, HTML documents using XPath expressions.
- htmlyaml - Rich rendering of YAML as HTML in Go.
- htree - Traverse, navigate, filter, and otherwise process trees of html.Node objects.
- mxj - Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
- toml - TOML configuration format (encoder/decoder with reflection).
- allot - Placeholder and wildcard text parsing for CLI tools and bots.
- codetree - Parses indented code (Python, pixy, scarlet, etc.) and returns a tree structure.
- commonregex - A collection of common regular expressions for Go.
- did - DID (Decentralized Identifiers) Parser and Stringer in Go.
- doi - Document object identifier (doi) parser in Go.
- editorconfig-core-go - Editorconfig file parser and manipulator for Go.
- encdec - Package provides a generic interface to encoders and decoders.
- go-fasttld - High-performance effective top-level domain (eTLD) extraction module.
- go-nmea - NMEA parser library for the Go language.
- go-querystring - Go library for encoding structs into URL query parameters.
- go-vcard - Parse and format vCard.
- godump - Pretty print any Go variable with ease, an alternative to Go's fmt.Printf("%#v").
- gofeed - Parse RSS and Atom feeds in Go.
- gographviz - Parses the Graphviz DOT language.
- gonameparts - Parses human names into individual name parts.
- ltsv - High-performance LTSV (Labeled Tab Separated Value) reader for Go.
- normalize - Sanitize, normalize and compare fuzzy text.
- parseargs-go - String argument parser that understands quotes and backslashes.
- parth - URL path segmentation parsing.
- prattle - Scan and parse LL(1) grammars simply and efficiently.
- sdp - SDP: Session Description Protocol [RFC 4566].
- sh - Shell parser and formatter.
- tokenizer - Parse any string, slice or infinite buffer to any tokens.
- vdf - A lexer and parser for Valve's Data Format (known as vdf) written in Go.
- when - Natural EN and RU language date/time parser with pluggable rules.
- xj2go - Convert XML or JSON to Go structs.
- genex - Count and expand Regular Expressions into all matching Strings.
- go-wildcard - Simple and lightweight wildcard pattern matching.
- goregen - Library for generating random strings from regular expressions.
- regroup - Match regular expression named groups into Go structs using struct tags and automatic parsing.
- rex - Regular expressions builder.
- bluemonday - HTML Sanitizer.
- gofuckyourself - A sanitization-based swear filter for Go.
- colly - Fast and Elegant Scraping Framework for Gophers.
- dataflowkit - Web scraping Framework to turn websites into structured data.
- go-recipe - A package for scraping recipes from websites.
- GoQuery - GoQuery brings a syntax and a set of features similar to jQuery to the Go language.
- pagser - A simple, extensible, configurable library that parses and deserializes HTML pages into structs, based on goquery and struct tags, for Go crawlers.
- Tagify - Produces a set of tags from given source.
- walker - Seamlessly fetch paginated data from any source. Simple and high performance API scraping included.
- xurls - Extract URLs from text.
- podcast - iTunes-compliant and RSS 2.0 podcast generator in Go.
- go-runewidth - Functions to get the fixed width of a character or string.
- go-zero-width - Zero-width character detection and removal for Go.
- kace - Common case conversions covering common initialisms.
- petrovich - Library that inflects Russian names to a given grammatical case.
- radix - Fast string sorting algorithm.
- TySug - Alternative suggestions with respect to keyboard layouts.
- w2vgrep - A semantic grep tool using word embeddings to find semantically similar matches. For example, searching for "death" will find "dead", "killing", "murder".