text-processing.md
July 15, 2021
Bookmarks tagged [text-processing]
www.codever.land/bookmarks/t/text-processing
TankerHQ/ruplacer
https://github.com/TankerHQ/ruplacer
Find and replace text in source files
- tags: rust, text-processing
- :octocat: source code
lavifb/todo_r
https://github.com/lavifb/todo_r
Find all your TODO notes with one command!
- tags: rust, text-processing
- :octocat: source code
whitfin/runiq
https://github.com/whitfin/runiq
An efficient way to filter duplicate lines from unsorted input.
- tags: rust, text-processing
- :octocat: source code
whitfin/bytelines
https://github.com/whitfin/bytelines
Read input lines as byte slices for high efficiency.
- tags: rust, text-processing
- :octocat: source code
vishaltelangre/ff
https://github.com/vishaltelangre/ff
- tags: rust, text-processing
- :octocat: source code
BurntSushi/suffix
https://github.com/BurntSushi/suffix
Linear time suffix array construction (with Unicode support)
- tags: rust, text-processing
- :octocat: source code
BurntSushi/tabwriter
https://github.com/BurntSushi/tabwriter
Elastic tab stops (i.e., text column alignment)
- tags: rust, text-processing
- :octocat: source code
pwoolcoc/ngrams
https://github.com/pwoolcoc/ngrams
Construct n-grams from arbitrary iterators
- tags: rust, text-processing
- :octocat: source code
ps1dr3x/easy_reader
https://github.com/ps1dr3x/easy_reader
A reader that allows forward, backward and random navigation through the lines of huge files without consuming iterators [...
- tags: rust, text-processing
- :octocat: source code
rust-lang/regex
https://github.com/rust-lang/regex
Regular expressions (RE2 style)
- tags: rust, text-processing
- :octocat: source code
greyblake/whatlang-rs
https://github.com/greyblake/whatlang-rs
Natural language detection library based on trigrams
- tags: rust, text-processing
- :octocat: source code
yaa110/rake-rs
https://github.com/yaa110/rake-rs
Multilingual implementation of RAKE algorithm for Rust
- tags: rust, text-processing
- :octocat: source code
align
https://github.com/Guitarbum722/align
A general purpose application that aligns text.
- tags: go, text-processing
- :octocat: source code
allot
https://github.com/sbstjn/allot
Placeholder and wildcard text parsing for CLI tools and bots.
- tags: go, text-processing
- :octocat: source code
bbConvert
https://github.com/CalebQ42/bbConvert
Converts bbCode to HTML and allows you to add support for custom bbCode tags.
- tags: go, text-processing
- :octocat: source code
blackfriday
https://github.com/russross/blackfriday
Markdown processor in Go.
- tags: go, text-processing
- :octocat: source code
bluemonday
https://github.com/microcosm-cc/bluemonday
HTML Sanitizer.
- tags: go, text-processing
- :octocat: source code
codetree
https://github.com/aerogo/codetree
Parses indented code (python, pixy, scarlet, etc.) and returns a tree structure.
- tags: go, text-processing
- :octocat: source code
colly
https://github.com/asciimoo/colly
Fast and Elegant Scraping Framework for Gophers.
- tags: go, text-processing
- :octocat: source code
commonregex
https://github.com/mingrammer/commonregex
A collection of common regular expressions for Go.
- tags: go, text-processing
- :octocat: source code
dataflowkit
https://github.com/slotix/dataflowkit
Web scraping Framework to turn websites into structured data.
- tags: go, text-processing
- :octocat: source code
did
https://github.com/ockam-network/did
DID (Decentralized Identifiers) Parser and Stringer in Go.
- tags: go, text-processing
- :octocat: source code
doi
https://github.com/hscells/doi
Document object identifier (DOI) parser in Go.
- tags: go, text-processing
- :octocat: source code
editorconfig-core-go
https://github.com/editorconfig/editorconfig-core-go
Editorconfig file parser and manipulator for Go.
- tags: go, text-processing
- :octocat: source code
enca
https://github.com/endeveit/enca
Minimal cgo bindings for libenca.
- tags: go, text-processing
- :octocat: source code
encdec
https://github.com/mickep76/encdec
Package provides a generic interface to encoders and decoders.
- tags: go, text-processing
- :octocat: source code
genex
https://github.com/alixaxel/genex
Count and expand Regular Expressions into all matching Strings.
- tags: go, text-processing
- :octocat: source code
github_flavored_markdown
https://godoc.org/github.com/shurcooL/github_flavored_markdown
GitHub Flavored Markdown renderer (using blackfriday) with fenced code block highlighting and clickable header anchor links.
- tags: go, text-processing
- :octocat: source code
go-fixedwidth
https://github.com/ianlopshire/go-fixedwidth
Fixed-width text formatting (encoder/decoder with reflection).
- tags: go, text-processing
- :octocat: source code
go-humanize
https://github.com/dustin/go-humanize
Formatters for time, numbers, and memory size to human readable format.
- tags: go, text-processing
- :octocat: source code
go-nmea
https://github.com/adrianmo/go-nmea
NMEA parser library for the Go language.
- tags: go, text-processing
- :octocat: source code
go-runewidth
https://github.com/mattn/go-runewidth
Functions to get the fixed width of a character or string.
- tags: go, text-processing
- :octocat: source code
go-slugify
https://github.com/mozillazg/go-slugify
Make pretty slugs with support for multiple languages.
- tags: go, text-processing
- :octocat: source code
go-toml
https://github.com/pelletier/go-toml
Go library for the TOML format with query support and handy cli tools.
- tags: go, text-processing
- :octocat: source code
go-vcard
https://github.com/emersion/go-vcard
Parse and format vCard.
- tags: go, text-processing
- :octocat: source code
go-zero-width
https://github.com/trubitsyn/go-zero-width
Zero-width character detection and removal for Go.
- tags: go, text-processing
- :octocat: source code
gofeed
https://github.com/mmcdole/gofeed
Parse RSS and Atom feeds in Go.
- tags: go, text-processing
- :octocat: source code
gographviz
https://github.com/awalterschulze/gographviz
Parses the Graphviz DOT language.
- tags: go, text-processing
- :octocat: source code
gommon/bytes
https://github.com/labstack/gommon/tree/master/bytes
Format bytes to string.
- tags: go, text-processing
- :octocat: source code
gonameparts
https://github.com/polera/gonameparts
Parses human names into individual name parts.
- tags: go, text-processing
- :octocat: source code
goq
https://github.com/andrewstuart/goq
Declarative unmarshaling of HTML using struct tags with jQuery syntax (uses GoQuery).
- tags: go, text-processing
- :octocat: source code
GoQuery
https://github.com/PuerkitoBio/goquery
GoQuery brings a syntax and a set of features similar to jQuery to the Go language.
- tags: go, text-processing
- :octocat: source code
goregen
https://github.com/zach-klippenstein/goregen
Library for generating random strings from regular expressions.
- tags: go, text-processing
- :octocat: source code
gotext
https://github.com/leonelquinteros/gotext
GNU gettext utilities for Go.
- tags: go, text-processing
- :octocat: source code
guesslanguage
https://github.com/endeveit/guesslanguage
Functions to determine the natural language of a unicode text.
- tags: go, text-processing
- :octocat: source code
htmlquery
https://github.com/antchfx/htmlquery
An XPath query package for HTML that lets you extract data from HTML documents or evaluate them with an XPath expression.
- tags: go, text-processing
- :octocat: source code
inject
https://github.com/facebookgo/inject
Package inject provides a reflect based injector.
- tags: go, text-processing
- :octocat: source code
ltsv
https://github.com/Wing924/ltsv
High performance LTSV (Labeled Tab Separated Value) reader for Go.
- tags: go, text-processing
- :octocat: source code
mxj
https://github.com/clbanning/mxj
Encode / decode XML as JSON or map[string]interface{}; extract values with dot-notation paths and wildcards. Replaces x2j and j2x packages.
- tags: go, text-processing
- :octocat: source code
sdp
SDP: Session Description Protocol [RFC 4566].
- tags: go, text-processing
- :octocat: source code
sh
Shell parser and formatter.
- tags: go, text-processing
- :octocat: source code
slug
https://github.com/gosimple/slug
URL-friendly slugify with support for multiple languages.
- tags: go, text-processing
- :octocat: source code
Slugify
https://github.com/avelino/slugify
Go slugify application that handles strings.
- tags: go, text-processing
- :octocat: source code
syndfeed
https://github.com/zhengchun/syndfeed
A syndication feed for Atom 1.0 and RSS 2.0.
- tags: go, text-processing
- :octocat: source code
toml
https://github.com/BurntSushi/toml
TOML configuration format (encoder/decoder with reflection).
- tags: go, text-processing
- :octocat: source code
gofuckyourself
https://github.com/JoshuaDoes/gofuckyourself
A sanitization-based swear filter for Go.
- tags: go, text-processing
- :octocat: source code
gotabulate
https://github.com/bndr/gotabulate
Easily pretty-print your tabular data with Go.
- tags: go, text-processing
- :octocat: source code
kace
https://github.com/codemodus/kace
Common case conversions covering common initialisms.
- tags: go, text-processing
- :octocat: source code
parseargs-go
https://github.com/nproc/parseargs-go
String argument parser that understands quotes and backslashes.
- tags: go, text-processing
- :octocat: source code
parth
https://github.com/codemodus/parth
URL path segmentation parsing.
- tags: go, text-processing
- :octocat: source code
radix
https://github.com/yourbasic/radix
Fast string sorting algorithm.
- tags: go, text-processing
- :octocat: source code
TySug
https://github.com/Dynom/TySug
Alternative suggestions with respect to keyboard layouts.
- tags: go, text-processing
- :octocat: source code
xj2go
https://github.com/stackerzzq/xj2go
Convert XML or JSON to Go struct.
- tags: go, text-processing
- :octocat: source code
xurls
https://github.com/mvdan/xurls
Extract URLs from text.
- tags: go, text-processing
- :octocat: source code
chardet
https://github.com/chardet/chardet
Python 2/3 compatible character encoding detector.
- tags: python, text-processing
- :octocat: source code
difflib
https://docs.python.org/3/library/difflib.html
(Python standard library) Helpers for computing deltas.
- tags: python, text-processing
ftfy
https://github.com/LuminosoInsight/python-ftfy
Makes Unicode text less broken and more consistent automagically.
- tags: python, text-processing
- :octocat: source code
fuzzywuzzy
https://github.com/seatgeek/fuzzywuzzy
Fuzzy String Matching.
- tags: python, text-processing
- :octocat: source code
Levenshtein
https://github.com/ztane/python-Levenshtein/
Fast computation of Levenshtein distance and string similarity.
- tags: python, text-processing
- :octocat: source code
pangu.py
https://github.com/vinta/pangu.py
Paranoid text spacing.
- tags: python, text-processing
- :octocat: source code
pyfiglet
https://github.com/pwaller/pyfiglet
An implementation of figlet written in Python.
- tags: python, text-processing
- :octocat: source code
pypinyin
https://github.com/mozillazg/python-pinyin
Convert Chinese hanzi (漢字) to pinyin (拼音).
- tags: python, text-processing
- :octocat: source code
textdistance
https://github.com/orsinium/textdistance
Compute distance between sequences. 30+ algorithms, pure Python implementation, common interface, optional external libs usage.
- tags: python, text-processing
- :octocat: source code
unidecode
https://pypi.python.org/pypi/Unidecode
ASCII transliterations of Unicode text.
- tags: python, text-processing
awesome-slugify
https://github.com/dimka665/awesome-slugify
A Python slugify library that can preserve unicode.
- tags: python, text-processing, slugify
- :octocat: source code
python-slugify
https://github.com/un33k/python-slugify
A Python slugify library that translates unicode to ASCII.
- tags: python, text-processing, slugify
- :octocat: source code
unicode-slugify
https://github.com/mozilla/unicode-slugify
A slugifier that generates unicode slugs with Django as a dependency.
- tags: python, text-processing, slugify
- :octocat: source code
hashids
https://github.com/davidaurelio/hashids-python
Implementation of hashids in Python.
- tags: python, text-processing, uuid
- :octocat: source code
shortuuid
https://github.com/skorokithakis/shortuuid
A generator library for concise, unambiguous and URL-safe UUIDs.
- tags: python, text-processing, uuid
- :octocat: source code
ply
Implementation of lex and yacc parsing tools for Python.
- tags: python, text-processing, parser
- :octocat: source code
pyparsing
https://github.com/pyparsing/pyparsing
A general purpose framework for generating parsers.
- tags: python, text-processing, parser
- :octocat: source code
python-nameparser
https://github.com/derek73/python-nameparser
Parsing human names into their individual components.
- tags: python, text-processing, parser
- :octocat: source code
python-phonenumbers
https://github.com/daviddrysdale/python-phonenumbers
Parsing, formatting, storing and validating international phone numbers.
- tags: python, text-processing, parser
- :octocat: source code
python-user-agents
https://github.com/selwin/python-user-agents
Browser user agent parser.
- tags: python, text-processing, parser
- :octocat: source code
sqlparse
https://github.com/andialbrecht/sqlparse
A non-validating SQL parser.
- tags: python, text-processing, parser
- :octocat: source code