Unicode::Categories [![[version]](https://badge.fury.io/rb/unicode-categories.svg)](https://badge.fury.io/rb/unicode-categories) [![[ci]](https://github.com/janlelis/unicode-categories/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-categories/actions?query=workflow%3ATest)

September 5, 2025

Returns a list of the General Categories that the characters of a Unicode string belong to.

Unicode version: 17.0.0 (September 2025)

Gemfile

gem "unicode-categories"

Usage

require "unicode/categories"

# All general categories of a string
Unicode::Categories.categories("A 2") # => ["Lu", "Nd", "Zs"]
Unicode::Categories.categories("A 2", format: :long)
# => ["Decimal_Number", "Space_Separator", "Uppercase_Letter"]

# Also aliased as .of
Unicode::Categories.of("\u{10c50}") # => ["Cn"]

# Single character
Unicode::Categories.category("☼", format: :long) # => "Other_Symbol"

The list of categories is always sorted alphabetically.
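
To get a feel for the return value without installing the gem, the behaviour can be roughly approximated with Ruby's built-in regexp properties. This is only a sketch: `builtin_categories` and both constants are hypothetical names, not part of this gem, and this is not how the gem is implemented. It relies on the Unicode data bundled with your Ruby's regexp engine, which may lag behind the Unicode version listed above, so results for recently assigned codepoints can differ.

```ruby
# Short names of the 30 General Categories a single codepoint can have
# (the grouping alias LC is left out on purpose).
SHORT_NAMES = %w[
  Cc Cf Cn Co Cs Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No
  Pc Pd Pe Pf Pi Po Ps Sc Sk Sm So Zl Zp Zs
].freeze

# One regexp per category, e.g. "Lu" => /\p{Lu}/
CATEGORY_PATTERNS = SHORT_NAMES.to_h { |name|
  [name, Regexp.new("\\p{#{name}}")]
}.freeze

# Hypothetical helper: returns the sorted short names of every category
# that matches at least one character of the string.
def builtin_categories(string)
  CATEGORY_PATTERNS.select { |_name, pattern| string.match?(pattern) }.keys.sort
end

builtin_categories("A 2") # => ["Lu", "Nd", "Zs"]
```

The gem remains the better choice when you need a specific Unicode version or the long category names.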

Hints

Regex Matching

If you have a string and want to match a substring or character from a specific General Category, you don't need this gem. Instead, you can use the Regexp Unicode Property Syntax \p{}:

"Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/) # => ["2", "3"]

See Idiosyncratic Ruby: Proper Unicoding for more info.
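
A few more patterns in the same spirit, using the short category names from the list in this README. The sample string is made up for illustration, and matching uses whatever Unicode version your Ruby ships with.

```ruby
text = "Ruby 3 costs €20"

uppercase = text.scan(/\p{Lu}/)      # uppercase letters
numbers   = text.scan(/\p{Nd}+/)     # runs of decimal digits
currency  = text.scan(/\p{Sc}/)      # currency symbols
no_spaces = text.gsub(/\p{Zs}/, "")  # remove space separators

p uppercase # => ["R"]
p numbers   # => ["3", "20"]
p currency  # => ["€"]
p no_spaces # => "Ruby3costs€20"
```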

List of General Categories

You can retrieve a list of all General Categories like this:

require "unicode/categories"
puts \
  "Short | Long\n" +
  "------|-----\n" +
  Unicode::Categories.names(format: :table).to_a.map{ |r| "   %s | %s" % r }.join("\n")

Short | Long
------|-----
Cc | Control
Cf | Format
Cn | Unassigned
Co | Private_Use
Cs | Surrogate
LC | Cased_Letter
Ll | Lowercase_Letter
Lm | Modifier_Letter
Lo | Other_Letter
Lt | Titlecase_Letter
Lu | Uppercase_Letter
Mc | Spacing_Mark
Me | Enclosing_Mark
Mn | Nonspacing_Mark
Nd | Decimal_Number
Nl | Letter_Number
No | Other_Number
Pc | Connector_Punctuation
Pd | Dash_Punctuation
Pe | Close_Punctuation
Pf | Final_Punctuation
Pi | Initial_Punctuation
Po | Other_Punctuation
Ps | Open_Punctuation
Sc | Currency_Symbol
Sk | Modifier_Symbol
Sm | Math_Symbol
So | Other_Symbol
Zl | Line_Separator
Zp | Paragraph_Separator
Zs | Space_Separator

See unicode-x for more Unicode-related micro libraries.

MIT License