Unicode::Emoji [![[version]](https://badge.fury.io/rb/unicode-emoji.svg)](https://badge.fury.io/rb/unicode-emoji) [![[ci]](https://github.com/janlelis/unicode-emoji/workflows/Test/badge.svg)](https://github.com/janlelis/unicode-emoji/actions?query=workflow%3ATest)

September 9, 2025 ยท View on GitHub

Provides various sophisticated regular expressions to work with Emoji in strings, incorporating the latest Unicode / Emoji standards.

Additional features:

  • A categorized list of Emoji (RGI: Recommended for General Interchange)
  • Retrieve Emoji properties info about specific codepoints (Emoji_Modifier, Emoji_Presentation, etc.)

Emoji version: 17.0 (September 2025)

CLDR version (used for sub-region flags): 47 (March 2025)

Gemfile

gem "unicode-emoji"

Usage โ€“ Regex Matching

The gem includes multiple Emoji regexes, which are compiled out of various Emoji Unicode data sources.

require "unicode/emoji"

string = "String which contains all types of Emoji sequences:

- Basic Emoji: ๐Ÿ˜ด
- Textual Emoji with Emoji variation (VS16): โ–ถ๏ธ
- Emoji with skin tone modifier: ๐Ÿ›Œ๐Ÿฝ
- Region flag: ๐Ÿ‡ต๐Ÿ‡น
- Sub-Region flag: ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ
- Keycap sequence: 2๏ธโƒฃ
- Skin tone modifier: ๐Ÿป
- Sequence using ZWJ (zero width joiner): ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ
"

string.scan(Unicode::Emoji::REGEX) # => ["๐Ÿ˜ด", "โ–ถ๏ธ", "๐Ÿ›Œ๐Ÿฝ", "๐Ÿ‡ต๐Ÿ‡น", "๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ", "2๏ธโƒฃ", "๐Ÿป", "๐Ÿคพ๐Ÿฝโ€โ™€๏ธ"]

Depending on your exact usecase, you can choose between multiple levels of Emoji detection:

Main Regexes

RegexDescriptionExample MatchesExample Non-Matches
Unicode::Emoji::REGEXUse this one if unsure! Matches (non-textual) Basic Emoji and all kinds of recommended Emoji sequences (RGI/FQE)๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿป๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข, 1, 1โƒฃ
Unicode::Emoji::REGEX_VALIDMatches (non-textual) Basic Emoji and all kinds of valid Emoji sequences๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€ ,๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿป๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿ‡ต๐Ÿ‡ต, 1, 1โƒฃ
Unicode::Emoji::REGEX_WELL_FORMEDMatches (non-textual) Basic Emoji and all kinds of well-formed Emoji sequences๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€,๐ŸŒโ€โ™‚๏ธ , ๐Ÿค โ€๐Ÿคข, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿป๐Ÿ˜ด๏ธŽ, โ–ถ, 1, 1โƒฃ
Unicode::Emoji::REGEX_POSSIBLEMatches all singleton Emoji, all kinds of Emoji sequences, and even non-Emoji singleton components like digits. Only exception: Unqualified keycap sequences are not matched๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿป, 11โƒฃ

Include Text Emoji

By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes (except in REGEX_POSSIBLE). However, if you wish to match for them too, you can include them in your regex by appending the _INCLUDE_TEXT suffix:

RegexDescriptionExample MatchesExample Non-Matches
Unicode::Emoji::REGEX_INCLUDE_TEXTREGEX + REGEX_TEXT๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿ˜ด๏ธŽ, โ–ถ, 1โƒฃ , ๐Ÿป๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข, 1
Unicode::Emoji::REGEX_VALID_INCLUDE_TEXTREGEX_VALID + REGEX_TEXT๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ˜ด๏ธŽ, โ–ถ, 1โƒฃ , ๐Ÿป๐Ÿ‡ต๐Ÿ‡ต, 1
Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXTREGEX_WELL_FORMED + REGEX_TEXT๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿ˜ด๏ธŽ, โ–ถ, 1โƒฃ , ๐Ÿป1

Minimally-qualified and Unqualified Sequences

RegexDescriptionExample MatchesExample Non-Matches
Unicode::Emoji::REGEX_INCLUDE_MQELike REGEX, but additionally includes Emoji with missing Emoji Presentation Variation Selectors, where the first partial Emoji has all required Variation Selectors๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐Ÿป๐ŸŒโ€โ™‚๏ธ, ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข, 1, 1โƒฃ
Unicode::Emoji::REGEX_INCLUDE_MQE_UQELike REGEX, but additionally includes Emoji with missing Emoji Presentation Variation Selectors๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿป๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข, 1, 1โƒฃ

List of MQE and UQE Emoji sequences

Singleton Regexes

Matches only simple one-codepoint (+ optional variation selector) Emoji:

RegexDescriptionExample MatchesExample Non-Matches
Unicode::Emoji::REGEX_BASICMatches (non-textual) Basic Emoji, but no sequences at all๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿป๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, ๐Ÿ‡ต๐Ÿ‡ต,2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, 1
Unicode::Emoji::REGEX_TEXTMatches only textual singleton Emoji๐Ÿ˜ด๏ธŽ, โ–ถ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿป, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, ๐Ÿ‡ต๐Ÿ‡ต,2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿคพ๐Ÿฝโ€โ™€, ๐ŸŒโ€โ™‚๏ธ, ๐Ÿค โ€๐Ÿคข, 1

Here is a list of all Emoji that can be matched using the two regexes: character.construction/emoji-vs-text. The REGEX_BASIC regex also matches visual Emoji components (skin tone modifiers and hair components).

While REGEX_BASIC is part of the above regexes, REGEX_TEXT is only included in the *_INCLUDE_TEXT or *_UQE variants.

Comparison

  1. Fully-qualified RGI Emoji ZWJ sequence
  2. Minimally-qualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selectors, but not in the first Emoji character)
  3. Unqualified RGI Emoji ZWJ sequence (lacks Emoji Presentation Selector, including in the first Emoji character). Unqualified Emoji include all basic Emoji in Text Presentation (see column 11/12).
  4. Non-RGI Emoji ZWJ sequence
  5. Valid Region made from a pair of Regional Indicators
  6. Any Region made from a pair of Regional Indicators
  7. RGI Flag Emoji Tag Sequences (England, Scotland, Wales)
  8. Valid Flag Emoji Tag Sequences (any known subdivision)
  9. Any Emoji Tag Sequences (any tag sequence with any base)
  10. Basic Default Emoji Presentation Characters or Text characters with Emoji Presentation Selector
  11. Basic Default Text Presentation Characters or Basic Emoji with Text Presentation Selector
  12. Non-Emoji (unqualified) keycap
Regex1 RGI/FQE2 RGI/MQE3 RGI/UQE4 Non-RGI5 Valid Reยญgion6 Any Reยญgion7 RGI Tag8 Valid Tag9 Any Tag10 Basic Emoji11 Basic Text12 Text Keyยญcap
REGEXโœ…โŒโŒโŒโœ…โŒโœ…โŒโŒโœ…โŒโŒ
REGEX INCLUDE TEXTโœ…โŒโŒโŒโœ…โŒโœ…โŒโŒโœ…โœ…โœ…
REGEX INCLUDE MQEโœ…โœ…โŒโŒโœ…โŒโœ…โŒโŒโœ…โŒโŒ
REGEX INCLUDE MQE UQEโœ…โœ…โœ…โŒโœ…โŒโœ…โŒโŒโœ…โœ…โœ…
REGEX VALIDโœ…โœ…(โœ…)ยนโœ…โœ…โŒโœ…โœ…โŒโœ…โŒโŒ
REGEX VALID INCLUDE TEXTโœ…โœ…โœ…โœ…โœ…โŒโœ…โœ…โŒโœ…โœ…โœ…
REGEX WELL FORMEDโœ…โœ…(โœ…)ยนโœ…โœ…โœ…โœ…โœ…โœ…โœ…โŒโŒ
REGEX WELL FORMED INCLUDE TEXTโœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…
REGEX POSSIBLEโœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โœ…โŒ
REGEX BASICโŒโŒโŒโŒโŒโŒโŒโŒโŒโœ…โŒโŒ
REGEX TEXTโŒโŒโŒโŒโŒโŒโŒโŒโŒโŒโœ…โœ…

ยน Matches all unqualified Emoji, except for textual singleton Emoji (see columns 11, 12)

See spec files for detailed examples about which regex matches which kind of Emoji.

Picking the Right Emoji Regex

  • Usually you just want REGEX (recommended Emoji set, RGI)
  • Use REGEX_INCLUDE_MQE or REGEX_INCLUDE_MQE_UQE if you want to catch Emoji sequences with missing Variation Selectors.
  • If you want broader matching (any ZWJ sequences, more sub-region flags), choose REGEX_VALID
  • If you need to match any region flag and any tag sequence, choose REGEX_WELL_FORMED
  • Use the _INCLUDE_TEXT suffix with any of the above base regexes, if you want to also match basic textual Emoji
  • And finally, there is also the option to use REGEX_POSSIBLE, which is a simplified test for possible Emoji, comparable to REGEX_WELL_FORMED*. It might contain false positives, however, the regex is less complex and suggested in the Unicode standard itself as a first check.

Examples

DescEmojiEscapedREGEX (RGI/FQE)REGEX_INCLUDE_MQE (RGI/MQE)REGEX_VALIDREGEX_WELL_FORMED / REGEX_POSSIBLE
RGI ZWJ Sequence๐Ÿคพ๐Ÿฝโ€โ™€๏ธ\u{1F93E 1F3FD 200D 2640 FE0F}โœ…โœ…โœ…โœ…
RGI ZWJ Sequence MQE๐Ÿคพ๐Ÿฝโ€โ™€\u{1F93E 1F3FD 200D 2640}โŒโœ…โœ…โœ…
Valid ZWJ Sequence, Non-RGI๐Ÿค โ€๐Ÿคข\u{1F920 200D 1F922}โŒโŒโœ…โœ…
Known Region๐Ÿ‡ต๐Ÿ‡น\u{1F1F5 1F1F9}โœ…โœ…โœ…โœ…
Unknown Region๐Ÿ‡ต๐Ÿ‡ต\u{1F1F5 1F1F5}โŒโŒโŒโœ…
RGI Tag Sequence๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ\u{1F3F4 E0067 E0062 E0073 E0063 E0074 E007F}โœ…โœ…โœ…โœ…
Valid Tag Sequence๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ\u{1F3F4 E0067 E0062 E0061 E0067 E0062 E007F}โŒโŒโœ…โœ…
Well-formed Tag Sequence๐Ÿ˜ด๓ ง๓ ข๓ ก๓ ก๓ ก๓ ฟ\u{1F634 E0067 E0062 E0061 E0061 E0061 E007F}โŒโŒโŒโœ…

Please see the standard for more details, examples, explanations.

More info about valid vs. recommended Emoji can also be found in this blog article on Emojipedia.

Emoji Property Regexes

Ruby includes native regex Emoji properties, as listed in the following table. You can also opt-in to use the *_PROP_* regexes to get the Emoji support level of this gem (instead of Ruby's). Which Emoji version does Ruby support?

Gem Regex (Unicode::Emoji's Emoji support level)Native Regex (Ruby's Emoji support level)
Unicode::Emoji::REGEX_PROP_EMOJI/\p{Emoji}/
Unicode::Emoji::REGEX_PROP_MODIFIER/\p{EMod}/
Unicode::Emoji::REGEX_PROP_MODIFIER_BASE/\p{EBase}/
Unicode::Emoji::REGEX_PROP_COMPONENT/\p{EComp}/
Unicode::Emoji::REGEX_PROP_PRESENTATION/\p{EPres}/
Unicode::Emoji::REGEX_TEXT_PRESENTATION/[\p{Emoji}&&\P{EPres}]/

Extended Pictographic Regex

Unicode::Emoji::REGEX_PICTO matches single codepoints with the Extended_Pictographic property. For example, it will match โœ€ BLACK SAFETY SCISSORS.

Unicode::Emoji::REGEX_PICTO_NO_EMOJI matches single codepoints with the Extended_Pictographic property, but excludes Emoji characters.

See character.construction/picto for a list of all non-Emoji pictographic characters.

Usage โ€“ List

Use Unicode::Emoji::LIST or the list method to get a ordered and categorized list of Emoji:

Unicode::Emoji.list.keys
# => ["Smileys & Emotion", "People & Body", "Component", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags"]

Unicode::Emoji.list("Food & Drink").keys
# => ["food-fruit", "food-vegetable", "food-prepared", "food-asian", "food-marine", "food-sweet", "drink", "dishware"]

Unicode::Emoji.list("Food & Drink", "food-asian")
=> ["๐Ÿฑ", "๐Ÿ˜", "๐Ÿ™", "๐Ÿš", "๐Ÿ›", "๐Ÿœ", "๐Ÿ", "๐Ÿ ", "๐Ÿข", "๐Ÿฃ", "๐Ÿค", "๐Ÿฅ", "๐Ÿฅฎ", "๐Ÿก", "๐ŸฅŸ", "๐Ÿฅ ", "๐Ÿฅก"]

Please note that categories might change with future versions of the Emoji standard, although this has not happened often.

A list of all Emoji (generated from this gem) can be found at character.construction/emoji.

Usage โ€“ Properties Data

Allows you to access the codepoint data for a single character form Unicode's emoji-data.txt file:

require "unicode/emoji"

Unicode::Emoji.properties "โ˜" # => ["Emoji", "Emoji_Modifier_Base"]

Also See

MIT