Crystal detects Emoji symbols in String
The problem is to identify Unicode characters that span several bytes, or even several code points, but render as a single symbol. Emoji are one such kind of symbol.
subject = "🇺🇦 Ukraine"
puts subject.size
> 10
puts subject.bytesize
> 16
puts subject.chars
> ['🇺', '🇦', ' ', 'U', 'k', 'r', 'a', 'i', 'n', 'e']
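To see where these numbers come from, the sketch below prints the code point and UTF-8 byte size of every character. It assumes only Char#ord and Char#bytesize from the standard library; the output formatting is just for illustration.

```crystal
subject = "🇺🇦 Ukraine"

# Print every character with its code point and its size in UTF-8 bytes.
subject.each_char do |c|
  puts "#{c.inspect} U+#{c.ord.to_s(16).upcase} #{c.bytesize} byte(s)"
end

# The flag consists of two regional indicator characters of 4 bytes each;
# the remaining 8 characters are ASCII and take 1 byte each.
total = subject.chars.sum(&.bytesize)
puts total # same value as subject.bytesize
```

So the flag alone accounts for 2 of the 10 characters and 8 of the 16 bytes, although it is drawn as one symbol.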
To spare developers from building a big Regexp² to detect such characters, Crystal introduced String::Grapheme¹.
puts subject.grapheme_size
> 9
puts subject.graphemes
> [String::Grapheme("🇺🇦"), String::Grapheme(' '), String::Grapheme('U'), String::Grapheme('k'), String::Grapheme('r'), String::Grapheme('a'), String::Grapheme('i'), String::Grapheme('n'), String::Grapheme('e')]
The result is exactly the number of symbols to be rendered: the flag counts as one grapheme instead of two characters.
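The same idea applies to other symbols built from several code points, not only flags. As a minimal sketch, the helper `first_graphemes` below is hypothetical (it is not part of the standard library) and assumes only String#graphemes and Grapheme#to_s. It keeps a given number of visible symbols without cutting a cluster in half.

```crystal
# Hypothetical helper: keep at most `count` visible symbols of `text`.
def first_graphemes(text : String, count : Int32) : String
  text.graphemes.first(count).map(&.to_s).join
end

subject = "🇺🇦 Ukraine"

# Cutting by characters keeps only the first half of the flag.
puts subject[0, 1]
# Cutting by graphemes keeps the flag intact.
puts first_graphemes(subject, 1)

# "e" followed by a combining acute accent (U+0301) is also a single grapheme.
accented = "e\u0301"
puts accented.size          # 2 characters
puts accented.grapheme_size # 1 grapheme
```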
Here is an example of how Grapheme can be used. The original code iterates over characters:
result = ""
subject.each_char_with_index do |c, index|
result += "<" if index == 2
result += c
result += ">" if index == 8
end
puts result
> 🇺🇦< Ukrain>e
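Because the flag takes up two character indexes, the brackets land in the wrong place. The code can be converted to something very similar that iterates over graphemes: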
index = 0
result = ""
subject.each_grapheme do |symbol|
result += "<" if index == 2
result += symbol.to_s
result += ">" if index == 8
index += 1
end
puts result
> 🇺🇦 <Ukraine>
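A slightly more compact variant is sketched below. It assumes that building the whole array with String#graphemes is acceptable, and it uses Array#each_with_index instead of a manual counter and String.build instead of repeated concatenation. The method name `wrap_graphemes` and its parameters are made up for this illustration.

```crystal
# Hypothetical helper: put "<" before the grapheme at `open_at`
# and ">" after the grapheme at `close_at`.
def wrap_graphemes(text : String, open_at : Int32, close_at : Int32) : String
  String.build do |io|
    text.graphemes.each_with_index do |symbol, index|
      io << "<" if index == open_at
      io << symbol.to_s
      io << ">" if index == close_at
    end
  end
end

puts wrap_graphemes("🇺🇦 Ukraine", 2, 8)
```

With the same indexes as above (2 and 8) this produces the same result as the each_grapheme loop.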
References
¹ https://crystal-lang.org/api/1.3.2/String/Grapheme.html