charCodeAt by Lua


I wanted a charCodeAt function in Lua that works exactly like the one in JavaScript.
Lua 5.1, however, has no built-in support for UTF-8 or Unicode: a string is just a sequence of bytes.

1: How charCodeAt works in JavaScript

To open the console in Chrome, press F12 (on a Mac: Cmd+Option+J).

[
'你'.charCodeAt(0),
'ñ'.charCodeAt(0),
'n'.charCodeAt(0)
]

This outputs [20320, 241, 110], the Unicode code points of the three characters: '你' = 20320, 'ñ' = 241, 'n' = 110.

The charCodeAt() method returns the numeric Unicode value of the character at the given index, except for code points above 0xFFFF (0x10000 and up), where it returns one half of a UTF-16 surrogate pair instead.
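For a code point above 0xFFFF, charCodeAt sees two UTF-16 code units rather than one character. Here is a minimal sketch of that behavior. The emoji U+1F600 is an arbitrary sample chosen for illustration (it does not come from the examples above), and codePointAt is shown only for comparison:

```javascript
// U+1F600 lies outside the Basic Multilingual Plane,
// so JavaScript stores it as two UTF-16 code units.
const emoji = String.fromCodePoint(0x1f600);

const units = [emoji.charCodeAt(0), emoji.charCodeAt(1)];
const codePoint = emoji.codePointAt(0);

console.log(emoji.length); // 2
console.log(units);        // [ 55357, 56832 ], i.e. 0xD83D and 0xDE00
console.log(codePoint);    // 128512, i.e. 0x1F600
```

A Lua function that wants to match charCodeAt therefore has to return 55357 and 56832 for this character, not 128512.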

To find out how many bytes a UTF-8 character occupies, we can use the utf8.charbytes function from alexander-yakushev's utf8.lua:
[https://github.com/alexander-yakushev/awesompd/blob/master/utf8.lua]

local utf8 = utf8 or {} -- Lua 5.1 has no built-in utf8 table, so create one

function utf8.charbytes(s, i)
  -- argument defaults
  i = i or 1
  local c = string.byte(s, i)
  -- determine bytes needed for character, based on RFC 3629
  if c > 0 and c <= 127 then
    -- UTF8-1 byte
    return 1
  elseif c >= 194 and c <= 223 then
    -- UTF8-2 byte
    return 2
  elseif c >= 224 and c <= 239 then
    -- UTF8-3 byte
    return 3
  elseif c >= 240 and c <= 244 then
    -- UTF8-4 byte
    return 4
  end
  -- anything else (0, a continuation byte, or an invalid lead byte) returns nil
end
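To check what charbytes should report for the three sample characters, here is a rough JavaScript port that can be run in the same console. The name charBytes and the use of TextEncoder to obtain the UTF-8 bytes are assumptions made for this sketch. Note that JavaScript indices start at 0, while Lua's start at 1.

```javascript
// Rough JavaScript port of utf8.charbytes: the lead byte alone
// tells us how many bytes the character occupies.
function charBytes(bytes, i = 0) {
  const c = bytes[i];
  if (c > 0 && c <= 127) return 1;
  if (c >= 194 && c <= 223) return 2;
  if (c >= 224 && c <= 239) return 3;
  if (c >= 240 && c <= 244) return 4;
  return undefined; // not a valid lead byte
}

// TextEncoder turns a JavaScript string into its UTF-8 bytes.
const encoder = new TextEncoder();
const lengths = ["你", "ñ", "n"].map((ch) => charBytes(encoder.encode(ch)));

console.log(lengths); // [ 3, 2, 1 ]
```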

Converting between Unicode and UTF-8

| Unicode code range (hex) | UTF-8 encoding (binary) | Example character |
| --- | --- | --- |
| 0000 0000-0000 007F | 0xxxxxxx | n (ASCII letters) |
| 0000 0080-0000 07FF | 110xxxxx 10xxxxxx | ñ |
| 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx | 你 (most CJK) |
| 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | other characters, e.g. emoji |
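The bit layout above can be turned into a decoder: strip the marker bits from the lead byte, then append six bits for every continuation byte. The sketch below is only an illustration, not the code from the repository linked at the end. The name decodeUtf8 is hypothetical, and continuation bytes are not validated. It uses plain arithmetic instead of bitwise operators, because Lua 5.1 has none and a port would need the same approach.

```javascript
// Decode the UTF-8 character that starts at bytes[i] into a code point.
function decodeUtf8(bytes, i = 0) {
  const c = bytes[i];
  let length;
  let codePoint;
  if (c <= 0x7f) {                      // 0xxxxxxx
    length = 1;
    codePoint = c;
  } else if (c >= 0xc2 && c <= 0xdf) {  // 110xxxxx
    length = 2;
    codePoint = c - 0xc0;
  } else if (c >= 0xe0 && c <= 0xef) {  // 1110xxxx
    length = 3;
    codePoint = c - 0xe0;
  } else if (c >= 0xf0 && c <= 0xf4) {  // 11110xxx
    length = 4;
    codePoint = c - 0xf0;
  } else {
    throw new RangeError("invalid UTF-8 lead byte: " + c);
  }
  // Each continuation byte 10xxxxxx contributes six more bits:
  // multiplying by 0x40 shifts left by 6, subtracting 0x80 drops the marker.
  for (let k = 1; k < length; k++) {
    codePoint = codePoint * 0x40 + (bytes[i + k] - 0x80);
  }
  return codePoint;
}

const encoder = new TextEncoder();
const decoded = ["你", "ñ", "n"].map((ch) => decodeUtf8(encoder.encode(ch)));

console.log(decoded); // [ 20320, 241, 110 ]
```

For characters in the first three rows of the table, the decoded code point is already the value charCodeAt returns.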

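Four-byte UTF-8 characters such as emoji need extra care, because charCodeAt does not simply return their code point.

Special case: surrogate pairs

JavaScript engines use UTF-16 internally. For characters in the Basic Multilingual Plane, the UTF-16 code unit is the same as the Unicode code point. A character in a Supplementary Plane is instead stored as two code units, a high surrogate H and a low surrogate L, computed from the code point c with the formula below. The Supplementary Plane characters we meet most often are emoji (4-byte UTF-8 characters).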

-- formula 1: c is the code point (0x10000 or above), H and L are the high and low surrogates
H = math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
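Formula 1 can be checked directly against the built-in charCodeAt. In this minimal sketch the function name toSurrogatePair is hypothetical and U+1F600 is an arbitrary sample code point:

```javascript
// Split a code point above 0xFFFF into its UTF-16 surrogate pair,
// following formula 1.
function toSurrogatePair(c) {
  const offset = c - 0x10000;
  const high = Math.floor(offset / 0x400) + 0xd800;
  const low = (offset % 0x400) + 0xdc00;
  return [high, low];
}

const c = 0x1f600; // sample emoji code point
const pair = toSurrogatePair(c);
const s = String.fromCodePoint(c);

console.log(pair);                               // [ 55357, 56832 ]
console.log([s.charCodeAt(0), s.charCodeAt(1)]); // [ 55357, 56832 ]
```

The two outputs match, so a Lua charCodeAt can return H for the first index of such a character and L for the second.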

The code is available here:

https://github.com/lilien1010/lua-bit

Feedback & Bug Report


Thank you for reading. If you have a better idea, I’d be glad to hear from you.