alsotang

A fullstack JS programmer.

Member Since 9 years ago

Tencent, ShenZhen, China

5.1k followers
76 following
1.2k stars
192 repos

344 contributions in the last year

Pinned
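⚡ :baby_chick:Nodeclub is a community forum system built with Node.js and MongoDB
⚡ :closed_book:《Node.js 包教不包会》 (a Node.js tutorial: teaching guaranteed, mastery not) by alsotang
⚡ :blue_book:Demos of the various async.js functions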
⚡ :heart_eyes: Writing Fast JavaScript
⚡ Determine if a string is all Chinese(based on unicode range)
⚡ Generate arbitrary size file on Cloudflare Workers
Activity
Oct
21
2 days ago
issue

alsotang issue comment alsotang/is_chinese_rs

Condition compile

alsotang
alsotang

The Node version will stay as it is. This library is generally used to check nicknames or single characters, not long articles.

Oct
20
3 days ago
push

alsotang push alsotang/is_chinese_rs

Merge pull request #5 from IWANABETHATGUY/feature/simd

feat: 🎸 simd

commit sha: d976066efc15d82c5209a619bc2f71d7f360882a

pull request

alsotang pull request alsotang/is_chinese_rs

feat: 🎸 simd

main branch:

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。")                                                                                              
                        time:   [38.533 ns 38.819 ns 39.152 ns]
                        change: [-54.813% -54.023% -53.260%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

is_chinese("ss扁担宽,板凳长,扁担想绑在板凳上。")                                         
                        time:   [4.8136 ns 4.8368 ns 4.8620 ns]
                        change: [+138.40% +140.61% +142.54%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。ss")                                         
                        time:   [41.423 ns 41.788 ns 42.235 ns]
                        change: [-5.8544% -4.5882% -3.3184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

is_chinese("isChinese(chars1000) true")
                        time:   [2.1482 us 2.1632 us 2.1816 us]
                        change: [-54.895% -54.039% -53.332%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

is_chinese("isChinese(chars1001) false")                                                                             
                        time:   [2.1280 us 2.1559 us 2.1968 us]
                        change: [-17.167% -15.932% -14.731%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

simd branch

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。")                                         
                        time:   [52.302 ns 52.574 ns 52.884 ns]
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

is_chinese("ss扁担宽,板凳长,扁担想绑在板凳上。")                                         
                        time:   [4.1467 ns 4.1949 ns 4.2447 ns]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。ss")                                         
                        time:   [4.8500 ns 4.8838 ns 4.9228 ns]
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

is_chinese("isChinese(chars1000) true")
                        time:   [2.8905 us 2.9146 us 2.9424 us]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

is_chinese("isChinese(chars1001) false")                                                                            
                        time:   [100.18 ns 100.53 ns 100.94 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

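This change uses SIMD to speed up the scan for printable ASCII. For input that is entirely Chinese there is a regression of about 30%, which is expected. When the input does contain ASCII, however, the check is 10-20x faster.

Unlike the earlier ascii_check, this implementation does not iterate with chars(). This works without chars() because every printable ASCII character is less than or equal to 0x7f (binary 01111111), and the largest single-byte character in UTF-8 is also 0x7f. In other words, if any u8 we visit while iterating over the bytes is less than or equal to 127, it cannot be part of any Chinese character; see the image.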

The str type, also called a 'string slice', is the most primitive string type. It is usually seen in its borrowed form, &str. It is also the type of string literals, &'static str.

String slices are always valid UTF-8.

A &str is always a valid UTF-8 string.

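So a convenient approach is to iterate over the &[u8] and check whether any u8 is <= 127. If one is, the string contains a single-byte ASCII character. The logic here matches your Node.js implementation. From there, using SIMD to speed up the scan is the natural next step.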

Oct
19
4 days ago
issue

alsotang issue comment alsotang/is_chinese_rs

Added.

Owner user IWANABETHATGUY has been invited to be an owner of crate is_chinese

issue

alsotang issue comment alsotang/is_chinese_rs

Perf more

This time I ran the benchmarks on my Mac. The previous PR was tested on my desktop machine, so the baselines are not comparable. master branch:

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。")                                                                                             
                        time:   [83.674 ns 84.042 ns 84.506 ns]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

is_chinese("ss扁担宽,板凳长,扁担想绑在板凳上。")
                        time:   [2.0029 ns 2.0085 ns 2.0145 ns]
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。ss")
                        time:   [43.542 ns 43.863 ns 44.326 ns]
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

is_chinese("isChinese(chars1000) true")
                        time:   [4.6331 us 4.6485 us 4.6646 us]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

is_chinese("isChinese(chars1001) false")                                                                             
                        time:   [2.5277 us 2.5390 us 2.5559 us]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

perf branch:

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。")                                                                                              
                        time:   [38.533 ns 38.819 ns 39.152 ns]
                        change: [-54.813% -54.023% -53.260%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

is_chinese("ss扁担宽,板凳长,扁担想绑在板凳上。")
                        time:   [4.8136 ns 4.8368 ns 4.8620 ns]
                        change: [+138.40% +140.61% +142.54%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

is_chinese("扁担宽,板凳长,扁担想绑在板凳上。ss")
                        time:   [41.423 ns 41.788 ns 42.235 ns]
                        change: [-5.8544% -4.5882% -3.3184%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

is_chinese("isChinese(chars1000) true")
                        time:   [2.1482 us 2.1632 us 2.1816 us]
                        change: [-54.895% -54.039% -53.332%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

is_chinese("isChinese(chars1001) false")                                                                             
                        time:   [2.1280 us 2.1559 us 2.1968 us]
                        change: [-17.167% -15.932% -14.731%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

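As the numbers show, every case improves except the one where the ASCII characters are at the start of the string, and the all-Chinese case is about twice as fast.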

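Your match-based version is certainly more efficient than nested loops. My concern is only that unrolling it this way turns a regular, patterned array into hard-coded values, which hurts maintainability. The question is whether this library needs to give up maintainability for that performance.

That said, I respect your decision.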

issue

alsotang issue comment alsotang/is_chinese_rs

https://crates.io/crates/is_chinese I'll give you access to this as well. What is your crates.io account name?

Oct
11
1 week ago
issue

alsotang issue comment alsotang/is_chinese_rs

This feels a bit brute-force. If the goal is to unroll ahead of time, I think it should be done with macros.
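One way to read the macro suggestion: keep the ranges as a readable table and let `macro_rules!` expand it into comparisons at compile time. The following is a rough sketch, assuming a hypothetical macro `in_any_range!` and helper `is_cjk_ideograph`. The two ranges are the CJK Unified Ideographs block and its Extension A, included for illustration rather than taken from the crate.

```rust
/// Expands a list of inclusive (low, high) pairs into a chain of
/// range comparisons, so the table stays readable in the source
/// while the checks are fully unrolled at compile time.
macro_rules! in_any_range {
    ($value:expr, $(($lo:expr, $hi:expr)),+ $(,)?) => {{
        let v: u32 = $value;
        false $(|| ($lo <= v && v <= $hi))+
    }};
}

/// Illustrative only: covers two Unicode blocks, not the crate's full table.
fn is_cjk_ideograph(c: char) -> bool {
    in_any_range!(
        c as u32,
        (0x3400, 0x4DBF), // CJK Unified Ideographs Extension A
        (0x4E00, 0x9FFF), // CJK Unified Ideographs
    )
}
```

Adding or removing a range then means editing one line of the table, which speaks to the maintainability concern without going back to a runtime loop over an array.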

Sep
22
1 month ago
pull request

alsotang pull request hq450/fancyss

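ss_pre_stop should not run at startup; it should only run when ss is being shut down

Otherwise it produces errors. In my setup the script changes things with iptables, and if it runs at startup the chain does not exist yet, so it fails.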

push

alsotang push alsotang/fancyss

ss_pre_stop should not run at startup; it should only run when ss is being shut down

As the title says.

commit sha: b7708da106b692269e6c8419b04abd31a4c6a81f
