Trimming the icudtl.dat file in the icu4c library

Published: January 19, 2024

Background

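The project needs to convert ANSI-encoded text to UTF-8, so the icu4c library was brought in. The .dat file produced by a default build is over 30 MB. Because the only requirement is to convert Windows ANSI-encoded text (here code page 936, i.e. GBK) to UTF-8 on macOS, the data file needs to be trimmed.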

Building the icu4c project

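Source download: https://github.com/unicode-org/icu. This article builds version 71.1. ICU comes in a C version (icu4c) and a Java version (icu4j); everything below is based on the C version.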

1. In a terminal, change to the icu4c/source directory

cd icu4c/source

2. Make the build scripts executable

chmod +x runConfigureICU configure install-sh

3. Create a build directory under source and enter it

mkdir buildMac && cd buildMac

4. Run the pre-build configuration, with macOS as the target system

../runConfigureICU MacOSX

5. Build

gnumake

The resulting icudtl.dat file is 33.4 MB by default

Trimming icudtl.dat

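1. In the buildMac directory, create a filters.json file that excludes every module and keeps only conversion_mappings, restricted to the ANSI encoding. Its contents are as follows: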

{
  "featureFilters": {
// Based on the ICU63 version of
    "brkitr_dictionaries": {
      "filterType": "exclude"
    },
// # List of break iterator files (brk).
    "brkitr_rules": {
      "filterType": "exclude"
    },
// Need to explicitly add "root"
    "brkitr_tree": { "filterType": "exclude" },
    "conversion_mappings": {
      "includelist": [
// UCM_SOURCE_CORE=...
        "windows-936-2000"
      ]
    },
    "coll_tree": { "filterType": "exclude" },
    "coll_ucadata": { "filterType": "exclude" },
    "confusables": { "filterType": "exclude" },
    "curr_tree": { "filterType": "exclude" },
    "lang_tree": { "filterType": "exclude" },
    "locales_tree": { "filterType": "exclude" },
    "misc": { "filterType": "exclude" },
    "normalization": { "filterType": "exclude" },
    "rbnf_tree": { "filterType": "exclude" },
    "rbnf_index": { "filterType": "exclude" },
    "region_tree": { "filterType": "exclude" },
    "stringprep": { "filterType": "exclude" },
    "translit": { "filterType": "exclude" },
    "unames": { "filterType": "exclude" },
    "unit_tree": { "filterType": "exclude" },
    "zone_tree": { "filterType": "exclude" }
  },
  "resourceFilters": [
    {
      "categories": [
        "brkitr_tree",
        "coll_tree",
        "curr_tree",
        "lang_tree",
        "region_tree",
        "unit_tree",
        "zone_tree"
      ],
      "rules": [ "-/Version" ]
    }
  ]
}
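
The filter file above can also be generated instead of written by hand, which produces plain JSON with no comments. The following is a minimal Python sketch under that assumption. The helper name build_filters is hypothetical, and the category names and keys are simply copied from the file above rather than checked against ICU.

```python
import json

# Feature categories excluded in the filter file above (copied as-is).
EXCLUDED = [
    "brkitr_dictionaries", "brkitr_rules", "brkitr_tree",
    "coll_tree", "coll_ucadata", "confusables", "curr_tree",
    "lang_tree", "locales_tree", "misc", "normalization",
    "rbnf_tree", "rbnf_index", "region_tree", "stringprep",
    "translit", "unames", "unit_tree", "zone_tree",
]

# Categories whose /Version resource is removed in the file above.
VERSION_STRIPPED = [
    "brkitr_tree", "coll_tree", "curr_tree", "lang_tree",
    "region_tree", "unit_tree", "zone_tree",
]


def build_filters(converters):
    """Return a filter dict that keeps only the given converter mappings."""
    features = {name: {"filterType": "exclude"} for name in EXCLUDED}
    features["conversion_mappings"] = {"includelist": list(converters)}
    return {
        "featureFilters": features,
        "resourceFilters": [
            {"categories": list(VERSION_STRIPPED), "rules": ["-/Version"]}
        ],
    }


if __name__ == "__main__":
    # Print the JSON; redirect it to filters.json if it looks right.
    print(json.dumps(build_filters(["windows-936-2000"]), indent=2))
```

Keeping the converter list as a parameter makes it easy to retain more than one code page later without touching the rest of the structure.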

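2. Install the hjson parsing library. If you would rather not use JSON with comments, you can instead delete the // lines above, in which case hjson is not needed.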

pip3 install --user hjson jsonschema
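
If you take the route of deleting the // lines, the step can be scripted. The following is a minimal Python sketch, not part of ICU: the helper name strip_line_comments is hypothetical, and it only handles comments that occupy a whole line, which is how they appear in the file above. The sample is a shortened excerpt of that file.

```python
import json


def strip_line_comments(text):
    """Drop every line whose first non-blank characters are //."""
    kept = [ln for ln in text.splitlines()
            if not ln.lstrip().startswith("//")]
    return "\n".join(kept)


# Shortened excerpt of the filter file, comments included.
sample = """{
  "featureFilters": {
// Based on the ICU63 version of
    "brkitr_tree": { "filterType": "exclude" },
    "conversion_mappings": {
      "includelist": [
// UCM_SOURCE_CORE=...
        "windows-936-2000"
      ]
    }
  }
}"""

# json.loads raises ValueError if the result is still not plain JSON.
filters = json.loads(strip_line_comments(sample))
print(sorted(filters["featureFilters"]))
```

Parsing the result with the standard json module doubles as a check that no comment was missed before the file is handed to the build.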

3. Delete all files under icu4c/source/buildMac/data and keep everything else. This avoids recompiling the other modules, so only the data module is rebuilt.

4. Set a temporary ICU_DATA_FILTER_FILE environment variable that points to the filters.json file

export ICU_DATA_FILTER_FILE="/Users/nickname/icu-release-71-1/icu4c/source/buildMac/filters.json"

5. Rerun the build configuration

../runConfigureICU MacOSX

# If the output contains the following message, the filters.json file was applied successfully
#Note: Applying filters from /Users/nickname/icu-release-71-1/icu4c/source/buildMac/filters.json.
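
The check for that message can be automated. The following is a rough Python sketch that assumes the note is worded exactly as quoted above. The function names filters_applied and configure_and_check are hypothetical, and both output streams are searched because this sketch does not assume which one carries the note.

```python
import os
import subprocess


def filters_applied(output, filter_path):
    """Return True if the output contains the filter note for filter_path."""
    expected = f"Note: Applying filters from {filter_path}."
    return expected in output


def configure_and_check():
    """Run the configure step from the build directory and check its output."""
    filter_path = os.environ["ICU_DATA_FILTER_FILE"]
    result = subprocess.run(
        ["../runConfigureICU", "MacOSX"],
        capture_output=True, text=True, check=True,
    )
    # Search stdout and stderr together.
    return filters_applied(result.stdout + result.stderr, filter_path)
```

configure_and_check is meant to be called from inside the build directory after the environment variable has been exported.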

6. Rebuild

gnumake

7. The build succeeds, and the trimmed icudtl.dat file is only 133 KB
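
To put the two reported sizes side by side, the reduction can be worked out as below. This is a small Python sketch that assumes binary units (1 MB = 1024 KB); the function name reduction_percent is hypothetical.

```python
def reduction_percent(before_bytes, after_bytes):
    """Percentage by which the file shrank."""
    return (1 - after_bytes / before_bytes) * 100


before = 33.4 * 1024 * 1024  # 33.4 MB, the size reported for the full build
after = 133 * 1024           # 133 KB, the size reported after trimming
print(f"{reduction_percent(before, after):.1f}% smaller")
```

The trimmed file is therefore well under 1% of the size of the default one.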

Listing all encodings supported by icudtl.dat

Available converters: 4
0  name:UTF-8  alias: 0: UTF-8 1: unicode-1-1-utf-8 2: utf8 
1  name:utf-16be  alias: 0: utf-16be 
2  name:utf-16le  alias: 0: utf-16le 1: utf-16 
3  name:windows-936-2000  alias: 0: windows-936-2000 1: GBK 2: chinese 3: iso-ir-58 4: GB2312 5: GB_2312-80 6: gb_2312 7: csGB2312 8: csiso58gb231280 9: x-gbk 

Source: https://blog.csdn.net/yfldyxl/article/details/135698932