One field in a Parquet file has a type declared as follows:
{
"name" : "speed",
"type" : [ "null", {
"type" : "fixed",
"name" : "decimal_11_3",
"size" : 5,
"logicalType" : "decimal",
"precision" : 11,
"scale" : 3
} ],
"default" : null,
"field-id" : 1
}
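The "size" : 5 follows from "precision" : 11. An n-byte two's-complement integer holds at most 2^(8n-1) - 1. The Parquet format specification requires the fixed length to be large enough for the largest unscaled value, 10^precision - 1. The small JDK-only sketch below finds the smallest such n. The class and method names are illustrative only and are not part of any Parquet API.

```java
import java.math.BigInteger;

public class DecimalSize {
    // Smallest n such that a signed n-byte integer can hold 10^precision - 1
    static int minBytesForPrecision(int precision) {
        BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        int n = 1;
        // Largest positive n-byte two's-complement value is 2^(8n-1) - 1
        while (BigInteger.ONE.shiftLeft(8 * n - 1).subtract(BigInteger.ONE).compareTo(maxUnscaled) < 0) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(minBytesForPrecision(11)); // 5, matching "size" : 5 above
        System.out.println(minBytesForPrecision(9));  // 4
    }
}
```

For example, 4 bytes top out at 2147483647, which is too small for 99999999999. Five bytes reach 549755813887, which is enough.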
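Reading this field with getString returns garbled text. The fixed bytes hold the unscaled value as a binary integer, not as a string, so they must be decoded as a decimal. The field can be read as follows: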
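import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.StringJoiner;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.schema.GroupType;
import org.apache.parquet.schema.LogicalTypeAnnotation.DecimalLogicalTypeAnnotation;
import org.apache.parquet.schema.PrimitiveType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;

// path is the org.apache.hadoop.fs.Path of the Parquet file
// getLogicalTypeAnnotation() needs parquet-mr 1.11 or later
try (ParquetReader<Group> reader = ParquetReader.builder(new GroupReadSupport(), path).build()) {
    Group group;
    while ((group = reader.read()) != null) {
        GroupType groupType = group.getType();
        StringJoiner sj = new StringJoiner(",");
        for (int i = 0; i < groupType.getFieldCount(); i++) {
            // An optional field with no value in this record is null
            if (group.getFieldRepetitionCount(i) == 0) {
                sj.add("null");
                continue;
            }
            // Assumes a flat schema: every field is a primitive column
            PrimitiveType type = groupType.getType(i).asPrimitiveType();
            if (type.getPrimitiveTypeName() == PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY
                    && type.getLogicalTypeAnnotation() instanceof DecimalLogicalTypeAnnotation) {
                // The bytes are the unscaled value, big-endian two's complement
                byte[] fixedFieldData = group.getBinary(i, 0).getBytes();
                // Read the scale from the schema instead of hard-coding 3
                int scale = ((DecimalLogicalTypeAnnotation) type.getLogicalTypeAnnotation()).getScale();
                BigDecimal decimalValue = new BigDecimal(new BigInteger(fixedFieldData), scale);
                // Keep BigDecimal: a float holds only about 7 significant digits, fewer than precision 11
                sj.add(decimalValue.toPlainString());
            } else {
                sj.add(group.getValueToString(i, 0));
            }
        }
        System.out.println(sj);
    }
}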
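You can check the decoding rule without a Parquet file. The JDK-only sketch below applies the same BigInteger/BigDecimal conversion to hand-built 5-byte values for decimal(11,3). The class name and byte values are illustrative. It also shows why converting the result to float is risky: the largest decimal(11,3) value loses its fractional digits.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class FixedDecimalDecode {
    // Decode a DECIMAL stored as FIXED_LEN_BYTE_ARRAY: the bytes are the
    // unscaled value as a big-endian two's-complement integer
    static BigDecimal decode(byte[] fixedBytes, int scale) {
        return new BigDecimal(new BigInteger(fixedBytes), scale);
    }

    public static void main(String[] args) {
        // 0x01E240 = 123456, which is 123.456 with scale 3
        byte[] positive = {0x00, 0x00, 0x01, (byte) 0xE2, 0x40};
        // Two's complement of the same magnitude: the leading 0xFF bytes make it negative
        byte[] negative = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFE, 0x1D, (byte) 0xC0};
        System.out.println(decode(positive, 3)); // 123.456
        System.out.println(decode(negative, 3)); // -123.456

        // Largest decimal(11,3) value
        BigDecimal max = new BigDecimal("99999999.999");
        System.out.println(max.floatValue());    // 1.0E8, the fractional digits are lost
        System.out.println(max.toPlainString()); // 99999999.999
    }
}
```

BigInteger(byte[]) already reads its input as big-endian two's complement. That matches the Parquet encoding, so no manual sign handling is needed.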
Other types, such as int32, int64 and binary, can be read directly with getValueToString.
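One caveat comes from the Parquet format specification rather than from the example above. The DECIMAL annotation can also be applied to INT32 columns (precision up to 9) and INT64 columns (precision up to 18). For such columns getValueToString returns the unscaled integer, for example 123456 instead of 123.456, so the scale from the schema still has to be applied. Below is a minimal JDK-only sketch; the helper name is hypothetical.

```java
import java.math.BigDecimal;

public class IntDecimalDecode {
    // For DECIMAL stored as INT32/INT64, the column value is the unscaled integer;
    // applying the scale from the schema gives the actual number
    static BigDecimal fromUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        System.out.println(fromUnscaled(123456L, 3)); // 123.456
        System.out.println(fromUnscaled(-5L, 2));     // -0.05
    }
}
```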