Parsing single-quoted fields that contain special characters with the Python 3 csv module

Published: December 30, 2023

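By default, the Python 3 csv module treats the double quote as the character that wraps a quoted field when it parses a CSV string or file. If the data wraps fields in single quotes instead, set quotechar so the parser knows which character encloses fields that contain special characters. The attributes that control this behavior are listed below; for details see the official documentation (csv: CSV File Reading and Writing, Python 3.12.1 documentation).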
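Before going through the individual attributes, here is a minimal sketch of the difference quotechar makes. The sample line and variable names are made up for illustration and do not come from the log data used later.

```python
import csv
import io

# One record whose third field is wrapped in single quotes and contains a comma.
line = "1,alice,'hello, world'\n"

# Default settings: quotechar is '"', so the single quotes are ordinary
# characters and the comma inside them splits the field in two.
default_rows = list(csv.reader(io.StringIO(line)))

# quotechar="'" tells the reader that single quotes wrap a field.
quoted_rows = list(csv.reader(io.StringIO(line), quotechar="'"))

print(default_rows)
print(quoted_rows)
```

With the default settings the record comes back as four fields; with quotechar="'" it comes back as three, and the wrapping quotes are stripped.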

The Dialect class supports the following attributes:

Dialect.delimiter

A one-character string used to separate fields. It defaults to ','.

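Dialect.doublequote

Controls how instances of quotechar that appear inside a field should themselves be quoted. When True, the character is doubled. When False, escapechar is used as a prefix to quotechar. It defaults to True.

On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field.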

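Dialect.escapechar

A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE, and to escape quotechar if doublequote is False. On reading, escapechar removes any special meaning from the character that follows it. It defaults to None, which disables escaping.

Changed in version 3.11: an empty escapechar is not allowed.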

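Dialect.quotechar

A one-character string used to quote fields that contain special characters, such as the delimiter or quotechar, or that contain newline characters. It defaults to '"'.

Changed in version 3.11: an empty quotechar is not allowed.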
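The interaction between doublequote and escapechar is easier to see in a small experiment. The sketch below is only an illustration with made-up sample strings: it reads the same logical value encoded both ways, writes it back with doublequote=False, and checks the documented error when no escapechar is set.

```python
import csv
import io

# Reading: the value it's encoded in two different ways.
doubled = "a,'it''s'\n"    # embedded quote is doubled
escaped = "a,'it\\'s'\n"   # embedded quote is prefixed with a backslash

row_doubled = next(csv.reader(io.StringIO(doubled), quotechar="'"))
row_escaped = next(csv.reader(io.StringIO(escaped), quotechar="'",
                              doublequote=False, escapechar="\\"))

# Writing with doublequote=False: the writer uses escapechar for the quote.
buf = io.StringIO()
writer = csv.writer(buf, quotechar="'", doublequote=False,
                    escapechar="\\", lineterminator="\n")
writer.writerow(["a", "it's"])
written = buf.getvalue()

# Reading the written text back with the same settings restores the row.
round_trip = next(csv.reader(io.StringIO(written), quotechar="'",
                             doublequote=False, escapechar="\\"))

# Writing with doublequote=False and no escapechar raises csv.Error
# as soon as a field contains the quote character.
try:
    csv.writer(io.StringIO(), quotechar="'",
               doublequote=False).writerow(["it's"])
    raised = False
except csv.Error:
    raised = True

print(row_doubled, row_escaped, round_trip, raised)
```

Both input encodings parse to the same two fields, which is why the settings have to match how the data was produced.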

A Python 3 parsing example follows:

import csv
import io

def parse():
    data = """20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18218,785254,QUERY,quality_platform,'ROLLBACK',0\n20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18217,785255,QUERY,quality_platform,'INSERT INTO pangu20_user_operation (user_id, method, req_url, req_curl, result, create_time, raw_response) VALUES (\\'yuanyuan.cheng\\', \\'POST\\', \\'https://pangu2.nioint.com/remote_vehicle/vid_vin?hash_type=sha256&timestamp=1703062805&app_id=100735&sign=92cc6761a21fb31066931d84914ed3863844ca900354122450de6bdfa081e857\\', \\'curl -X POST -H \\'Accept: application/json, text/plain, */*\\' -H \\'Accept-Encoding: gzip, deflate, br\\' -H \\'Accept-Language: zh-CN,zh;q=0.9\\' -H \\'Access-Control-Allow-Origin: *\\' -H \\'Connection: keep-alive\\' -H \\'Content-Length: 66\\' -H \\'Content-Type: application/json\\' -H \\'Cookie: _ga=GA1.1.1849986069.1688094826; page-gateway-sid-cn-prod=s%3ABbtpISc50xOcjrc365Z8WlZ54GmYgXIU.NVkj0neHScHIlh6mGL7rLdInAFNpEZ875KE%2FCLL0NTQ; page-gateway-sid-plm=s_BbtpISc50xOcjrc365Z8WlZ54GmYgXIU.355923d2778749c1c8961ea618beeb2dd227005369119f3be4a13f08b2f43534; mate-user-info-v1=%7B%22workNo%22%3A%2276616%22%2C%22account_id%22%3A%22489631950%22%2C%22user_name%22%3A%22yuanyuan.ch',0\n20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18217,785256,QUERY,quality_platform,'COMMIT',0\n"""
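    # Wrap the string in io.StringIO so it can be read like a file object
    file_obj = io.StringIO(data)
    # delimiter: the character that separates fields
    # quotechar: the single quote, which wraps fields containing special characters
    # doublequote=False: the sample escapes embedded quotes with '\' rather than doubling them
    # escapechar: '\', the escape character used inside the quoted fields
    reader = csv.reader(file_obj, delimiter=',', quotechar="'", doublequote=False, escapechar='\\')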

    for row in reader:
        print(len(row))
        print(row)
        print("----------")
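    # Alternative: csv.reader accepts any iterable of lines, so the string
    # can also be split into lines instead of being wrapped in io.StringIO.
    # reader = csv.reader(data.splitlines(), delimiter=',', quotechar="'", doublequote=False, escapechar='\\')
    # for row in reader:
    #     print(row)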

if __name__ == '__main__':
    parse()

The output is:

10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18218', '785254', 'QUERY', 'quality_platform', 'ROLLBACK', '0']
----------
10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18217', '785255', 'QUERY', 'quality_platform', "INSERT INTO pangu20_user_operation (user_id, method, req_url, req_curl, result, create_time, raw_response) VALUES ('yuanyuan.cheng', 'POST', 'https://pangu2.nioint.com/remote_vehicle/vid_vin?hash_type=sha256&timestamp=1703062805&app_id=100735&sign=92cc6761a21fb31066931d84914ed3863844ca900354122450de6bdfa081e857', 'curl -X POST -H 'Accept: application/json, text/plain, */*' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: zh-CN,zh;q=0.9' -H 'Access-Control-Allow-Origin: *' -H 'Connection: keep-alive' -H 'Content-Length: 66' -H 'Content-Type: application/json' -H 'Cookie: _ga=GA1.1.1849986069.1688094826; page-gateway-sid-cn-prod=s%3ABbtpISc50xOcjrc365Z8WlZ54GmYgXIU.NVkj0neHScHIlh6mGL7rLdInAFNpEZ875KE%2FCLL0NTQ; page-gateway-sid-plm=s_BbtpISc50xOcjrc365Z8WlZ54GmYgXIU.355923d2778749c1c8961ea618beeb2dd227005369119f3be4a13f08b2f43534; mate-user-info-v1=%7B%22workNo%22%3A%2276616%22%2C%22account_id%22%3A%22489631950%22%2C%22user_name%22%3A%22yuanyuan.ch", '0']
----------
10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18217', '785256', 'QUERY', 'quality_platform', 'COMMIT', '0']
----------
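If the same settings are needed in several places, one option is to bundle them into a named dialect with csv.register_dialect. This is a sketch rather than part of the original example: the dialect name single_quoted_log and the shortened sample record are hypothetical.

```python
import csv
import io

# "single_quoted_log" is a made-up name used only for this illustration.
csv.register_dialect(
    "single_quoted_log",
    delimiter=",",
    quotechar="'",
    doublequote=False,
    escapechar="\\",
)

# A shortened, invented record in the same style as the log lines above:
# the third field is single-quoted and contains backslash-escaped quotes.
sample = "785255,QUERY,'SELECT \\'a\\', \\'b\\'',0\n"

rows = list(csv.reader(io.StringIO(sample), dialect="single_quoted_log"))
print(rows)
```

Any later call to csv.reader or csv.writer can then pass dialect="single_quoted_log" instead of repeating the four keyword arguments.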

Source: https://blog.csdn.net/dongsongz/article/details/135304601