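When Python 3's csv module parses a CSV string or file, it assumes by default that fields are wrapped in double quotes. If your data wraps fields in a different character, set quotechar to the character that encloses fields containing special characters. The csv module's settings for special characters are listed below (for details, see csv: CSV File Reading and Writing, Python 3.12.1 documentation):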
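A minimal sketch of why quotechar matters, using a made-up one-line record rather than the log data below. With the default quotechar, single quotes are ordinary characters, so the comma inside them splits the field. With quotechar="'", the quoted text stays one value.

```python
import csv
import io

# Hypothetical record: the second field is wrapped in single quotes and contains a comma
line = "1,'a,b',2\n"

# Default dialect: quotechar is '"', so the single quotes are plain data
default_row = next(csv.reader(io.StringIO(line)))
print(default_row)  # ['1', "'a", "b'", '2']

# quotechar="'": the single-quoted field is parsed as one value, quotes removed
quoted_row = next(csv.reader(io.StringIO(line), quotechar="'"))
print(quoted_row)  # ['1', 'a,b', '2']
```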
The Dialect class supports the following attributes:
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
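As a small illustration with hypothetical tab-separated input, delimiter changes only the field separator, so a comma becomes ordinary data. The value must be exactly one character; CPython rejects a longer string with TypeError.

```python
import csv
import io

# Hypothetical tab-separated input; the comma in "Smith, John" is plain data here
text = "id\tname\n1\tSmith, John\n"
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(rows)  # [['id', 'name'], ['1', 'Smith, John']]

# delimiter must be a single character; a two-character string is rejected
try:
    csv.reader(io.StringIO(text), delimiter="::")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```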
Dialect.doublequote
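Controls how instances of quotechar appearing inside a field should themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True.
On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field.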
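The sketch below, built on a made-up row, contrasts the three output cases described above. By default the embedded double quote is doubled. With doublequote=False and escapechar='\\' it is escaped instead, and reading back with the same settings restores the field. With doublequote=False and no escapechar, writerow raises csv.Error.

```python
import csv
import io

row = ['say "hi"', "plain"]

# doublequote=True (the default): an embedded quotechar is written twice
doubled = io.StringIO()
csv.writer(doubled).writerow(row)
print(repr(doubled.getvalue()))  # '"say ""hi""",plain\r\n'

# doublequote=False plus an escapechar: the quotechar is prefixed with '\'
escaped = io.StringIO()
csv.writer(escaped, doublequote=False, escapechar="\\").writerow(row)
print(repr(escaped.getvalue()))

# Reading back with the same settings restores the original field
restored = next(csv.reader(io.StringIO(escaped.getvalue()),
                           doublequote=False, escapechar="\\"))
print(restored == row)  # True

# doublequote=False with no escapechar: csv.Error when a quotechar appears
try:
    csv.writer(io.StringIO(), doublequote=False).writerow(row)
    raised = False
except csv.Error:
    raised = True
print(raised)  # True
```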
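Dialect.escapechar
A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE, and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the character that follows it. It defaults to None, which disables escaping.
Changed in version 3.11: An empty escapechar is not allowed.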
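A minimal sketch of escapechar with QUOTE_NONE, again on a made-up row: the writer prefixes the embedded delimiter with a backslash instead of quoting the field, and the reader, given the same settings, strips that special meaning back off.

```python
import csv
import io

row = ["a,b", "c"]

# QUOTE_NONE never wraps fields, so the delimiter inside "a,b" must be escaped
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
print(repr(buf.getvalue()))  # 'a\\,b,c\r\n'

# On read, escapechar removes the special meaning of the next character
parsed = next(csv.reader(io.StringIO(buf.getvalue()),
                         quoting=csv.QUOTE_NONE, escapechar="\\"))
print(parsed)  # ['a,b', 'c']
```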
Dialect.quotechar
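A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'.
Changed in version 3.11: An empty quotechar is not allowed.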
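These attributes can also be grouped into a csv.Dialect subclass and passed as dialect=. The class name SingleQuoteBackslash below is hypothetical; it bundles the settings the parsing example that follows uses (single-quote quotechar, backslash escapechar, doublequote=False), with quoting and lineterminator set explicitly. The input record is made up.

```python
import csv
import io


# Hypothetical dialect name; the attribute values mirror the parsing example
class SingleQuoteBackslash(csv.Dialect):
    delimiter = ","
    quotechar = "'"
    doublequote = False
    escapechar = "\\"
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\n"
    skipinitialspace = False


# Made-up record: a single-quoted field containing an escaped quote and a comma
text = "1,'it\\'s, ok',2\n"
row = next(csv.reader(io.StringIO(text), dialect=SingleQuoteBackslash))
print(row)  # ['1', "it's, ok", '2']

# The writer wraps a field containing the delimiter in single quotes
out = io.StringIO()
csv.writer(out, dialect=SingleQuoteBackslash).writerow(["a,b", "c"])
print(repr(out.getvalue()))  # "'a,b',c\n"
```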
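The following Python 3 example parses log records whose fields are wrapped in single quotes and whose embedded single quotes are escaped with a backslash: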
import csv
import io


def parse():
    data = """20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18218,785254,QUERY,quality_platform,'ROLLBACK',0\n20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18217,785255,QUERY,quality_platform,'INSERT INTO pangu20_user_operation (user_id, method, req_url, req_curl, result, create_time, raw_response) VALUES (\\'yuanyuan.cheng\\', \\'POST\\', \\'https://pangu2.nioint.com/remote_vehicle/vid_vin?hash_type=sha256&timestamp=1703062805&app_id=100735&sign=92cc6761a21fb31066931d84914ed3863844ca900354122450de6bdfa081e857\\', \\'curl -X POST -H \\'Accept: application/json, text/plain, */*\\' -H \\'Accept-Encoding: gzip, deflate, br\\' -H \\'Accept-Language: zh-CN,zh;q=0.9\\' -H \\'Access-Control-Allow-Origin: *\\' -H \\'Connection: keep-alive\\' -H \\'Content-Length: 66\\' -H \\'Content-Type: application/json\\' -H \\'Cookie: _ga=GA1.1.1849986069.1688094826; page-gateway-sid-cn-prod=s%3ABbtpISc50xOcjrc365Z8WlZ54GmYgXIU.NVkj0neHScHIlh6mGL7rLdInAFNpEZ875KE%2FCLL0NTQ; page-gateway-sid-plm=s_BbtpISc50xOcjrc365Z8WlZ54GmYgXIU.355923d2778749c1c8961ea618beeb2dd227005369119f3be4a13f08b2f43534; mate-user-info-v1=%7B%22workNo%22%3A%2276616%22%2C%22account_id%22%3A%22489631950%22%2C%22user_name%22%3A%22yuanyuan.ch',0\n20231220 09:00:06,ip-172-25-1-53,quality_platform,10.10.200.31,18217,785256,QUERY,quality_platform,'COMMIT',0\n"""
    # Wrap the string in io.StringIO so csv.reader can read it like a file
    file_obj = io.StringIO(data)
    # delimiter: the character that separates fields
    # quotechar: a single quote wraps fields that contain special characters
    # doublequote=False: the sample escapes embedded quotes with '\' instead of doubling them
    # escapechar: '\' removes the special meaning of the character that follows it
    reader = csv.reader(file_obj, delimiter=',', quotechar="'", doublequote=False, escapechar='\\')
    for row in reader:
        print(len(row))
        print(row)
        print("----------")
    # Alternative: pass the string's lines directly. This works here only because
    # no field contains a raw newline.
    # reader = csv.reader(data.splitlines(), delimiter=',', quotechar="'", doublequote=False, escapechar='\\')
    # for row in reader:
    #     print(row)


if __name__ == '__main__':
    parse()
Output:
10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18218', '785254', 'QUERY', 'quality_platform', 'ROLLBACK', '0']
----------
10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18217', '785255', 'QUERY', 'quality_platform', "INSERT INTO pangu20_user_operation (user_id, method, req_url, req_curl, result, create_time, raw_response) VALUES ('yuanyuan.cheng', 'POST', 'https://pangu2.nioint.com/remote_vehicle/vid_vin?hash_type=sha256&timestamp=1703062805&app_id=100735&sign=92cc6761a21fb31066931d84914ed3863844ca900354122450de6bdfa081e857', 'curl -X POST -H 'Accept: application/json, text/plain, */*' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept-Language: zh-CN,zh;q=0.9' -H 'Access-Control-Allow-Origin: *' -H 'Connection: keep-alive' -H 'Content-Length: 66' -H 'Content-Type: application/json' -H 'Cookie: _ga=GA1.1.1849986069.1688094826; page-gateway-sid-cn-prod=s%3ABbtpISc50xOcjrc365Z8WlZ54GmYgXIU.NVkj0neHScHIlh6mGL7rLdInAFNpEZ875KE%2FCLL0NTQ; page-gateway-sid-plm=s_BbtpISc50xOcjrc365Z8WlZ54GmYgXIU.355923d2778749c1c8961ea618beeb2dd227005369119f3be4a13f08b2f43534; mate-user-info-v1=%7B%22workNo%22%3A%2276616%22%2C%22account_id%22%3A%22489631950%22%2C%22user_name%22%3A%22yuanyuan.ch", '0']
----------
10
['20231220 09:00:06', 'ip-172-25-1-53', 'quality_platform', '10.10.200.31', '18217', '785256', 'QUERY', 'quality_platform', 'COMMIT', '0']
----------
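The commented-out alternative in parse() feeds data.splitlines() to the reader, which works for the sample above because no field contains a raw newline. A small sketch with a made-up record shows where the two inputs diverge: a file-like object lets the reader carry a quoted field across lines and keep the newline, while splitlines() strips the line endings, so the newline inside the field is lost.

```python
import csv
import io

# Hypothetical record whose single-quoted field contains a raw newline
data = "1,'line one\nline two',2\n"
opts = dict(delimiter=",", quotechar="'", doublequote=False, escapechar="\\")

# File-like input: the quoted field continues onto the next line, newline kept
from_file = list(csv.reader(io.StringIO(data), **opts))
print(from_file)  # [['1', 'line one\nline two', '2']]

# splitlines() removes line endings, so the two halves are joined without '\n'
from_lines = list(csv.reader(data.splitlines(), **opts))
print(from_lines)  # [['1', 'line oneline two', '2']]
```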