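Here I'm sharing some things I've pieced together from around the web; I hope they are useful.
Lately I've had enough of our internal company site asking for a username, a password and a CAPTCHA on every single login.
If only I could skip the login page altogether.
So I got to work and decided to implement automatic login with a Tampermonkey userscript.
The hardest part was cracking the CAPTCHA. It took me a day to get it working, and this is the write-up.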
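1. Analyzing the CAPTCHA
Analyzing the CAPTCHA is where all the cracking work starts.
- What features does the CAPTCHA have?
- Is it easy to crack?
- What strategy should be used to crack it?
Feature summary
This only summarizes the features of our company site's CAPTCHA (the CAPTCHA image above).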
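- Only letters (upper and lower case) and digits, with the hard-to-distinguish characters removed: 1, i, I, l, L, 0, o, O.
- The same character always appears with the same size, weight and slant (which makes it easy to build a standard character sample library)
- The first character always starts at the same position (which makes it easy to crop the background on the left)
- There are interference lines and a background color, both brighter than the characters (so a threshold can decide whether a pixel belongs to a character)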
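Working out a cracking strategy
Use the CAPTCHA features identified in the previous step to decide on a strategy for cracking it.
- Build a standard sample library
- Use the standard samples to run convolution matching against the CAPTCHA image (described below)
2. Building the sample library
- Request a CAPTCHA
- Extract the image pixels
- Binarize (turn each pixel into 0 or 1)
- Draw the binarized CAPTCHA on a canvas (black text on white; it can also be enlarged proportionally to make it easier to view and screenshot)
- Crop suitable characters from the drawn binarized CAPTCHA
- Process the character screenshots (trim white borders, remove noise)
- Undo the enlargement (if the image was enlarged earlier)
- Save as a template string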
Fetching the CAPTCHA
// Returns the image as a base64 data URL
function getVerifyCode() {
  return fetch(VERIFY_CODE_API)
    .then(rsp => rsp.json())
    .then(data => `data:image/png;base64,${data.data}`);
}
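Converting the base64 data to pixels
Use a canvas.
// Accepts a base64 data URL or a local image path
async function getImageData(imageSrc) {
  const image = new Image();
  // Wait for the image to load (handlers are attached before src is set)
  await new Promise((resolve, reject) => {
    image.onload = resolve;
    image.onerror = reject;
    image.src = imageSrc;
  });
  // Create a canvas sized to the image so nothing is clipped
  const canvas = document.createElement('canvas');
  canvas.width = image.width;
  canvas.height = image.height;
  const context = canvas.getContext('2d');
  context.drawImage(image, 0, 0);
  return context.getImageData(0, 0, image.width, image.height);
}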
This returns an object of type ImageData. Its data property is a Uint8ClampedArray, a typed array in which every 4 elements hold the RGBA values (0-255) of one pixel.
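To make the layout concrete, here is a minimal sketch that reads one pixel out of such a buffer. The helper name `pixelAt` and the 2x2 sample buffer are made up for illustration; the only assumption is the row-major, 4-bytes-per-pixel layout described above.

```javascript
// A fake 2x2 RGBA buffer laid out like ImageData.data:
// row-major, 4 bytes (r, g, b, a) per pixel.
const imgWidth = 2;
const rgbaBytes = new Uint8ClampedArray([
  255, 0, 0, 255,   0, 255, 0, 255,   // row 0: red, green
  0, 0, 255, 255,   10, 20, 30, 255,  // row 1: blue, a dark pixel
]);

// Hypothetical helper: read the [r, g, b, a] values of the pixel at (x, y).
function pixelAt(bytes, width, x, y) {
  const offset = (y * width + x) * 4;
  return Array.from(bytes.slice(offset, offset + 4));
}

console.log(pixelAt(rgbaBytes, imgWidth, 1, 1)); // [10, 20, 30, 255]
```

The same offset formula, `(y * width + x) * 4`, is what the binarization step relies on when it walks the image row by row.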
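Binarization
First pick a threshold: a pixel brighter than the threshold is treated as background, and a pixel darker than it is provisionally treated as part of a character (it may still be noise or an interference line).
The threshold has to be tuned against the actual results (by repeated adjustment).
A reasonable starting threshold is [130, 130, 130] (RGB channel values; alpha is always 255, so it is not set), roughly the midpoint of 0-255.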
const threshold = [130, 130, 130];
// Returns a 2D array in which every item is '0' or '1'
function binarization(imageData) {
  // Background ('0') only if every channel is brighter than the threshold
  const pixel2binary = pixel =>
    pixel.every((chValue, index) => chValue > threshold[index]) ? '0' : '1';
  // Every 4 elements of data represent one pixel
  const { data, width, height } = imageData;
  const binaryData = [];
  let x, y, row, rowLoc, pixel, pixelLoc;
  for (y = 0; y < height; y++) {
    row = [];
    // Start offset of the current row
    rowLoc = y * width * 4;
    for (x = 0; x < width; x++) {
      pixelLoc = rowLoc + x * 4;
      // Take the RGB values of this pixel (slice takes start and end indexes)
      pixel = data.slice(pixelLoc, pixelLoc + 3);
      row.push(pixel2binary(pixel));
    }
    binaryData.push(row);
  }
  return binaryData;
}
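As a quick sanity check of the thresholding rule, the sketch below runs it on a hand-made three-pixel image. `fakeImageData` and `binarizeSketch` are hypothetical names; the object only imitates the `width`, `height` and `data` fields of a real ImageData so the code can run outside a browser.

```javascript
// Hypothetical stand-in for a real ImageData: 3 pixels wide, 1 pixel tall.
const fakeImageData = {
  width: 3,
  height: 1,
  data: new Uint8ClampedArray([
    250, 250, 250, 255, // bright background
    40, 40, 40, 255,    // dark character pixel
    200, 90, 200, 255,  // one dark channel is enough to count as character
  ]),
};

// Same rule as above: background only if every channel exceeds the threshold.
function binarizeSketch({ data, width, height }, threshold = [130, 130, 130]) {
  const rows = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      const loc = (y * width + x) * 4;
      const rgb = data.slice(loc, loc + 3);
      row.push(rgb.every((v, i) => v > threshold[i]) ? '0' : '1');
    }
    rows.push(row);
  }
  return rows;
}

console.log(binarizeSketch(fakeImageData)[0].join('')); // "011"
```

Note that the third pixel is classified as a character pixel even though two of its channels are bright, because the rule requires all three channels to exceed the threshold.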
Drawing the binarized data (black text on white)
function drawBinaryData(context, data, scale = 1) {
  const binary2pixel = binary =>
    binary === '0' ? [255, 255, 255, 255] : [0, 0, 0, 255];
  // Run an action `scale` times; used to repeat rows and columns
  const repeatAction = (action) => {
    for (let i = 0; i < scale; i++) action();
  };
  const h = data.length;
  const w = data[0].length;
  let x, y;
  const pixelData = [];
  for (y = 0; y < h; y++) repeatAction(() => {
    for (x = 0; x < w; x++) repeatAction(() => {
      pixelData.push(...binary2pixel(data[y][x]));
    });
  });
  // Create the ImageData instance
  const imageData = new ImageData(
    Uint8ClampedArray.from(pixelData),
    w * scale,
    h * scale
  );
  return context.putImageData(imageData, 0, 0);
}
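Output of the CAPTCHA enlarged 4x in both width and height:
Saving sample screenshots
Pick suitable CAPTCHAs and crop the characters out as screenshots.
The character 5 in the CAPTCHA above is not a good sample, because the crop would include dots from another character at the bottom right. You could of course remove them with a tool or with code.
Save samples for every character. This means requesting CAPTCHA images over and over.
Removing the white border from a character screenshot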
function cutWhiteEdge(data) {
  let edge;
  // The length check stops the loops once nothing is left to cut
  // (for example when the whole image is white)
  const isWhiteEdge = () =>
    Array.isArray(edge) && edge.length > 0 &&
    edge.every(binary => binary === '0');
  // Keep cutting the same edge while it is entirely white
  const cutEdgeContinuous = (resetEdge, cutEdge) => {
    const _resetEdge = () => (edge = resetEdge());
    for (_resetEdge(); isWhiteEdge(); cutEdge(), _resetEdge());
  };
  // Cutting order: top, bottom, left, right
  // Top
  cutEdgeContinuous(
    () => data[0],
    () => data.shift()
  );
  // Bottom
  cutEdgeContinuous(
    () => data[data.length - 1],
    () => data.pop()
  );
  // Left
  cutEdgeContinuous(
    () => data.map(r => r[0]),
    () => data.forEach(r => r.shift())
  );
  // Right
  cutEdgeContinuous(
    () => data.map(r => r[r.length - 1]),
    () => data.forEach(r => r.pop())
  );
}
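For comparison, the same trimming can be sketched without mutating the input, by computing the bounding box of all character pixels and slicing it out. This is an alternative formulation, not the function above; `trimWhiteEdges` is a made-up name, and the result should be the same as peeling white edges one at a time.

```javascript
// Non-mutating sketch: find the bounding box of all '1' cells and slice it out.
function trimWhiteEdges(grid) {
  let top = Infinity, bottom = -1, left = Infinity, right = -1;
  grid.forEach((row, y) => row.forEach((cell, x) => {
    if (cell !== '1') return;
    top = Math.min(top, y);
    bottom = Math.max(bottom, y);
    left = Math.min(left, x);
    right = Math.max(right, x);
  }));
  if (bottom === -1) return []; // no character pixels at all
  return grid.slice(top, bottom + 1).map(row => row.slice(left, right + 1));
}

const padded = ['00000', '00110', '00100', '00000'].map(r => r.split(''));
console.log(trimWhiteEdges(padded).map(r => r.join(''))); // ['11', '10']
```

The in-place version is convenient when the array is thrown away afterwards; the slicing version may be easier to test because the input stays intact.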
Undoing the enlargement of the binarized data
function restoreDataScale(data, scale) {
  const scaleData = [];
  let x, y, row;
  const h = data.length;
  const w = data[0].length;
  for (y = 0; y < h; y += scale) {
    row = [];
    for (x = 0; x < w; x += scale) {
      row.push(data[y][x]);
    }
    scaleData.push(row);
  }
  return scaleData;
}
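A small round trip shows why sampling every `scale`-th row and column recovers the original grid. `scaleUp` and `scaleDown` are hypothetical helpers written for this sketch; `scaleUp` imitates the enlargement applied when drawing, and the round trip assumes the screenshot was saved pixel-exact, with no resampling by the screenshot tool.

```javascript
// Hypothetical helper: enlarge a binary grid by repeating each cell `scale`
// times in both directions (the same enlargement applied when drawing).
function scaleUp(grid, scale) {
  const out = [];
  for (const row of grid) {
    const wide = row.flatMap(cell => Array(scale).fill(cell));
    for (let i = 0; i < scale; i++) out.push([...wide]);
  }
  return out;
}

// Inverse step: keep every `scale`-th row and column.
function scaleDown(grid, scale) {
  return grid
    .filter((_, y) => y % scale === 0)
    .map(row => row.filter((_, x) => x % scale === 0));
}

const original = [['1', '0'], ['0', '1']];
const enlarged = scaleUp(original, 4);
console.log(enlarged.length, enlarged[0].length); // 8 8
console.log(JSON.stringify(scaleDown(enlarged, 4)) === JSON.stringify(original)); // true
```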
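Saving the template string
This simply converts the processed binary array to a string, which is easy to store (in a database, for example).
function binaryData2Template(data) {
  return data.map(r => r.join('')).join(' ');
}
The console on the right prints the template string, except that it uses newlines to separate the rows.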
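The template format is easy to reverse, which helps when eyeballing a stored template. The sketch below serializes a tiny made-up grid and parses it back; `toTemplate` and `fromTemplate` are illustrative names, not functions from the original script.

```javascript
// Serialize: rows joined with a single space, the same format as the templates.
const toTemplate = grid => grid.map(r => r.join('')).join(' ');
// Parse: the inverse, splitting on spaces and then into single characters.
const fromTemplate = tpl => tpl.split(' ').map(row => row.split(''));

const glyph = [['0', '1', '1'], ['1', '1', '0']];
const tpl = toTemplate(glyph);
console.log(tpl); // "011 110"

// Print one row per line to inspect the glyph shape
console.log(fromTemplate(tpl).map(r => r.join('')).join('\n'));
```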
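Reading character screenshots
The sections above covered cropping and processing character screenshots but skipped the step of reading them in.
You could write code that reads the screenshot folder directly and processes every character screenshot in one go.
When I did this step I used an input[type=file] and picked one character screenshot at a time by hand (I was short on time). Here is the code.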
fileInput.addEventListener('change', () => {
  // Get the selected file
  if (fileInput.files.length === 0) return;
  const file = fileInput.files[0];
  const reader = new FileReader();
  reader.addEventListener('load', async e => {
    // e.target.result is the image as a base64 data URL.
    // getImageData is async, so its result has to be awaited.
    const imageData = await getImageData(e.target.result);
    const binaryData = binarization(imageData);
    cutWhiteEdge(binaryData);
    // Undo the enlargement applied to the image earlier
    const restoreData = restoreDataScale(binaryData, 4);
    const template = binaryData2Template(restoreData);
    // Write the template to the clipboard
    navigator.clipboard.writeText(template);
    // Alternatively, send it to an API that writes to a database...
  });
  reader.readAsDataURL(file);
});
Reference: the load event of FileReader
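Adjusting the binarization threshold
After repeatedly fetching CAPTCHAs, binarizing them and inspecting the output, I found that in some images whole characters, or parts of them, were removed by binarization, because those characters are also fairly bright.
Take this CAPTCHA, for example; it prints like this (the character S is relatively bright):
In that case the threshold needs adjusting (raise it a little):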
const threshold = [140, 140, 140];
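The effect of raising the threshold can be checked on a few hand-picked colors. In this sketch `countCharacterPixels` and the sample values are invented for illustration, and pixels are given as plain [r, g, b] triples rather than a real ImageData.

```javascript
// Count how many pixels would be classified as character pixels for a threshold.
// `pixels` is a plain list of [r, g, b] triples, a simplification of ImageData.
function countCharacterPixels(pixels, threshold) {
  return pixels.filter(rgb => !rgb.every((v, i) => v > threshold[i])).length;
}

const samplePixels = [
  [30, 30, 30],    // dark character pixel
  [135, 135, 135], // fairly bright character pixel
  [220, 220, 220], // background
];

console.log(countCharacterPixels(samplePixels, [130, 130, 130])); // 1
console.log(countCharacterPixels(samplePixels, [140, 140, 140])); // 2
```

Raising the threshold recovers the bright character pixel, but it also lets in more of the background and interference lines, so the value is a trade-off that has to be checked against real images.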
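3. Convolution matching
The sections above showed how to obtain character templates. Before doing convolution matching, the templates for all characters have to be processed and saved (this is tedious work 😭).
Getting the templates
Here I simply define all the character templates as a constant.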
const CODE_TEMPLATES = {
2: '0000001111100 0000111111110 0001110000111 0001100000011 0011100000011 0000000000011 0000000000110 0000000001110 0000000001100 0000000011000 0000000110000 0000011100000 0000111000000 0001110000000 0011100000000 0111000000000 0111111111110 1111111111110',
3: '000001111000 000111111110 001110000110 001100000011 011100000011 000000000011 000000000110 000000001110 000011111000 000011111000 000000001100 000000001110 000000000110 110000000110 110000001100 111000011100 011111111000 001111100000',
4: '0000000000111 0000000001110 0000000011110 0000000111110 0000000110110 0000001101110 0000011001100 0000110001100 0001110001100 0001100001100 0011000001100 0110000011100 1111111111111 1111111111111 0000000011000 0000000011000 0000000111000 0000000111000',
5: '000111111111 000111111111 001100000000 001100000000 001100000000 001100000000 011011110000 011111111000 011100011100 000000001100 000000001110 000000001110 000000001100 110000001100 110000011100 111000111000 011111110000 001111100000',
6: '0000001111 0000111111 0001111000 0011100000 0011000000 0110000000 0110111100 1111111110 1111000111 1110000011 1100000011 1100000011 1100000011 1100000011 1100000111 1110001110 0111111100 0011111000',
7: '111111111111 111111111111 000000000110 000000000110 000000001100 000000011100 000000011000 000000110000 000000110000 000001100000 000011100000 000011000000 000111000000 000110000000 001100000000 011100000000 011000000000 111000000000',
8: '000001111100 000011111110 000111000111 001110000011 001100000011 001100000011 001100000111 001110001110 000111111100 000111111100 011100001100 011000000110 110000000110 110000000110 110000001110 111000011100 011111111000 000111110000',
9: '00001111000 00111111100 01110001110 01100000111 11100000011 11000000011 11000000011 11000000011 11100000111 01100001110 01111111110 00111100110 00000001100 00000001100 00000011000 00001110000 01111100000 01110000000',
a: '00001111100 00111111110 01110000110 01100000111 00000000111 00011111110 01111111110 11100000110 11000000110 11000001110 11000011110 11111111100 01111101110',
A: '000000000111000 000000000111000 000000001111000 000000001111000 000000011001100 000000011001100 000000110001100 000000110001100 000001100001100 000001100001110 000011000000110 000011111111110 000111111111110 001110000000110 001100000000111 011100000000011 011000000000011 111000000000011',
b: '000110000000 000110000000 001110000000 001100000000 001100000000 001100000000 001101111000 011111111110 011110001110 011100000110 011000000111 011000000111 011000000111 111000000110 111000000110 111000001110 111000011100 111111111000 110111110000',
B: '0001111111100 0011111111110 0011100000111 0011000000011 0011000000011 0011000000011 0011000000111 0111000001110 0111111111100 0111111111100 0110000001110 0110000000110 0110000000110 1110000000110 1100000001110 1100000011100 1111111111000 1111111110000',
c: '00001111100 00011111110 00111000111 01100000011 01100000011 11100000000 11000000000 11000000000 11000000000 11100000111 01100001110 01111111100 00011110000',
C: '000000111110000 000011111111100 000111100001110 000110000000110 001100000000110 001100000000111 011100000000000 011000000000000 011000000000000 011000000000000 011000000000000 111000000000000 011000000001100 011000000001100 011000000011000 001100000111000 001111111110000 000011111000000',
d: '0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000111100110 0011111111110 0011100011110 0110000001100 0110000001100 1110000001100 1100000001100 1100000001100 1100000011100 1110000011000 0110000111000 0111111111000 0011111011000',
D: '00011111110000 00011111111100 00111000011110 00110000000110 00110000000111 00110000000011 00110000000011 00110000000011 01110000000011 01100000000111 01100000000110 01100000000110 01100000001110 11100000001100 11100000011100 11000001111000 11111111110000 11111111000000',
e: '00001111100 00011111110 00110000111 01100000011 01100000011 11111111111 11111111111 11000000000 11000000000 11100000000 01110000110 01111111100 00011111000',
E: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11111111111000 11111111111000',
f: '000001111 000111110 000111000 001110000 001100000 001100000 111111100 111111100 001100000 011100000 011000000 011000000 011000000 011000000 011000000 111000000 110000000 110000000 110000000',
F: '00011111111111 00011111111110 00111000000000 00111000000000 00110000000000 00110000000000 00110000000000 00110000000000 01111111111000 01111111111000 01100000000000 01100000000000 01100000000000 11100000000000 11100000000000 11000000000000 11000000000000 11000000000000',
g: '0000011110011 0001111111111 0001110001111 0011100000111 0011000000110 0111000000110 0110000000110 0110000000110 0110000001110 0111000001100 0011000011100 0011111111100 0001111101100 0000000011100 0100000011000 1110000111000 0111111110000 0011111000000',
G: '00000111111000 00001111111100 00011100001110 00110000000110 01110000000111 01100000000000 11100000000000 11000000000000 11000000000000 11000001111110 11000001111110 11000000000110 11000000001110 11000000001100 11100000001100 01110000011100 00111111111000 00011111100000',
h: '000111000000 000110000000 000110000000 000110000000 000110000000 001110000000 001110111100 001101111110 001111000111 001100000111 001100000011 011100000111 011100000110 011000000110 011000000110 011000000110 011000001110 111000001110 110000001100',
H: '0001100000000011 0001100000000011 0011100000000111 0011100000000110 0011000000000110 0011000000000110 0011000000000110 0011000000000110 0111111111111110 0111111111111100 0110000000001100 0110000000001100 0110000000001100 1110000000011100 1110000000011100 1100000000011000 1100000000011000 1100000000011000',
j: '000000110 000000111 000000110 000000000 000000000 000000110 000001110 000001110 000001100 000001100 000001100 000001100 000011100 000011000 000011000 000011000 000011000 000011000 000111000 000110000 000110000 111110000 111100000',
J: '0000000000011 0000000000011 0000000000011 0000000000011 0000000000111 0000000000110 0000000000110 0000000000110 0000000000110 0000000001110 0000000001110 0000000001100 0000000001100 1110000001100 1110000011100 0111000111000 0111111110000 0001111100000',
k: '0000110000000 0001110000000 0001100000000 0001100000000 0001100000000 0001100000000 0001100001111 0011100011100 0011000111000 0011001110000 0011011100000 0011111000000 0011111000000 0111111100000 0110001100000 0110000110000 0110000111000 0110000011000 1110000011100',
K: '0001100000001111 0001100000011100 0011100000111000 0011100001110000 0011000011100000 0011000111000000 0011001110000000 0011011100000000 0111111100000000 0111111100000000 0111101110000000 0111000110000000 0110000111000000 1110000011000000 1110000011100000 1100000001100000 1100000001110000 1100000000111000',
m: '00111011110000111100 00111111111011111110 00111000011110000110 00110000011100000111 00110000001100000111 01110000011100000110 01110000011000000110 01100000011000000110 01100000011000000110 01100000011000000110 01100000111000001110 11100000111000001100 11000000110000001100',
M: '00011100000000000111 00011100000000001111 00111100000000001111 00111100000000011110 00110110000000111110 00110110000000110110 00110110000001110110 00110110000001100110 01110111000011101110 01100011000011001100 01100011000110001100 01100011000110001100 01100011001100001100 11100011111100011100 11100001111000011000 11000001111000011000 11000001110000011000 11000001110000011000',
n: '00110111110 00111111111 01111000111 01110000011 01100000011 01100000011 01100000011 01100000111 11100000110 11000000110 11000000110 11000000110 11000001110',
N: '00011100000000111 00011100000000111 00011110000000110 00011110000000110 00011111000000110 00011011000000110 00111011100001110 00111001100001110 00110001110001100 00110000110001100 00110000111001100 00110000011001100 01110000011011100 01110000011111000 01100000001111000 01100000001111000 01100000000111000 11100000000111000',
p: '0001101111000 0001111111110 0011110001110 0011100000110 0011000000111 0011000000111 0011000000110 0111000000110 0110000000110 0110000001110 0111000011100 0111111111000 0110111110000 1110000000000 1100000000000 1100000000000 1100000000000 1100000000000',
P: '000111111111000 000111111111110 000110000000110 000110000000111 000110000000011 000110000000011 001110000000111 001110000000111 001100000001110 001111111111100 001111111110000 001100000000000 011100000000000 011100000000000 011000000000000 011000000000000 011000000000000 111000000000000',
q: '000011110011 001111111111 001110001111 011100000110 011000000110 111000000110 110000000110 110000001110 110000001110 111000001100 011000011100 011111111100 001111101100 000000001100 000000011000 000000011000 000000011000 000000011000',
Q: '00000111110000 00011111111100 00111100001110 00110000000110 01100000000110 01100000000111 11100000000111 11000000000111 11000000000111 11000000000111 11000000000110 11000000000110 11000000001110 11000000001100 11100000011100 01110000111000 01111111110000 00011111110000 00000000111000 00000000011100 00000000010000',
r: '001110111 001111111 001110000 001100000 001100000 001100000 011100000 011000000 011000000 011000000 011000000 111000000 111000000',
R: '00011111111000 00011111111100 00111000001110 00110000000110 00110000000111 00110000000111 00110000000110 01110000001110 01110000011100 01111111111000 01111111110000 01100000110000 01100000110000 11100000111000 11100000011000 11000000011000 11000000011100 11000000001100',
s: '00001111100 00111111110 01110000111 01100000011 01110000000 00111110000 00011111100 00000011110 00000000110 11000000110 11100001110 01111111100 00111110000',
S: '00000111111000 00001111111100 00011100001110 00111000000110 00110000000111 00110000000000 00110000000000 00011100000000 00001111000000 00000111110000 00000000111000 00000000001100 00000000001100 11000000001100 11000000011100 01110000111000 01111111111000 00011111100000',
t: '0001100 0001100 0001100 1111111 1111111 0011000 0011000 0011000 0011000 0111000 0110000 0110000 0110000 0110000 0111100 0011100',
T: '11111111111111 11111111111110 00000111000000 00000111000000 00000110000000 00000110000000 00000110000000 00000110000000 00001110000000 00001110000000 00001100000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
u: '011100000111 011000000110 011000000110 011000000110 011000000110 011000001110 111000001100 110000001100 110000001100 110000011100 111000111100 011111111100 001111011000',
U: '000110000000011 001110000000011 001100000000111 001100000000110 001100000000110 001100000000110 011100000000110 011100000000110 011000000001110 011000000001100 011000000001100 011000000001100 011000000001100 111000000011100 011000000011000 011100001111000 001111111110000 000111111000000',
v: '11100000011 01100000111 01100000110 01100001110 01100001100 00100011100 00110011000 00110110000 00110110000 00111100000 00011100000 00011000000 00011000000',
V: '111000000000111 011000000000110 011000000001110 011000000001100 011000000011100 011100000011000 001100000111000 001100000110000 001100001110000 001100001100000 001100011100000 000110011000000 000110111000000 000110110000000 000111110000000 000111100000000 000011100000000 000011000000000',
w: '111000001100000111 011000011100000110 011000011100001100 011000111100001100 011000110100011000 011001100100011000 011001100110111000 011011000110110000 011011000110110000 011110000111100000 001110000111100000 001100000011000000 001100000011000000',
W: '111000000111000000111 111000000111000000110 011000001111000001110 011000001111000001100 011000001111000001100 011000011011000011100 011000011011000011000 011000110011000011000 011000110001000110000 011001110001100110000 011001100001101110000 011001100001101100000 011011000001101100000 011011000001111000000 011110000001111000000 001110000001111000000 001110000001110000000 001100000000110000000',
x: '0001100000111 0001110000110 0000110001100 0000111011100 0000011111000 0000011110000 0000001100000 0000011110000 0000110110000 0001110111000 0011100011000 0111000011100 1110000001100',
X: '00011100000000111 00001110000001110 00000110000011100 00000111000011000 00000011000111000 00000011101110000 00000011111100000 00000001111000000 00000001111000000 00000001110000000 00000011111000000 00000111011000000 00000110011100000 00001110001100000 00011100001110000 00111000000110000 00110000000111000 11110000000011000',
y: '0001100000011 0001100000111 0001100000110 0001100001110 0001110001100 0000110011100 0000110011000 0000110111000 0000110110000 0000111100000 0000111100000 0000011000000 0000011000000 0000110000000 0000110000000 0001100000000 1111100000000 1110000000000',
Y: '11100000000111 01100000001110 01100000001100 01110000011100 00110000111000 00110000110000 00111001110000 00011011100000 00011011000000 00011111000000 00001110000000 00001100000000 00001100000000 00001100000000 00011100000000 00011100000000 00011000000000 00011000000000',
z: '001111111111 001111111111 000000001110 000000011100 000000111000 000001110000 000011100000 000111000000 000110000000 001100000000 011000000000 111111111100 111111111100',
Z: '000111111111111 000111111111111 000000000001110 000000000001100 000000000011100 000000000011000 000000001110000 000000011100000 000000111000000 000000110000000 000001100000000 000011100000000 000111000000000 001110000000000 011100000000000 011000000000000 111111111111100 111111111111000',
};
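Counting the effective pixels in a character template
Counting the effective pixels in a character template means counting how many 1s appear in it (0 means background, an ineffective pixel).
The counts are used later when judging similarity.
This step can also be done when the template is first produced, with the result saved to the database.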
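// CODE_TEMPLATES is a plain object, so iterate over its keys
const tplEffectPoints = Object.keys(CODE_TEMPLATES).reduce((calc, code) => {
  // Count the 1s in each character template
  calc[code] = CODE_TEMPLATES[code].split('').filter(c => c === '1').length;
  return calc;
}, {});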
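The counting step can be tried on a couple of tiny templates. `DEMO_TEMPLATES` below is made up for this sketch and only follows the same format (rows of 0s and 1s separated by spaces); it is not part of the real template set.

```javascript
// Two tiny made-up templates in the same format (rows separated by spaces).
const DEMO_TEMPLATES = {
  x: '101 010 101',
  t: '111 010 010 010',
};

// Object.keys is needed because a plain object has no reduce method.
const demoEffectPoints = Object.keys(DEMO_TEMPLATES).reduce((acc, code) => {
  acc[code] = DEMO_TEMPLATES[code].split('').filter(c => c === '1').length;
  return acc;
}, {});

console.log(demoEffectPoints); // { x: 5, t: 6 }
```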
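What is convolution matching
I made a GIF to illustrate it. Convolution matching, which I used to call scan matching, amounts to taking a template and sliding it across the image (left to right, top to bottom), measuring at each position how much the image's effective pixels (the points equal to 1) overlap with the template's effective pixels (this overlap ratio is the similarity).
It is worth thinking about why only the overlap of effective pixels is checked, and not the ineffective pixels.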
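The sliding comparison can be sketched on a toy example. Everything here (`similarityAt`, `scan`, the 2x2 template and the 2x5 image) is invented for illustration. The image places a stray 1 in a cell that is background in the template, which hints at one likely answer to the question above: interference lines and neighbouring characters often fall inside a template's bounding box, and counting them against the match would penalize otherwise perfect hits.

```javascript
// Similarity at one position: overlapping 1s divided by the template's 1s.
function similarityAt(image, tpl, x, y) {
  let total = 0, hit = 0;
  tpl.forEach((row, i) => row.forEach((cell, j) => {
    if (cell !== '1') return; // template background is ignored
    total++;
    if (image[y + i][x + j] === '1') hit++;
  }));
  return hit / total;
}

// Slide the template over every position where it fits entirely.
function scan(image, tpl) {
  const results = [];
  for (let y = 0; y + tpl.length <= image.length; y++) {
    for (let x = 0; x + tpl[0].length <= image[0].length; x++) {
      results.push({ x, y, similarity: similarityAt(image, tpl, x, y) });
    }
  }
  return results;
}

const toGrid = rows => rows.map(r => r.split(''));
// A made-up "L" shape. The image contains it at x = 2, plus a stray 1 at
// (3, 0), which is a background cell of the template at that position.
const tplL = toGrid(['10', '11']);
const toyImage = toGrid(['00110', '00110']);
const perfect = scan(toyImage, tplL).filter(r => r.similarity === 1);
console.log(perfect); // one match: x = 2, y = 0, similarity = 1
```

The price of ignoring background cells is that a character whose strokes are a subset of another character's strokes will also score a full match at that position.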
Implementing convolution matching
// Returns every match found: its position (x, y), similarity and character.
// Note: this threshold is a similarity threshold, not the binarization one.
function convolution(binaryData, threshold = 1) {
  const codes = Object.keys(CODE_TEMPLATES);
  const h = binaryData.length;
  const w = binaryData[0].length;
  const matches = [];
  let code, tplData, tplH, tplW;
  function doConvolution() {
    let x, y, colLastIdx, rowLastIdx;
    // Returns the position and the overlap ratio (similarity) at that position
    const compare = (x, y, code) => {
      let effectivePointNum = 0;
      for (let i = 0; i < tplH; i++) {
        for (let j = 0; j < tplW; j++) {
          if (tplData[i][j] === '1') {
            if (tplData[i][j] === binaryData[i + y][j + x]) {
              effectivePointNum++;
            }
          }
        }
      }
      // similarity = overlapping points / effective points in the template
      const similarity = effectivePointNum / tplEffectPoints[code];
      return { x, y, similarity };
    };
    // Scan direction: left to right, top to bottom
    for (y = 0, rowLastIdx = h - tplH; y <= rowLastIdx; y++) {
      for (x = 0, colLastIdx = w - tplW; x <= colLastIdx; x++) {
        const result = compare(x, y, code);
        if (result.similarity >= threshold) {
          matches.push({ ...result, code });
        }
      }
    }
  }
  for (let i = 0; i < codes.length; i++) {
    code = codes[i];
    // Convert the template to a 2D array
    tplData = CODE_TEMPLATES[code].split(' ').map(row => row.split(''));
    tplH = tplData.length;
    tplW = tplData[0].length;
    doConvolution();
  }
  // Sort by position (x axis)
  matches.sort((a, b) => a.x - b.x);
  return matches;
}
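Other processing
Before convolution matching, the CAPTCHA has to be binarized.
The binarized image may need further processing, such as removing noise or interference lines.
Here I only did some simple noise handling.
Denoising
Noise consists of fairly dark dots placed at random on the CAPTCHA image. If we filter only by a brightness threshold, noise is easily mistaken for effective pixels.
Features of noise
Generally, noise points are random and not contiguous.
Here is a simple test for noise: if an effective point (a point equal to 1) has no other effective point around it (above, below, left or right), it is considered noise.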
function denoising(binData) {
  const h = binData.length;
  const w = binData[0].length;
  const isEffectivePoint = (x, y) => binData[y][x] === '1';
  const checkAround = (x, y) => {
    // Bounds checks
    const checkTop = y > 0;
    const checkBottom = y < h - 1;
    const checkLeft = x > 0;
    const checkRight = x < w - 1;
    return (
      (checkTop && isEffectivePoint(x, y - 1)) ||
      (checkBottom && isEffectivePoint(x, y + 1)) ||
      (checkLeft && isEffectivePoint(x - 1, y)) ||
      (checkRight && isEffectivePoint(x + 1, y))
    );
  };
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (isEffectivePoint(x, y) && !checkAround(x, y)) {
        // Mark the noise point as an ineffective point
        binData[y][x] = '0';
      }
    }
  }
}
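The isolated-point rule can be seen on a small grid. The sketch below is a non-mutating variant written for illustration (`removeIsolated` is a made-up name); it applies the same four-neighbour test and returns a new grid instead of editing the input.

```javascript
// Non-mutating sketch of the same 4-neighbour rule: a '1' with no '1' directly
// above, below, left or right of it is treated as noise.
function removeIsolated(grid) {
  const isOne = (x, y) => grid[y] !== undefined && grid[y][x] === '1';
  return grid.map((row, y) => row.map((cell, x) => {
    if (cell !== '1') return cell;
    const hasNeighbour =
      isOne(x, y - 1) || isOne(x, y + 1) || isOne(x - 1, y) || isOne(x + 1, y);
    return hasNeighbour ? '1' : '0';
  }));
}

const noisy = ['10000', '00110', '00100'].map(r => r.split(''));
console.log(removeIsolated(noisy).map(r => r.join('')));
// ['00000', '00110', '00100']
```

One limitation worth keeping in mind: a stroke pixel that touches the rest of its character only diagonally would also be removed by this rule, so it suits fonts with strokes at least two pixels thick.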
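Post-processing
The results from the convolution matching above may not always be what we want.
Recognizing the CAPTCHA image above gives these matches:
The result not only contains more than 4 matches, it also includes an extra r. This is because, in this font, the character P contains all the effective pixels of the character r.
So if the character r is recognized at the position of a P in the matches, the r should be discarded.
Here are all the characters in this font that have a containment relationship: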
const containMap = {
  Q: { C: -1 }, // C's x is 1 less than Q's
  E: { F: 0 },
  V: { v: 1 },
  y: { v: 2 },
  m: { r: 0 },
  p: { r: 0 },
};
function afterEffect(matches) {
  if (matches.length <= 4) return;
  // Group matches by character for easier processing: {e: [match], r: [match, match], ...}
  const codeMap = matches.reduce((map, item) => {
    const { code } = item;
    (map[code] = map[code] || []).push(item);
    return map;
  }, {});
  Object.keys(containMap).forEach(code => {
    if (!codeMap[code]) return;
    Object.keys(containMap[code]).forEach(containCode => {
      if (!codeMap[containCode]) return;
      // x offset between the containing character and the contained one
      const offset = containMap[code][containCode];
      codeMap[code].forEach(Q => {
        let idx = codeMap[containCode].findIndex(C => C.x === Q.x + offset);
        if (idx > -1) {
          // Remove the entry that was found (at idx, not always the first one) from codeMap
          const [C] = codeMap[containCode].splice(idx, 1);
          // Remove it from matches
          idx = matches.findIndex(item => item === C);
          matches.splice(idx, 1);
        }
      });
    });
  });
}
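The containment rule can also be written as a single filter, which may be easier to reason about. This is an alternative sketch rather than the function above; `dropContained`, `demoContainMap` and the sample matches are invented for illustration.

```javascript
// Hypothetical containment table: each key contains the listed characters
// at the given x offset.
const demoContainMap = { E: { F: 0 }, Q: { C: -1 } };

// Return a new list without matches that are explained by a containing character.
function dropContained(matches, containMap) {
  return matches.filter(inner => !matches.some(outer => {
    const offsets = containMap[outer.code];
    return offsets !== undefined
      && offsets[inner.code] !== undefined
      && inner.x === outer.x + offsets[inner.code];
  }));
}

const demoMatches = [
  { code: 'E', x: 5 },
  { code: 'F', x: 5 },  // same position as E: a by-product, dropped
  { code: 'F', x: 40 }, // a real F elsewhere: kept
  { code: 'C', x: 59 }, // Q.x - 1: dropped
  { code: 'Q', x: 60 },
];
console.log(dropContained(demoMatches, demoContainMap).map(m => m.code + '@' + m.x));
// ['E@5', 'F@40', 'Q@60']
```

Because a contained character is only dropped when it sits at the expected offset from its container, a genuine F elsewhere in the CAPTCHA survives.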
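Post-processing can involve many steps (only one is done here). Handle it case by case, and keep it as simple as possible.
Finally, extract the CAPTCHA text from the matches.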
const verifyCodes = matches.map(item => item.code).join('');
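Replaying and verifying
Before reading the CAPTCHA value, check the number of entries in matches once more. If it is clearly wrong, the processing still has a problem. You can save the result of every processing step and replay it later to improve the step that went wrong.
Also, if the submitted CAPTCHA fails verification, save the results of all steps together with the CAPTCHA image for later investigation and tuning.
Handling verification failures
Verification can fail, either because our processing still has problems or because the CAPTCHA generation itself keeps being adjusted.
When recognition fails, a limited number of retries can be allowed.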
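A bounded retry loop might look like the sketch below. `withRetry` and `fakeAttempt` are hypothetical; in a real script `attempt` would fetch a fresh CAPTCHA, recognize it, submit the login and resolve to true on success, none of which is shown here.

```javascript
// Retry wrapper. `attempt` is a hypothetical async function that performs one
// full login attempt and resolves to true on success.
async function withRetry(attempt, maxRetries = 3) {
  for (let tries = 1; tries <= maxRetries; tries++) {
    if (await attempt(tries)) return { ok: true, tries };
  }
  return { ok: false, tries: maxRetries };
}

// Fake attempt for demonstration: fails twice, then succeeds.
const fakeAttempt = async n => n >= 3;

withRetry(fakeAttempt, 5).then(result => {
  console.log(result); // { ok: true, tries: 3 }
});
```

Keeping the retry count small matters in practice, since repeated failed logins may lock the account or trip rate limits on the site.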