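Web Scraping: A Three-Day Crash Course

Published: January 21, 2024

Day01: Scraping Fundamentals

1. The HTTP Protocol and Web Development

1. What are request headers, request bodies, response headers, and response bodies?
2. What does a URL consist of?
3. What exactly are GET and POST requests?
4. What is Content-Type?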

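(1) Overview

HTTP (HyperText Transfer Protocol) is the protocol used to transfer hypertext between World Wide Web (WWW) servers and local browsers. It is a stateless, application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been refined and extended ever since. HTTP follows a client-server architecture: the browser, acting as the HTTP client, sends every request to the HTTP server (the web server) via a URL, and the web server sends a response back to the client after processing the request.

(2) Request and Response Formats

HTTP defines a request format that the browser must follow when sending data to the server, and a response format that the server must follow when sending data back to the browser. The information exchanged over HTTP is called an HTTP message: the message sent by the requesting side (the client) is the request message, and the one sent by the responding side (the server) is the response message. An HTTP message is itself plain text made up of multiple lines of data.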

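A complete URL consists of a scheme (protocol), a host, a port, a path, and query parameters.

For example, in https://www.baidu.com/s?wd=yuan, https is the scheme, www.baidu.com is the host (a domain name that resolves to an IP address), the port is omitted and defaults to 443 for HTTPS (80 for plain HTTP), /s is the path, and wd=yuan is the query parameter.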
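The breakdown above can be checked with Python's standard library. This is a minimal sketch, assuming the example URL is https://www.baidu.com/s?wd=yuan; the helper name split_url and the default-port table are illustrative, not part of the original tutorial.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Break a URL into scheme, host, port, path, and query parameters."""
    parts = urlsplit(url)
    # When the URL carries no explicit port, fall back to the scheme's default.
    default_ports = {"http": 80, "https": 443}
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or default_ports.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


info = split_url("https://www.baidu.com/s?wd=yuan")
print(info)
```

Note that parse_qs returns a list for each key, because the same parameter may appear more than once in a query string.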

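Request methods: GET and POST

  • GET puts the submitted data in the URL: a ? separates the address from the data, and parameters are joined with &, as in EditBook?name=test1&id=123456. POST puts the submitted data in the body of the HTTP request.

  • The amount of data GET can carry is limited in practice, because browsers and servers cap URL length. HTTP itself places no limit on the size of a POST body, although servers usually enforce one of their own.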
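To make the difference concrete, the sketch below builds (but does not send) one GET and one POST request carrying the same two fields. It uses only the standard library; the host example.com is a placeholder chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "test1", "id": "123456"}
encoded = urlencode(fields)  # key=value pairs joined with "&"

# GET: the encoded fields become the query string, after "?".
get_req = Request("http://example.com/EditBook?" + encoded, method="GET")

# POST: the same encoded fields travel in the request body instead.
post_req = Request("http://example.com/EditBook", data=encoded.encode(), method="POST")

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET request has no body at all, while the POST request keeps a clean URL and carries the data as bytes.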

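Response status codes: the status code reports the result of a request that the client sent to the server. From it, the client can tell whether the server handled the request normally or whether an error occurred. A status line such as 200 OK consists of a three-digit number and a reason phrase.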
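Since an HTTP message is just lines of text, a response can be taken apart by hand. The following is a rough sketch that parses a hand-written response string (the sample message is made up for illustration) into its status code, reason phrase, headers, and body; real clients such as requests do this for you.

```python
from http import HTTPStatus

# A made-up response message: status line, headers, blank line, body.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>hello</h1>"
)


def parse_response(raw):
    """Split a raw HTTP response into (code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), reason, headers, body


code, reason, headers, body = parse_response(RAW_RESPONSE)
print(code, reason)
print(headers["Content-Type"])
# The standard library also knows the usual reason phrases.
print(HTTPStatus(404).phrase)
```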

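2. requests and Getting Past Anti-Scraping Checks

(1) User-Agent checks

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
}

res = requests.get(
    "https://www.baidu.com/",
    # headers=headers  # uncomment to send a browser User-Agent, then compare the two saved pages
)

# Save the response body; an explicit encoding avoids platform-dependent defaults
with open("baidu.html", "w", encoding="utf-8") as f:
    f.write(res.text)
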

(2) Referer checks

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Referer": "https://movie.douban.com/explore",
}

res = requests.get(
    "https://m.douban.com/rexxar/api/v2/movie/recommend?refresh=0&start=0&count=20&selected_categories=%7B%7D&uncollect=false&tags=",
    headers=headers
)

# Inspect the response body
print(res.text)


(3) Cookie checks

import requests
url = "https://stock.xueqiu.com/v5/stock/screener/quote/list.json?page=1&size=30&order=desc&orderby=percent&order_by=percent&market=CN&type=sh_sz"
cookie = 'xq_a_token=a0f5e0d91bc0846f43452e89ae79e08167c42068; xqat=a0f5e0d91bc0846f43452e89ae79e08167c42068; xq_r_token=76ed99965d5bffa08531a6a47501f096f61108e8; xq_id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOi0xLCJpc3MiOiJ1YyIsImV4cCI6MTY5NTUxNTc5NCwiY3RtIjoxNjkzMjAzODIzMzAwLCJjaWQiOiJkOWQwbjRBWnVwIn0.MCIGGTGaSPe9nVuXkyrXQTlCthdURSnDtqm8dGttO2XYHeaMPSKmHQvsJmbw3OJTRnkf0KHZvgF0W3Rv-9uYe4P2Wizt0g2QzQonONjUmExABmZX0e3ara8BzBQ3b96H7dm0LV4pdBlnOW0A9PUmGRouWM7kVUOGPvd3X7GkB7M_th8pV8SZo9Iz4nzjrwQzxPBa0DlS7whbeNeXMnbnmAPp7z-eG75vdE2Pb3OyZ5Gv-FINhpQtAWo95lTxZVw5C5VHSzbR_-z8uqH6DD0xop4_wvKw5LIVwu6ZZ6TUnNFr3zGU9jWqAGgdzcKgO38dlL6uXNixa9mrKOd1OZnDig; cookiesu=431693203848858; u=431693203848858; Hm_lvt_1db88642e346389874251b5a1eded6e3=1693203851; device_id=7971eba10048692a91d87e3dad9eb9ca; s=bv11kb1wna; Hm_lpvt_1db88642e346389874251b5a1eded6e3=1693203857'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36',
    "referer": "https://xueqiu.com/",
    "cookie": cookie,
}
res = requests.get(url, headers=headers)
print(res.text)

3. Request Parameters

(1) GET requests and query parameters

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36",
    "Referer": "https://movie.douban.com/explore",
}

res = requests.get(
    "https://m.douban.com/rexxar/api/v2/movie/recommend",
    headers=headers,
    params={  # query parameters; requests URL-encodes them and appends them after "?"
        "refresh": "0",
        "start": "0",
        "count": "20",
        "selected_categories": "{}",
        "uncollect": "false",
        "tags": "",  # for example "悬疑" (suspense)
    },
)

# Inspect the response body
print(res.text)

(2) POST requests and request-body parameters

import requests

while True:
    wd = input("Enter the text to translate: ")

    # data= is sent form-encoded in the request body
    res = requests.post(
        "https://aidemo.youdao.com/trans",
        data={
            "q": wd,
            "from": "Auto",
            "to": "Auto",
        },
    )

    print(res.json().get("translation")[0])


4. Scraping Images and Videos

(1) Downloading a media stream directly

import requests


# (1) Download an image
url = "https://pic.netbian.com/uploads/allimg/230812/202108-16918428684ab5.jpg"

res = requests.get(url)

# Write the raw bytes to disk (binary mode)
with open("a.jpg", "wb") as f:
    f.write(res.content)

# (2) Download a video

url = "https://vd3.bdstatic.com/mda-nadbjpk0hnxwyndu/720p/h264_delogo/1642148105214867253/mda-nadbjpk0hnxwyndu.mp4?v_from_s=hkapp-haokan-hbe&auth_key=1693223039-0-0-e2da819f15bfb93409ce23540f3b10fa&bcevod_channel=searchbox_feed&pd=1&cr=2&cd=0&pt=3&logid=2639522172&vid=5423681428712102654&klogid=2639522172&abtest=112162_5"

res = requests.get(url)

# Write the raw bytes to disk (binary mode)
with open("video.mp4", "wb") as f:
    f.write(res.content)

(2) Batch scraping

import requests
import re
import os

# (1) Collect every image URL on the page
start_url = "https://pic.netbian.com/4kmeinv/"
res = requests.get(start_url)
# Escape the dot so that only a literal ".jpg" matches
img_url_list = re.findall(r"uploads/allimg/.*?\.jpg", res.text)
print(img_url_list)

# (2) Download every image in a loop
for img_url in img_url_list:
    res = requests.get("https://pic.netbian.com/" + img_url)
    img_name = os.path.basename(img_url)
    with open(img_name, "wb") as f:
        f.write(res.content)

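The extraction step can be tried offline before hitting the site. The sketch below runs the same kind of pattern against a small, made-up HTML fragment (the file names are invented for illustration) and shows why the dot before jpg should be escaped: an unescaped dot matches any character, so it also picks up a path that merely contains "-jpg".

```python
import os
import re
from urllib.parse import urljoin

# Made-up page fragment; only the first two links are real images.
SAMPLE_HTML = '''
<img src="/uploads/allimg/230812/a-1.jpg" alt="one">
<img src="/uploads/allimg/230812/b-2.jpg" alt="two">
<a href="/uploads/allimg/230812/notes-jpg.html">not an image</a>
'''

# Unescaped dot: "." matches any character, including "-".
loose = re.findall(r"uploads/allimg/.*?.jpg", SAMPLE_HTML)
# Escaped dot: only a literal ".jpg" matches.
strict = re.findall(r"uploads/allimg/.*?\.jpg", SAMPLE_HTML)

print(len(loose), len(strict))

# Turn each relative path into an absolute URL and a local file name.
for path in strict:
    full_url = urljoin("https://pic.netbian.com/", path)
    print(full_url, "->", os.path.basename(path))
```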

5. CAPTCHA-Solving Platforms

Obtaining the CAPTCHA

CAPTCHA-solving platform: 图鉴 (ttshitu.com)

import base64

import requests


def base64_api(uname, pwd, img, typeid):
    # Read the image and Base64-encode it for the JSON payload
    with open(img, 'rb') as f:
        b64 = base64.b64encode(f.read()).decode()
    data = {"username": uname, "password": pwd, "typeid": typeid, "image": b64}
    result = requests.post("http://api.ttshitu.com/predict", json=data).json()
    if result['success']:
        return result["data"]["result"]
    else:
        # Note: on errors such as "not enough human solvers", add handling here
        # (for example, retry the recognition) so the script does not hang
        return result["message"]


if __name__ == "__main__":
    img_path = "./v_code.jpg"
    # Replace the account name and password with your own
    result = base64_api(uname='yuan0316', pwd='yuan0316', img=img_path, typeid=3)
    print(result)
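The comment in the error branch asks for retry logic. One possible shape for it is sketched below. The wrapper recognize_with_retry and the fake responses are hypothetical; only the response layout (success, data.result, message) is taken from the code above, and the real HTTP call is replaced by a stand-in so the sketch runs offline.

```python
import time


def recognize_with_retry(recognize, attempts=3, delay=0.0):
    """Call recognize() until it reports success or the attempts run out.

    recognize must return a dict shaped like the platform response above:
    {"success": bool, "data": {"result": str}, "message": str}.
    """
    last_message = None
    for attempt in range(1, attempts + 1):
        result = recognize()
        if result.get("success"):
            return result["data"]["result"]
        last_message = result.get("message")
        if attempt < attempts:
            time.sleep(delay)  # back off before trying again
    raise RuntimeError(f"recognition failed after {attempts} attempts: {last_message}")


# Stand-in for the real HTTP call: fails twice, then succeeds.
responses = iter([
    {"success": False, "message": "busy"},
    {"success": False, "message": "busy"},
    {"success": True, "data": {"result": "ab12"}},
])
code = recognize_with_retry(lambda: next(responses))
print(code)
```

Raising an exception once the attempts are exhausted keeps an error message from being mistaken for a recognized CAPTCHA string.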

6. Today's Homework

Hands-on exercise: Douyin short videos

https://www.douyin.com/user/MS4wLjABAAAAMbqnWxzUfZegt9vrNBDz7zyqwhvG6vXiKTDxVm2wUD0

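Day02: Advanced JS Reverse Engineering in Practice: A Translation Site

【1】Packet Capture and Analysis

Clicking the translate button fires an AJAX request. Inspecting the responses shows that webtranslate is our target URL, so the next step is to work out which request fields have to be reverse engineered.

Click translate again to produce a second webtranslate request, then compare the two request bodies. The fields whose values change are the ones we need to reverse engineer.

The comparison shows two such fields: sign and mysticTime. Both differ on every request, so copying them from a captured request is likely to fail. We need to find where they are generated and reproduce that logic in full.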
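The comparison can also be done in code once both request bodies have been copied out of the network panel. A small sketch follows, with made-up values standing in for the two captured payloads.

```python
def changed_fields(first, second):
    """Return {key: (value_in_first, value_in_second)} for keys whose values differ."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }


# Hypothetical captures of two consecutive requests (values invented).
req1 = {"i": "apple", "client": "fanyideskweb", "sign": "aaa111", "mysticTime": "1700000000001"}
req2 = {"i": "apple", "client": "fanyideskweb", "sign": "bbb222", "mysticTime": "1700000000002"}

diff = changed_fields(req1, req2)
print(diff)
```

Fields that stay identical across captures can simply be hard-coded; only the ones reported here need to be generated.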

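【2】Locating the Entry Point

The site loads a lot of JS, so quickly locating the code that generates these values is the crux of the whole job. The first technique is keyword search. Take sign: some function must eventually return its value, that value is most likely assigned to a key named sign, and the key ends up in the request body object. So the function that generates sign is probably close to an occurrence of the keyword sign. This technique is not a cure-all (no single technique always works; in practice you combine them), but it is a good place to start, so we search for sign.

The search returns far too many hits for sign (three to five would be ideal), which makes it hard to tell which one is the real target.

Since sign is a poor keyword, switch to mysticTime: in the request body, the mysticTime key sits close to the sign key.

This yields fewer hits, few enough to inspect. One of them is a function w that contains both the mysticTime key and the sign key, and the value of sign there is a function call, so it looks very much like our target. How do we confirm it?

The second technique: set a breakpoint to confirm the "suspect". Put a breakpoint on the line that sets sign and click translate again. The click has to pass through the code that really builds sign, so if the suspect location is the entry point, execution must stop at the breakpoint; if it does not stop, the suspect has been "wrongly accused" and is not the entry point we are looking for. The result:

The sign line is highlighted in green and the debugger controls appear above it, which shows that this is where sign is generated. Next, hover the mouse over k:

The blue link in the tooltip, app.3b85caff.js:1, is the location of function k; clicking it jumps straight into k. Also note the two arguments passed to k, o and e. The code above contains const o = (new Date).getTime();, so o is the current timestamp, and further down there is mysticTime: o, so mysticTime is simply the current timestamp. What is the value of e? Checking variable values is a routine part of this work, and the simplest way is to print the variable in the console:

Be very careful here: only print variables that belong to the function where execution is currently paused. Never pause in function A and print a variable of function B, because the two functions may use the same variable name with different values, which leads to confusion.

Next, we jump to function k:

It sits right above function w.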

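【3】Reverse Engineering

Now that we have found where sign is created, the next step is to reproduce the generation logic in our own code. There are two approaches:

  • Port the JS generation logic to Python

  • Copy the JS code and run it locally

(1) Python implementation

Reading functions k and j shows that several variables are first joined into one string, and the MD5 hash of that string is then computed. MD5 always maps the same input to the same output, but because the input includes a timestamp, sign changes on every request. The Python implementation: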

import hashlib
import time

import requests


def get_md5(val, is_hex=True):
    """Return the MD5 of val as a hex string, or as raw bytes when is_hex is False."""
    md5 = hashlib.md5()
    md5.update(val.encode())
    if is_hex:
        return md5.hexdigest()
    else:
        return md5.digest()


url = "https://dict.xxx.com/webtranslate"

# (1) Build the dynamic values
mysticTime = str(int(time.time() * 1000))  # millisecond timestamp, like (new Date).getTime()

d = 'fanyideskweb'
e = mysticTime
u = 'webfanyi'
t = 'fsdsogkndfokasodnaso'
s = f"client={d}&mysticTime={e}&product={u}&key={t}"
sign = get_md5(s)
print("sign:::", sign)

# (2) Simulate the request
data = {
    "i": "apple",
    "from": "auto",
    "to": "",
    "dictResult": "true",
    "keyid": "webfanyi",
    "sign": sign,
    "client": "fanyideskweb",
    "product": "webfanyi",
    "appVersion": "1.0.0",
    "vendor": "web",
    "pointParam": "client,mysticTime,product",
    "mysticTime": mysticTime,
    "keyfrom": "fanyi.web",
    "mid": 1,
    "screen": 1,
    "model": 1,
    "network": "wifi",
    "abtest": 0,
    "yduuid": "abcdefg",
}
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Referer": "https://fanyi.xxx.com/",
    "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO="
}
res = requests.post(url, data=data, headers=my_headers)
print(res.text)


Result:

sign::: 7bd3a03476323b1866ef981bbcd4f300
Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU=
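
(2) JS implementation

In most cases the JS logic is not as simple as in this example, so most reverse engineering relies on extracting the JS code and running it. This approach does not require understanding the logic; it only requires reproducing the environment the code expects.

Copy the code that builds the value into a local JS file and install Node.js to run it. Supply whatever is missing and fix each error as it is reported, until the code produces the value as smoothly as it does in the browser. The code is as follows:

For standard algorithms there is no need to dig the implementation out of the site; call a library directly (here, Node's built-in crypto module).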

const cryptoJs = require("crypto")
const u = "fanyideskweb"
    , d = "webfanyi"
    , m = "client,mysticTime,product"
    , p = "1.0.0"
    , A = "web"
    , g = "fanyi.web"
    , b = 1
    , h = 1
    , f = 1
    , v = "wifi"
    , O = 0;

// MD5 of the input, as a hex string
function j(e) {
    return cryptoJs.createHash("md5").update(e.toString()).digest("hex")
}

// e: timestamp, t: fixed key string
function k(e, t) {
    return j(`client=${u}&mysticTime=${e}&product=${d}&key=${t}`)
}

function get_sign() {
    let e = (new Date).getTime();
    let t = 'fsdsogkndfokasodnaso'
    let sign = k(e, t)
    return [sign, e]
}

console.log(get_sign())

Next, Python calls the JS function through the execjs library to complete the JS-based implementation:

import execjs
import requests

url = "https://dict.xxx.com/webtranslate"

# (1) Get the dynamic values from the JS code

with open("xxx.js", encoding="utf-8") as f:
    js_code = f.read()

js_compile = execjs.compile(js_code)
sign, mysticTime = js_compile.call("get_sign")
print("sign:::", sign, mysticTime)


# (2) Simulate the request
data = {
    "i": "apple",
    "from": "auto",
    "to": "",
    "dictResult": "true",
    "keyid": "webfanyi",
    "sign": sign,
    "client": "fanyideskweb",
    "product": "webfanyi",
    "appVersion": "1.0.0",
    "vendor": "web",
    "pointParam": "client,mysticTime,product",
    "mysticTime": mysticTime,
    "keyfrom": "fanyi.web",
    "mid": 1,
    "screen": 1,
    "model": 1,
    "network": "wifi",
    "abtest": 0,
    "yduuid": "abcdefg",
}
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Referer": "https://fanyi.xxx.com/",
    "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO="
}
res = requests.post(url, data=data, headers=my_headers)
print(res.text)

Result:

sign::: 4a9052136bb96142a5b554cb6772938b 1700796415319
Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU=
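
【4】Decrypting the Response

【AES Primer】

AES is a symmetric cipher: encryption and decryption use the same key.

Common symmetric ciphers include AES, DES, and 3DES. We discuss AES here.

Installation:

pip install pycryptodome

The most commonly used AES modes are CBC and ECB; there are many other modes, and all of them are AES encryption. The key practical difference between the two is that ECB does not need an IV (initialization vector), while CBC does.

"""
Key length (bytes)
    16: *AES-128*
    24: *AES-192*
    32: *AES-256*

MODE: the mode of operation.
    The common ones are ECB and CBC.
    ECB: the basic mode. The plaintext is split into blocks of the cipher's block size (the last block is padded if short); each block is encrypted independently and the outputs are concatenated to form the ciphertext.
    CBC: a chaining mode. Each plaintext block is XORed with the previous ciphertext block before being encrypted (the first block is XORed with the IV), which makes the ciphertext harder to attack.
"""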

CBC encryption example (AES-128):

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad
import base64

key = '0123456789abcdef'.encode()  # key: 16 bytes selects AES-128
iv = b'abcdabcdabcdabcd'  # IV: always 16 bytes, the AES block size
text = 'alex is a monkey!'  # plaintext: must be padded to a multiple of 16 bytes before encryption
# Manual alternative: append NUL characters until the length is a multiple of 16
# while len(text.encode('utf-8')) % 16 != 0:
#     text += '\0'
text = pad(text.encode(), 16)
print("padded text:", text)

aes = AES.new(key, AES.MODE_CBC, iv)  # create the cipher object

en_text = aes.encrypt(text)  # encrypt the padded plaintext
print("AES ciphertext:::", en_text)  # b"_\xf04\x7f/R\xef\xe9\x14#q\xd8A\x12\x8e\xe3\xa5\x93\x96'zOP\xc1\x85{\xad\xc2c\xddn\x86"

en_text = base64.b64encode(en_text).decode()  # Base64-encode the raw ciphertext bytes
print(en_text)  # X/A0fy9S7+kUI3HYQRKO46WTlid6T1DBhXutwmPdboY=

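CBC decryption example:

from Crypto.Cipher import AES
import base64
from Crypto.Util.Padding import unpad

key = '0123456789abcdef'.encode()
iv = b'abcdabcdabcdabcd'
aes = AES.new(key, AES.MODE_CBC, iv)

text = 'X/A0fy9S7+kUI3HYQRKO46WTlid6T1DBhXutwmPdboY='.encode()  # the text to decrypt
encrypted_bytes = base64.b64decode(text)  # Base64-decode into raw bytes
source = aes.decrypt(encrypted_bytes)  # decrypt
print("AES decrypted (still padded):::", source.decode())
print("AES decrypted (padding removed):::", unpad(source, 16).decode())

  1. In Python, the ciphertext, plaintext, key, and IV passed to AES must all be bytes; the AES object accepts only bytes when it is constructed.

  2. The key must be exactly 16, 24, or 32 bytes and the IV exactly 16 bytes. Plaintext whose length is not a multiple of 16 bytes must be padded before encryption.

  3. A CBC cipher object keeps state, so create a new AES object for each encryption or decryption. To avoid mistakes, do this in every mode.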
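To see what the padding step actually does, here is a hand-rolled sketch of PKCS#7 padding, the default style used by pad and unpad in Crypto.Util.Padding. The function names pkcs7_pad and pkcs7_unpad are illustrative; in real code, prefer the library functions.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append n bytes, each of value n, so the length becomes a multiple of block_size."""
    n = block_size - len(data) % block_size  # always between 1 and block_size
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Strip PKCS#7 padding, checking that it is well formed."""
    if not padded or len(padded) % block_size:
        raise ValueError("input is not a whole number of blocks")
    n = padded[-1]
    if not 1 <= n <= block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


plain = b"alex is a monkey!"  # 17 bytes, one more than a full block
padded = pkcs7_pad(plain)
print(len(plain), len(padded), padded[-1])
print(pkcs7_unpad(padded) == plain)
```

Input that is already a multiple of 16 bytes still receives a full extra block of padding, which is what lets the receiver remove the padding unambiguously.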

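For many sites, a successfully simulated request returns plaintext. This site, however, also encrypts the response data, so it has to be decrypted. The approach is the same as for the request: first locate where decryption happens. That is relatively easy, because a successful request always triggers a callback that processes the response, and the client-side decryption code can usually be traced quickly from that callback.

An important concept in JS execution is the call stack. If function a calls b and b calls c, the execution order is a->b->c. If a breakpoint stops inside c, the call stack shown is c->b->a: it shows that c was called by b, and b was called by a.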
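The same idea can be reproduced in Python, which is used here only to keep the examples in one language; the browser's Call Stack panel shows the equivalent information for JS. In this sketch the functions a, b, and c are placeholders, and the innermost function inspects its own stack.

```python
import traceback


def a():
    return b()


def b():
    return c()


def c():
    # extract_stack() lists frames from the outermost caller to the current one,
    # so the last three entries are a, b, c. Reverse them to read "innermost first",
    # the way a debugger's call stack panel does.
    frames = traceback.extract_stack()
    names = [frame.name for frame in frames]
    return names[-3:][::-1]


stack = a()
print(" <- ".join(stack))
```

Reading the stack from the top down answers "who called me?", which is the question to ask when walking from the sign code back to the request callback.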

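The breakpoint is currently at the point where sign is built. From this function w, follow the call stack to the callback that runs when the AJAX request actually completes, and look for the decryption code there. This leads to the following code:

Stepping through with the debugger confirms it: before decodeData runs the data is still encrypted, and after decodeData runs it has been decrypted.

Locate the function:

This is clearly AES decryption in 128-bit CBC mode, so the key and IV are all we need. Step into the function and read the code: a and c are the key and the IV, and both are computed from o and n by function y. So first print o and n:

o and n are two fixed strings. Finally, locate function y:

This establishes that the key and the IV are the MD5 digests of those two strings.

Decryption can likewise be done either in Python or by copying the JS.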
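MD5 fits AES-128 here because a raw MD5 digest is exactly 16 bytes, the size of both an AES-128 key and an AES IV. The sketch below illustrates this with two placeholder seed strings (hypothetical, not the site's real values). Note the use of digest() rather than hexdigest(): the hex form is a 32-character string and would not give the same key.

```python
import hashlib


def md5_bytes(text: str) -> bytes:
    """Raw 16-byte MD5 digest of a string."""
    return hashlib.md5(text.encode()).digest()


# Placeholder seeds, standing in for the two fixed strings o and n.
key_seed = "example-key-seed"
iv_seed = "example-iv-seed"

key = md5_bytes(key_seed)
iv = md5_bytes(iv_seed)

print(len(key), len(iv))  # both are 16 bytes
print(len(hashlib.md5(key_seed.encode()).hexdigest()))  # the hex form is 32 characters
```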

【Translation Decryption in Practice】
(1) Complete Python implementation
import base64
import hashlib
import time

import requests
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad


def get_md5(val, is_hex=True):
    """Return the MD5 of val as a hex string, or as raw bytes when is_hex is False."""
    md5 = hashlib.md5()
    md5.update(val.encode())
    if is_hex:
        return md5.hexdigest()
    else:
        return md5.digest()


url = "https://dict.xxx.com/webtranslate"

# (1) Build the dynamic values
mysticTime = str(int(time.time() * 1000))
print(mysticTime)

d = 'fanyideskweb'
e = mysticTime
u = 'webfanyi'
t = 'fsdsogkndfokasodnaso'
s = f"client={d}&mysticTime={e}&product={u}&key={t}"
print("s:::", s)
sign = get_md5(s)

# (2) Simulate the request
data = {
    "i": "apple",
    "from": "auto",
    "to": "",
    "dictResult": "true",
    "keyid": "webfanyi",
    "sign": sign,
    "client": "fanyideskweb",
    "product": "webfanyi",
    "appVersion": "1.0.0",
    "vendor": "web",
    "pointParam": "client,mysticTime,product",
    "mysticTime": mysticTime,
    "keyfrom": "fanyi.web",
    "mid": 1,
    "screen": 1,
    "model": 1,
    "network": "wifi",
    "abtest": 0,
    "yduuid": "abcdefg",
}
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Referer": "https://fanyi.xxx.com/",
    "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO="
}
res = requests.post(url, data=data, headers=my_headers)

# (3) Decode and decrypt the response
# The response is URL-safe Base64; map it back to the standard alphabet first
res_encrypt_base64 = res.text.replace("-", "+").replace("_", "/")
print("res_encrypt_base64:::", res_encrypt_base64)
# Base64-decode
res_encrypt = base64.b64decode(res_encrypt_base64)
print("res_encrypt:::", res_encrypt)
# AES decryption

# Seed strings for the key and the IV
o = 'ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl'
n = 'ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4'

# Key: raw 16-byte MD5 digest
key = get_md5(o, is_hex=False)
# IV: raw 16-byte MD5 digest
iv = get_md5(n, is_hex=False)
# Build the AES cipher object, decrypt, then strip the padding
aes = AES.new(key, AES.MODE_CBC, iv)
source_data = unpad(aes.decrypt(res_encrypt), AES.block_size).decode()
print("source_data:", source_data)

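The two replace calls convert URL-safe Base64 (which uses - and _) back to the standard alphabet (+ and /). The standard library offers base64.urlsafe_b64decode for the same job. The sketch below, run on arbitrary sample bytes chosen so that both special characters appear, checks that the two routes agree.

```python
import base64

# Sample bytes chosen so the encoding contains both "-" and "_".
raw = bytes([0xFB, 0xEF, 0xBE, 0xFF, 0xFE])
token = base64.urlsafe_b64encode(raw).decode()

# Route 1: translate the alphabet by hand, then use the standard decoder.
manual = base64.b64decode(token.replace("-", "+").replace("_", "/"))

# Route 2: let the library handle the URL-safe alphabet.
direct = base64.urlsafe_b64decode(token)

print(token)
print(manual == direct == raw)
```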
(2) Complete JS implementation

The copied JS code, tidied up, in xxx.js:

const cryptoJs = require("crypto")
const u = "fanyideskweb"
    , d = "webfanyi"
    , m = "client,mysticTime,product"
    , p = "1.0.0"
    , A = "web"
    , g = "fanyi.web"
    , b = 1
    , h = 1
    , f = 1
    , v = "wifi"
    , O = 0;

// MD5 of the input, as a hex string
function j(e) {
    return cryptoJs.createHash("md5").update(e.toString()).digest("hex")
}

function k(e, t) {
    return j(`client=${u}&mysticTime=${e}&product=${d}&key=${t}`)
}

function get_sign() {
    let e = (new Date).getTime();
    let t = 'fsdsogkndfokasodnaso'
    let sign = k(e, t)
    return [sign, e]
}

console.log(get_sign())


// Decryption
// MD5 of the input, as a raw 16-byte Buffer
function y(e) {
    return cryptoJs.createHash("md5").update(e).digest()
}

function jieMi(t) {
    let o = 'ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl'
    let n = 'ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4'
    if (!t)
        return null;
    const a = y(o)   // key
        , c = y(n)   // IV
        , r = cryptoJs.createDecipheriv("aes-128-cbc", a, c);
    let s = r.update(t, "base64", "utf-8");
    s += r.final("utf-8");
    return s
}
?
console.log(jieMi('Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8 whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU='))

Calling the JS from Python:

import execjs
import requests

url = "https://dict.xxx.com/webtranslate"

# (1) Get the dynamic values from the JS code

with open("xxx.js", encoding="utf-8") as f:
    js_code = f.read()

js_compile = execjs.compile(js_code)
sign, mysticTime = js_compile.call("get_sign")
print("sign:::", sign, mysticTime)


# (2) Simulate the request
data = {
    "i": "apple",
    "from": "auto",
    "to": "",
    "dictResult": "true",
    "keyid": "webfanyi",
    "sign": sign,
    "client": "fanyideskweb",
    "product": "webfanyi",
    "appVersion": "1.0.0",
    "vendor": "web",
    "pointParam": "client,mysticTime,product",
    "mysticTime": mysticTime,
    "keyfrom": "fanyi.web",
    "mid": 1,
    "screen": 1,
    "model": 1,
    "network": "wifi",
    "abtest": 0,
    "yduuid": "abcdefg",
}
my_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
    "Referer": "https://fanyi.xxx.com/",
    "Cookie": "OUTFOX_SEARCH_USER_ID_NCOO=1837136861.99783; OUTFOX_SEARCH_USER_ID=2039883963@103.156.184.202; UM_distinctid=18acc0c423c8a-067b7d9f92c33d-18525634-1d73c0-18acc0c423d1300; P_INFO=golang13121758648; ANTICSRF=cleared; NTES_OSESS=cleared; S_OINFO="
}
res = requests.post(url, data=data, headers=my_headers)
print(res.text)


# Decrypt the response with the JS function

decrypted = js_compile.call("jieMi", res.text)
print("解密后的数据:", decrypted)


Result:

sign::: 554c48439c46e29931f54be1b724df3f 1700799183632
Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6JKK7ArS9fJXAcsG7ufBIE0gd6fbnhFcsGmdXspZe-8whVFbRB_8Fc9JlMHh8DDXnskDhGfEscN_rfi-A-AHB3F9Vets82vIYpkGNaJOft_JA-m5cGEjo-UNRDDpkTz_NIAvo5PbATpkh7PSna2tHcE6Hou9GBtPLB67vjScwplB96-zqZKXJJEzU5HGF0oPDY_weAkXArzXyGLBPXFCnn_IWJDkGD4vqBQQAh2n52f48GD_cb-PSCT_8b-ESsKUI9NJa11XsdaUZxAc8TzrYnXwdcQbtl_kZGKhS6_rCtuNEBouA_lvM2CbS7TTtV2U4zVmJKpp-c6nt3yZePK3Av01GWn1pH_3sZbaPEx8DUjSbdp4i4iK-Mj4p2HPoph67DR7B9MFETYku_28SgP9xsKRRvFH4aHBHESWX4FDbwaU=
解密后的数据: {"code":0,"dictResult":{"ec":{"exam_type":["初中","高中","CET4","CET6","考研"],"word":{"usphone":"ˈæp(ə)l","ukphone":"ˈæp(ə)l","ukspeech":"apple&type=1","trs":[{"pos":"n.","tran":"苹果"}],"wfs":[{"wf":{"name":"复数","value":"apples"}}],"return-phrase":"apple","usspeech":"apple&type=2"}}},"translateResult":[[{"tgt":"苹果","src":"apple","tgtPronounce":"pín guǒ"}]],"type":"en2zh-CHS"}

That completes the walkthrough of this case. Hopefully it gives you a grasp of the basic approach, workflow, and techniques of reverse engineering for web scraping.

Day03: Reverse Engineering Douyin's X-Bogus (X-B)

Target URL:

# https://www.douyin.com/aweme/v1/web/aweme/post/
import requests

headers = {
    'authority': 'www.douyin.com',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'no-cache',
    'cookie': 'ttwid=1%7CvQ6QCiLyIG9SJypBIXRtIfGPJXv6br9a79NgmLfR-U4%7C1697436889%7C0ea69e384e5deb1dc65f4200190d8d2f33f9c4ca6c30e10208ee46af29a5015d; passport_csrf_token=86dff46d8a31bd2d89aba8859d6b9839; passport_csrf_token_default=86dff46d8a31bd2d89aba8859d6b9839; s_v_web_id=verify_lnsi3jm8_cqfdqTI4_fYoM_4wDc_8VNE_bb4xCKSzPapH; odin_tt=554cb19fe22b9da317001c2f9de95b4e1a7d360dfdbae35d1dfc361c0088b8f4af048efc12f6935fa5d5f6100e9cbbf7580b76969425528255343c9e13dc19c8b7c995d443abeafc2ab8a498b7cf9a4c; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.314%7D; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; SEARCH_RESULT_LIST_TYPE=%22single%22; download_guide=%223%2F20231107%2F0%22; pwa2=%220%7C0%7C3%7C0%22; douyin.com; device_web_cpu_core=10; device_web_memory_size=8; webcast_local_quality=null; csrf_session_id=15b2efd04735f33bf97e434c422e4381; __ac_nonce=0654ca6e9005f3a669ca5; __ac_signature=_02B4Z6wo00f011w8rhwAAIDBHWEjuua2TSdcHKqAALJOVl79BpvTTdq7UdfJ0ZJivSubfqR7DTSRhBMZXolqgQ91ptK9wlWted1ar-p7M0KTXx7UiiwDtyagOAPIbx.TSMA0.mEGXEAGx2qhae; VIDEO_FILTER_MEMO_SELECT=%7B%22expireTime%22%3A1700127086066%2C%22type%22%3A1%7D; strategyABtestKey=%221699522286.12%22; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCQnNtWm5qUEJ6SExwVlpzZjhzV1BoYWlOQzJZM0ZNNk9iTnNoOGRQMzFmRFVtOVdDLzhXWHJ4NVFDTXZvTWZLdFNuMVlKU2ZvclVETmZ6SEkrUkF5MVE9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; tt_scid=Bij66aryo2u0w.o16gYX3caPsmVWrfe-6Pk1ulTwQgNWudwhGRTGfTUIW.V7QT1Kc2a9; msToken=7OYWMqm4cfsfI0e0Ll0FUACzSrrtw96Ey0A5o6DUTNI5FksUP8JJXo88BzQ4aujOB5foj2ADof_1URtCZEzghUfOi6D1Rs4YKHbRS-5rCQYNC3wEs71j3DNXCiLGKg==; msToken=Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1496%2C%5C%22screen_height%5C%22%3A967%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A100%7D%22; IsDouyinActive=true; home_can_add_dy_2_desktop=%221%22',
    'pragma': 'no-cache',
    'referer': 'https://www.douyin.com/user/MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS',
    'sec-ch-ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
}

params = {
    'device_platform': 'webapp',
    'aid': '6383',
    'channel': 'channel_pc_web',
    'sec_user_id': 'MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS',
    'max_cursor': '0',
    'locate_query': 'false',
    'show_live_replay_strategy': '1',
    'need_time_list': '1',
    'time_list_query': '0',
    'whale_cut_token': '',
    'cut_version': '1',
    'count': '18',
    'publish_video_strategy_type': '2',
    'pc_client_type': '1',
    'version_code': '170400',
    'version_name': '17.4.0',
    'cookie_enabled': 'true',
    'screen_width': '1496',
    'screen_height': '967',
    'browser_language': 'zh-CN',
    'browser_platform': 'MacIntel',
    'browser_name': 'Chrome',
    'browser_version': '119.0.0.0',
    'browser_online': 'true',
    'engine_name': 'Blink',
    'engine_version': '119.0.0.0',
    'os_name': 'Mac OS',
    'os_version': '10.15.7',
    'cpu_core_num': '10',
    'device_memory': '8',
    'platform': 'PC',
    'downlink': '10',
    'effective_type': '4g',
    'round_trip_time': '100',
    'webid': '7290435875619390995',
    'msToken': 'Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx',
    'X-Bogus': 'DFSzswVYIOtANx--tFiuUgHB7tIG',
}

response = requests.get('https://www.douyin.com/aweme/v1/web/aweme/post/', params=params, headers=headers)
print(response.text)


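【1】Locating the Entry Point

XHR/fetch breakpoint: /v1/web/aweme/post

Conditional breakpoint:

_0xc26b5e.openArgs[1].search("/aweme/v1/web/aweme/post/")>=0

Logpoint:

_0x2458f0['apply'](_0xc26b5e, _0x1f1790);

Conditional breakpoint (pauses only when the call returns a 28-character string, the length of an X-Bogus value such as DFSzswVYIOtANx--tFiuUgHB7tIG):

_0x2458f0['apply'](_0xc26b5e, _0x1f1790).length == 28

【2】JS Reverse Engineering and Environment Patching

Copy the whole of webmssdk.es5.js into a local JS file.

Expose the signing function globally:

window.yuan = _0x5a8f25

Code structure:

window = global
document = {}

document.addEventListener = function addEventListener() {
}
navigator = {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
}

// Start of the copied source...
window.yuan = _0x5a8f25
// End of the copied source...

// Test
// var data = '123456'
// console.log(window.yuan(data, null))

Python code: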

import requests
import execjs
import urllib.parse

with open("douyin.js", encoding="utf-8") as f:
    js_code = f.read()

js_compile = execjs.compile(js_code)

headers = {
    'authority': 'www.douyin.com',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'zh-CN,zh;q=0.9',
    'cache-control': 'no-cache',
    'cookie': 'ttwid=1%7CvQ6QCiLyIG9SJypBIXRtIfGPJXv6br9a79NgmLfR-U4%7C1697436889%7C0ea69e384e5deb1dc65f4200190d8d2f33f9c4ca6c30e10208ee46af29a5015d; passport_csrf_token=86dff46d8a31bd2d89aba8859d6b9839; passport_csrf_token_default=86dff46d8a31bd2d89aba8859d6b9839; s_v_web_id=verify_lnsi3jm8_cqfdqTI4_fYoM_4wDc_8VNE_bb4xCKSzPapH; odin_tt=554cb19fe22b9da317001c2f9de95b4e1a7d360dfdbae35d1dfc361c0088b8f4af048efc12f6935fa5d5f6100e9cbbf7580b76969425528255343c9e13dc19c8b7c995d443abeafc2ab8a498b7cf9a4c; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Atrue%2C%22volume%22%3A0.314%7D; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; SEARCH_RESULT_LIST_TYPE=%22single%22; download_guide=%223%2F20231107%2F0%22; pwa2=%220%7C0%7C3%7C0%22; douyin.com; device_web_cpu_core=10; device_web_memory_size=8; webcast_local_quality=null; csrf_session_id=15b2efd04735f33bf97e434c422e4381; __ac_nonce=0654ca6e9005f3a669ca5; __ac_signature=_02B4Z6wo00f011w8rhwAAIDBHWEjuua2TSdcHKqAALJOVl79BpvTTdq7UdfJ0ZJivSubfqR7DTSRhBMZXolqgQ91ptK9wlWted1ar-p7M0KTXx7UiiwDtyagOAPIbx.TSMA0.mEGXEAGx2qhae; VIDEO_FILTER_MEMO_SELECT=%7B%22expireTime%22%3A1700127086066%2C%22type%22%3A1%7D; strategyABtestKey=%221699522286.12%22; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCQnNtWm5qUEJ6SExwVlpzZjhzV1BoYWlOQzJZM0ZNNk9iTnNoOGRQMzFmRFVtOVdDLzhXWHJ4NVFDTXZvTWZLdFNuMVlKU2ZvclVETmZ6SEkrUkF5MVE9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoxfQ%3D%3D; tt_scid=Bij66aryo2u0w.o16gYX3caPsmVWrfe-6Pk1ulTwQgNWudwhGRTGfTUIW.V7QT1Kc2a9; msToken=7OYWMqm4cfsfI0e0Ll0FUACzSrrtw96Ey0A5o6DUTNI5FksUP8JJXo88BzQ4aujOB5foj2ADof_1URtCZEzghUfOi6D1Rs4YKHbRS-5rCQYNC3wEs71j3DNXCiLGKg==; msToken=Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A1496%2C%5C%22screen_height%5C%22%3A967%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A100%7D%22; IsDouyinActive=true; home_can_add_dy_2_desktop=%221%22',
    'pragma': 'no-cache',
    'referer': 'https://www.douyin.com/user/MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS',
    'sec-ch-ua': '"Google Chrome";v="119", "Chromium";v="119", "Not?A_Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
}

params = {
    'device_platform': 'webapp',
    'aid': '6383',
    'channel': 'channel_pc_web',
    # 'sec_user_id': 'MS4wLjABAAAA0HwZJN6-JDCSTjxiMk-czhyZWxes8XIDEjppFXExauK8-kQTLMEH9ZdfIXxnl9tS',
    'sec_user_id': "MS4wLjABAAAAMbqnWxzUfZegt9vrNBDz7zyqwhvG6vXiKTDxVm2wUD0",
    'max_cursor': '0',
    'locate_query': 'false',
    'show_live_replay_strategy': '1',
    'need_time_list': '1',
    'time_list_query': '0',
    'whale_cut_token': '',
    'cut_version': '1',
    'count': '18',
    'publish_video_strategy_type': '2',
    'pc_client_type': '1',
    'version_code': '170400',
    'version_name': '17.4.0',
    'cookie_enabled': 'true',
    'screen_width': '1496',
    'screen_height': '967',
    'browser_language': 'zh-CN',
    'browser_platform': 'MacIntel',
    'browser_name': 'Chrome',
    'browser_version': '119.0.0.0',
    'browser_online': 'true',
    'engine_name': 'Blink',
    'engine_version': '119.0.0.0',
    'os_name': 'Mac OS',
    'os_version': '10.15.7',
    'cpu_core_num': '10',
    'device_memory': '8',
    'platform': 'PC',
    'downlink': '10',
    'effective_type': '4g',
    'round_trip_time': '100',
    'webid': '7290435875619390995',
    'msToken': 'Noy13zPl0zuSYJ9nJ8DHykX-W5olMoMC3OmO0sEuhxH3pojze2qYwurlsL9BtfsIgFDH6YbQDFF2v5ebPWrQyUUEa3d0xNqHTtzuBYj21dzd43p3pZi56bfy6xRx',
    # 'X-Bogus': 'DFSzswVYIOtANx--tFiuUgHB7tIG',
}

# X-Bogus is computed over the URL-encoded query string
params_str = urllib.parse.urlencode(params)

X_B = js_compile.call("window.yuan", params_str)

print("X_B:", X_B)

params["X-Bogus"] = X_B

response = requests.get('https://www.douyin.com/aweme/v1/web/aweme/post/', params=params, headers=headers)
print(response.text)

Source: https://blog.csdn.net/m0_51775295/article/details/135676925