Python Web Scraping: Collecting Douyin Videos

Published: January 19, 2024

Hi everyone! This is Qianqian, who loves watching pretty girls.

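Key points:

  • Capturing dynamically loaded data (packet capture)

  • Sending requests with requests

Development environment:

  • python 3.8, to run the code

  • pycharm 2022.3, as the editor

  • requests: pip install requests

  • PyExecJS: pip install PyExecJS (imported as execjs; it needs a JavaScript runtime such as Node.js)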

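How to install third-party Python modules:

  1. Press win + R, type cmd and click OK, then type the install command pip install <module name> (for example, pip install requests) and press Enter

  2. In pycharm, click Terminal and type the install command there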
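
Once the packages are installed, a quick check from Python can confirm that the interpreter you are running actually sees them. This is only a minimal sketch; the helper name `missing_modules` is hypothetical and not part of the original tutorial.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "requests" and "execjs" are the two third-party imports this tutorial uses.
    missing = missing_modules(["requests", "execjs"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a module is reported missing even though pip succeeded, pip most likely installed it into a different Python installation than the one pycharm is using.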




Code walkthrough

Import modules

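import requests
import execjs  # provided by the PyExecJS package

# Load the JavaScript file that computes the X-Bogus signature (function getXb)
with open('xb.js', mode='r', encoding='utf-8') as js_file:
    ctx = execjs.compile(js_file.read())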

Request headers (mimicking a browser)

headers = {
    'Referer': 'https://www.douyin.com/user/MS4wLjABAAAAzQERuE7CoS-4bipZA1SxCHPOuQ4_FpJTX6qDlUAH7NfqdASG_BFfry9kjlJlQCUV',
    # Paste the Cookie header from your own logged-in browser session (DevTools > Network); session cookies expire
    'Cookie': '<your Cookie header>',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
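
A raw Cookie header is hard to read and edit as one long string. As an optional alternative, the sketch below splits such a string into a dict, which requests accepts through its `cookies=` argument. The helper name `parse_cookie_header` is hypothetical, and the `msToken` value in the sample is a placeholder, not a real token.

```python
def parse_cookie_header(raw):
    """Split a raw 'name=value; name2=value2' Cookie header into a dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        # Split only on the first '=' because values may contain '=' themselves.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Short sample string; "abc=" is a placeholder value.
raw = "LOGIN_STATUS=1; dy_swidth=1920; dy_sheight=1080; msToken=abc="
cookies = parse_cookie_header(raw)
print(cookies["msToken"])
```

The resulting dict could then be passed as `requests.get(url, headers=headers, cookies=cookies)` with the 'Cookie' entry removed from `headers`. Note that a dict keeps only the last value when a cookie name appears more than once.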

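How to analyze an encrypted parameter

  1. Determine whether the parameter is actually encrypted

    Is it a value the server returned in an earlier response?

    If it is not, does the server validate the parameter?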
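
One practical way to spot candidates is to capture the same request twice and compare the query strings: parameters whose values change between otherwise identical requests are the ones worth investigating. The sketch below illustrates that comparison under the assumption that you have copied two query strings from the browser's network panel; `changed_params` is a hypothetical helper, and the `X-Bogus` values are placeholders.

```python
from urllib.parse import parse_qsl


def changed_params(query_a, query_b):
    """Return the names of parameters whose values differ between two query strings."""
    a = dict(parse_qsl(query_a, keep_blank_values=True))
    b = dict(parse_qsl(query_b, keep_blank_values=True))
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Two captures of the same request; the X-Bogus values are placeholders.
first = "aid=6383&count=18&max_cursor=0&X-Bogus=AAAA"
second = "aid=6383&count=18&max_cursor=0&X-Bogus=BBBB"
print(changed_params(first, second))
```

A parameter that changes and is not found in any earlier response is probably generated by JavaScript in the page. Whether the server validates it can then be tested by resending the request without it.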

arg = 'device_platform=webapp&aid=6383&channel=channel_pc_web&sec_user_id=MS4wLjABAAAAzQERuE7CoS-4bipZA1SxCHPOuQ4_FpJTX6qDlUAH7NfqdASG_BFfry9kjlJlQCUV&max_cursor=0&locate_query=false&show_live_replay_strategy=1&need_time_list=1&time_list_query=0&whale_cut_token=&cut_version=1&count=18&publish_video_strategy_type=2&pc_client_type=1&version_code=170400&version_name=17.4.0&cookie_enabled=true&screen_width=1920&screen_height=1080&browser_language=zh-CN&browser_platform=Win32&browser_name=Chrome&browser_version=120.0.0.0&browser_online=true&engine_name=Blink&engine_version=120.0.0.0&os_name=Windows&os_version=10&cpu_core_num=6&device_memory=8&platform=PC&downlink=10&effective_type=4g&round_trip_time=50&webid=7305022292394837567&msToken=aE3KWXoDSk27YVEFDUi1Wo6Nw50F4qNttkJQ0fZF2tjzvlIeUHhuGDpnpj5ooNITWZVGuS5Y-TLFSLZgw0zmq1lc_MrmIdcuFpGPTHT6Uotg8GhnSPGyoY9cNqVa'
result = ctx.call('getXb', arg)
print(result)
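
The hand-written `arg` string is easy to break when a value such as `max_cursor` has to change. As a sketch, the query string could instead be built from a dict with the standard library. The function name `build_query` is hypothetical, and only a few of the parameters from `arg` are shown; a real request would carry the full set.

```python
from urllib.parse import urlencode


def build_query(sec_user_id, max_cursor=0, count=18):
    """Build a query string from a dict; insertion order is preserved."""
    params = {
        "device_platform": "webapp",
        "aid": "6383",
        "channel": "channel_pc_web",
        "sec_user_id": sec_user_id,
        "max_cursor": max_cursor,
        "count": count,
        # ... the remaining parameters from arg would follow here
    }
    return urlencode(params)


query = build_query("USER", max_cursor=0)
print(query)
```

Because `getXb` is called on the query string, the exact same string should be used both for the signature and for the request URL; rebuilding it separately risks a mismatch.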

Request URL

url = f'https://www.douyin.com/aweme/v1/web/aweme/post/?{arg}&X-Bogus={result}'
  1. Send the request
response = requests.get(url=url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
  2. Get the data
json_data = response.json()
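
`response.json()` raises an exception when the body is not valid JSON. That can happen if the server rejects the request and returns an empty or non-JSON body, for instance when the cookie or signature is no longer accepted (an assumption about how failures show up, not something the original article states). A small defensive wrapper, with the hypothetical name `parse_json_body`, might look like this:

```python
import json


def parse_json_body(text):
    """Return the decoded JSON value, or None if the body is empty or not valid JSON."""
    if not text or not text.strip():
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(parse_json_body('{"aweme_list": [], "max_cursor": 0}'))
print(parse_json_body(""))
```

In the script it would be used as `json_data = parse_json_body(response.text)`, followed by a check for `None` before the parsing step.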
  3. Parse the data
aweme_list = json_data['aweme_list']
max_cursor = json_data['max_cursor']
print(max_cursor)
for aweme in aweme_list:
    caption = aweme['caption']
    aweme_id = aweme['aweme_id']
    video_url = aweme['video']['play_addr']['url_list'][0]
    print(aweme_id, caption, video_url)
  4. Save the video (uncomment the lines below to download; the video/ directory must already exist)
    # video_data = requests.get(video_url).content
    # with open(f'video/{aweme_id}.mp4', mode='wb') as f:
    #     f.write(video_data)
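
The code above reads `max_cursor` but fetches only the first page. A possible way to collect every page is to feed each `max_cursor` back into the next request until the server reports nothing more. The sketch below shows only that control flow: `iter_all_items` and `fetch_page` are hypothetical names, the `has_more` field is an assumption that does not appear in the original code, and a plain dict stands in for the real request.

```python
def iter_all_items(fetch_page, start_cursor=0, max_pages=50):
    """Yield items page by page, following max_cursor until no further page is reported.

    fetch_page(cursor) must return a dict shaped like the response above:
    {'aweme_list': [...], 'max_cursor': int, 'has_more': 0 or 1}.
    'has_more' is an assumed field, not taken from the original code.
    """
    cursor = start_cursor
    for _ in range(max_pages):  # hard cap so a bad response cannot loop forever
        page = fetch_page(cursor)
        items = page.get("aweme_list") or []
        yield from items
        next_cursor = page.get("max_cursor", cursor)
        if not page.get("has_more") or not items or next_cursor == cursor:
            break
        cursor = next_cursor


# Stand-in for the real request, used only to show the control flow.
fake_pages = {
    0: {"aweme_list": [{"aweme_id": "1"}, {"aweme_id": "2"}], "max_cursor": 100, "has_more": 1},
    100: {"aweme_list": [{"aweme_id": "3"}], "max_cursor": 200, "has_more": 0},
}
ids = [item["aweme_id"] for item in iter_all_items(fake_pages.get)]
print(ids)
```

In real use, `fetch_page` would rebuild `arg` with the new `max_cursor`, recompute the signature with `ctx.call('getXb', arg)`, and send the request, since the signature is derived from the query string.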

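Closing

Thanks for reading! This flight ends here 🛬

I hope this article was helpful 🎉 and that you learned something from it.

Even the stars that hide 🍥 are working hard to shine, so keep going too (let's work hard together).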


Source: https://blog.csdn.net/m0_72282564/article/details/135697911