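Hooked on short videos? Advanced Python web scraping, applied to short-video sites
Short videos are everywhere these days: people scroll through them over meals, on breaks, and lying in bed. Today's advanced Python scraping topic is decoding the obfuscated video addresses on Meipai.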
Reverse-engineering the short-video JS
Scraping target
Target URL:
Tools
Environment: Windows 10, Python 3.7
Development tools: PyCharm, Chrome
Packages: requests, lxml (for XPath queries), base64
Key learning points
How a scraper parses the data it collects
JS debugging techniques
Reverse-engineering the JS decoding code
Porting the JS code to Python
Project walkthrough
Open the site's home page
Pick a category you are interested in
From the home page, collect the href of each link that leads to a detail page
On the detail page, locate the encoded video playback address
This value sits in the static page HTML and is decoded by JS code at play time
First locate the video playback address, then find the JS file that decodes it; that file is triggered when you click play
The value looks roughly like Base64-encoded data
Search the JS file for keywords to find how the address is encoded and decoded
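The "locate the encoded value" step can be sketched as follows. This is a minimal illustration, not the article's own code: it uses the standard-library HTMLParser as a stand-in for lxml, and SAMPLE_HTML is a hypothetical page fragment. Only the detailVideo id, the data-video attribute name, and the sample encoded string come from the article.

```python
import re
from html.parser import HTMLParser

# Hypothetical detail-page fragment; only the id, the data-video attribute
# name, and the encoded sample string are taken from the article.
SAMPLE_HTML = """
<div id="detailVideo" data-video="e121Ly9tBrI84RdnZpZGVvMTAubWVpdHVkYXRhLmNvbS82MGJjZDcwNTE3NGZieXBueG5udnRwMTA5N19IMjY0XzFfNWY3YThmM2U0MTEwNy5tc2JVjAu3EDQ=">
  <img alt="sample title" src="cover.jpg">
</div>
"""


class DetailVideoParser(HTMLParser):
    """Collect the data-video attribute of <div id="detailVideo">."""

    def __init__(self):
        super().__init__()
        self.data_video = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div" and attr_map.get("id") == "detailVideo":
            self.data_video = attr_map.get("data-video")


def looks_like_base64(text):
    """Return True if text uses only the Base64 alphabet plus '=' padding."""
    return re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", text) is not None


parser = DetailVideoParser()
parser.feed(SAMPLE_HTML)
print(parser.data_video)
print(looks_like_base64(parser.data_video))
```

The alphabet check only suggests Base64; the value still carries extra junk characters, so it has to go through the site's decoding routine before it becomes a usable address.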
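Notes on the JS functions used
    # replace(): replaces part of a string with other characters; with a string pattern, only the first match is replaced
    # parseInt(): converts a string to the corresponding integer
    # base64.atob(): decodes a Base64-encoded string
    # substr(start, length): extracts the given number of characters starting at index start (substring(start, end) takes an end index instead)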
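As a quick reference, here are rough Python counterparts of those JS calls. The input values are made up for illustration, and the JS lines in the comments are generic examples rather than quotes from the site's script.

```python
import base64

text = "abcabc"

# JS: text.replace("b", "")   (a string pattern replaces the first match only)
first_only = text.replace("b", "", 1)

# JS: parseInt("121e", 16)
number = int("121e", 16)

# JS: atob("cDQ=")
decoded = base64.b64decode("cDQ=")

# JS: text.substr(1, 3)       (3 characters starting at index 1)
piece = text[1:1 + 3]

print(first_only, number, decoded, piece)
# prints: acabc 4638 b'p4' bca
```

The count argument matters when porting: Python's str.replace replaces every match by default, while JS replace with a string pattern stops after the first one.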
Converting the JS code to Python
import base64


def decode(data):
    """Decode an obfuscated video address into the raw URL bytes."""

    def getHex(a):
        # The first 4 characters, reversed, form a hex key; the rest is the payload.
        return {
            'str': a[4:],
            'hex': a[:4][::-1],
        }

    def getDec(a):
        # Convert the hex key to a decimal string and split its digits into two pairs.
        b = str(int(a, 16))
        return {
            'pre': list(b[:2]),
            'tail': list(b[2:]),
        }

    def substr(a, b):
        # Remove b[1] junk characters starting at index b[0].
        start, length = int(b[0]), int(b[1])
        c = a[:start]
        d = a[start:start + length]
        # count=1 mirrors JS replace(), which only replaces the first match.
        return c + a[start:].replace(d, "", 1)

    def getPos(a, b):
        # The tail offset counts from the end; convert it to an index from the start.
        b[0] = len(a) - int(b[0]) - int(b[1])
        return b

    b = getHex(data)
    c = getDec(b['hex'])
    d = substr(b['str'], c['pre'])
    return base64.b64decode(substr(d, getPos(d, c['tail'])))


print(decode("e121Ly9tBrI84RdnZpZGVvMTAubWVpdHVkYXRhLmNvbS82MGJjZDcwNTE3NGZieXBueG5udnRwMTA5N19IMjY0XzFfNWY3YThmM2U0MTEwNy5tc2JVjAu3EDQ="))
This produces the final video playback address
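To make the decoding steps easier to follow, the sketch below traces the same sample string step by step with plain slicing. It is an illustrative rewrite, not the site's code, and it assumes the key always expands to exactly four decimal digits, as it does for this sample. The variable names are invented for the trace.

```python
import base64

data = "e121Ly9tBrI84RdnZpZGVvMTAubWVpdHVkYXRhLmNvbS82MGJjZDcwNTE3NGZieXBueG5udnRwMTA5N19IMjY0XzFfNWY3YThmM2U0MTEwNy5tc2JVjAu3EDQ="

# Step 1: the first 4 characters, reversed, are a hex key ("e121" becomes "121e").
key_hex = data[:4][::-1]
payload = data[4:]

# Step 2: the key as a decimal string gives four digits (0x121e is 4638).
digits = str(int(key_hex, 16))
pre_pos, pre_len = int(digits[0]), int(digits[1])
tail_off, tail_len = int(digits[2]), int(digits[3])

# Step 3: drop pre_len junk characters starting at index pre_pos.
step1 = payload[:pre_pos] + payload[pre_pos + pre_len:]

# Step 4: the second junk run is positioned relative to the end of the string.
tail_pos = len(step1) - tail_off - tail_len
step2 = step1[:tail_pos] + step1[tail_pos + tail_len:]

# Step 5: what remains is ordinary Base64.
url = base64.b64decode(step2).decode("utf-8")

print(key_hex, digits)
print(url)
```

The decoded value is a protocol-relative address (it begins with //), which is why a scheme such as http: has to be prepended before requesting it.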

Simplified source code
import base64
from urllib.parse import urljoin

import requests
from lxml import etree


def decode_mp4(data):
    """Decode the obfuscated data-video value into the raw URL bytes."""

    def getHex(a):
        # The first 4 characters, reversed, form a hex key; the rest is the payload.
        return {
            'str': a[4:],
            'hex': a[:4][::-1],
        }

    def getDec(a):
        # Convert the hex key to a decimal string and split its digits into two pairs.
        b = str(int(a, 16))
        return {
            'pre': list(b[:2]),
            'tail': list(b[2:]),
        }

    def substr(a, b):
        # Remove b[1] junk characters starting at index b[0].
        start, length = int(b[0]), int(b[1])
        c = a[:start]
        d = a[start:start + length]
        # count=1 mirrors JS replace(), which only replaces the first match.
        return c + a[start:].replace(d, "", 1)

    def getPos(a, b):
        # The tail offset counts from the end; convert it to an index from the start.
        b[0] = len(a) - int(b[0]) - int(b[1])
        return b

    b = getHex(data)
    c = getDec(b['hex'])
    d = substr(b['str'], c['pre'])
    return base64.b64decode(substr(d, getPos(d, c['tail'])))


# Main entry point
def main():
    url = 'https://www.meipai.com'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    }
    response = requests.get(url=url, headers=headers)
    html_data = etree.HTML(response.text)
    href_list = html_data.xpath('//div/a/@href')
    # print(href_list)
    for href in href_list:
        # urljoin handles both relative and absolute hrefs
        res = requests.get(urljoin(url, href), headers=headers)
        html = etree.HTML(res.text)
        names = html.xpath('//div[@id="detailVideo"]/img/@alt')
        videos = html.xpath('//div[@id="detailVideo"]/@data-video')
        if not names or not videos:
            continue  # not a video detail page, skip it
        name, mp4_data = names[0], videos[0]
        # print(name, mp4_data)
        mp4_url = decode_mp4(mp4_data).decode('utf-8')
        print(mp4_url)
        # The decoded address is protocol-relative, so prepend a scheme
        result = requests.get("http:" + mp4_url)
        # The with block closes the file automatically
        with open(name + ".mp4", 'wb') as f:
            f.write(result.content)


if __name__ == '__main__':
    main()
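The script above builds request URLs and file names directly from scraped values. The helpers below are one possible way to make those steps more robust. They are a sketch under stated assumptions: the function names absolute_url, video_url, and safe_filename are hypothetical, and the character blacklist follows common Windows file-name restrictions rather than anything stated in the article.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.meipai.com"  # same base address the script uses


def absolute_url(href, base=BASE_URL):
    """Resolve a scraped href against the site root (absolute hrefs pass through)."""
    return urljoin(base, href)


def video_url(decoded, scheme="http"):
    """Prepend a scheme to a protocol-relative address such as //host/path."""
    if decoded.startswith("//"):
        return f"{scheme}:{decoded}"
    return decoded


def safe_filename(title, ext=".mp4", max_len=80):
    """Replace characters that are invalid in file names and cap the length."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    return (cleaned[:max_len] or "video") + ext


print(absolute_url("/media/123"))
print(video_url("//example.com/clip.mp4"))
print(safe_filename('my: "first" clip?'))
```

Video titles often contain slashes, colons, or question marks, which would make open(name + ".mp4", 'wb') fail on Windows, so cleaning the title first avoids losing a download to a bad file name.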
Feel free to discuss the techniques in the comments, and don't forget to like, save, and share. Wishing everyone all the best!
