Good News for Honor of Kings Players: Download Hero Skin Wallpapers with Python
Published by: Python數據之道 (ID: PyDataLab)
Author: 葉庭云 (reader submission)
Editor: Lemon
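I. Introduction
Most of you have probably played, or at least heard of, the mobile game Honor of Kings (王者榮耀). Its heroes come with a wide variety of beautifully made skins, and many of them make great desktop wallpapers. This article shows how to use a Python crawler to download the hero skin wallpapers in one go.
1. Goal
Create a folder containing one subfolder per hero, named after the hero, that holds all of that hero's skin images.
URL: https://pvp.qq.com/web201605/herolist.shtml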
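As a minimal sketch of that target layout, assuming the hero names are already known, the per-hero subfolders can be created with os.makedirs. It also creates the parent folder and, with exist_ok=True, does not fail when the script is run a second time. The function name make_hero_dirs is a hypothetical helper, not part of the original script.

```python
import os
from typing import Iterable, List


def make_hero_dirs(root: str, hero_names: Iterable[str]) -> List[str]:
    """Create root/<hero name> for every hero and return the folder paths."""
    paths = []
    for name in hero_names:
        path = os.path.join(root, name)
        # makedirs also creates `root`; exist_ok=True makes reruns safe
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```

For example, make_hero_dirs("./王者榮耀皮膚", names) would produce one subfolder per entry in names under the top-level folder used later in the article.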
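2. Environment
Runtime environment: PyCharm, Python 3.7
Required libraries (requests, lxml and fake_useragent are third-party packages installed with pip):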
import requests
import os
import json
from lxml import etree
from fake_useragent import UserAgent
import logging
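Before running the crawler it can help to check which of the third-party packages are missing. A small sketch using importlib.util.find_spec, which returns None for a top-level module that cannot be found; the helper name missing_packages is illustrative. Note that the import names differ slightly from the pip names (fake_useragent is installed as fake-useragent).

```python
import importlib.util
from typing import List, Sequence

# Import names of the third-party libraries used in this article
REQUIRED = ("requests", "lxml", "fake_useragent")


def missing_packages(names: Sequence[str] = REQUIRED) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_packages() returns an empty list when everything is installed; anything it returns still needs a pip install.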
II. Analyzing the Web Page
First, open the official Honor of Kings website and click 英雄資料 (Hero Info).

On the new page, pick any hero and inspect the page with the browser's developer tools.

Inspect a few more heroes and a pattern emerges in the URLs of the hero pages:
https://pvp.qq.com/web201605/herodetail/152.shtml
https://pvp.qq.com/web201605/herodetail/150.shtml
https://pvp.qq.com/web201605/herodetail/167.shtml
Only the trailing number changes, so it can be treated as the hero's page identifier.
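Because only the trailing number changes, a hero's page URL can be built from that identifier alone. A one-line sketch; the constant and function names are illustrative:

```python
# URL pattern observed above; the trailing number is the hero identifier
HERO_DETAIL_URL = "https://pvp.qq.com/web201605/herodetail/{}.shtml"


def hero_detail_url(hero_id) -> str:
    """Build the detail-page URL for one hero identifier (int or str)."""
    return HERO_DETAIL_URL.format(hero_id)
```

For instance, hero_detail_url(152) gives back the first of the three URLs listed above.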
Open the Network tab and press Ctrl + R to reload the page; among the requests you will find a file named herolist.json.

Its preview looks garbled, but that is not a problem: double-click the JSON file to download it, then open it in an editor to see its contents.
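If the garbled preview is caused by a charset mismatch (an assumption; the article does not say why it looks garbled), the robust approach in Python is to pass the raw response bytes to json.loads, which detects UTF-8/UTF-16/UTF-32 on its own, instead of relying on a guessed text encoding. The sketch uses a hypothetical one-entry sample and also reproduces the garbled look by decoding the same bytes as Latin-1:

```python
import json

# Hypothetical one-entry sample in the shape of herolist.json
SAMPLE_BYTES = '[{"ename": 105, "cname": "廉颇"}]'.encode("utf-8")


def parse_herolist(raw: bytes):
    """Parse raw response bytes; json.loads detects the UTF encoding itself."""
    return json.loads(raw)


# Decoding UTF-8 bytes with the wrong charset yields mojibake like a bad preview
garbled_text = SAMPLE_BYTES.decode("latin-1")
```

With requests, the equivalent is json.loads(response.content) rather than json.loads(response.text).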

ename is the identifier used in the hero's page URL, cname is the hero's name, and skin_name holds the names of that hero's skins.
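A short sketch of pulling just those three fields out of the JSON. The sample values are placeholders, not real data, and using .get for skin_name is a defensive assumption in case some entry lacks it; hero_index is a hypothetical helper name.

```python
import json
from typing import Dict, List

# Placeholder sample in the shape described above (values are made up)
SAMPLE = ('[{"ename": 152, "cname": "hero_a", "skin_name": "skin_a"},'
          ' {"ename": 150, "cname": "hero_b"}]')


def hero_index(raw: str) -> List[Dict[str, object]]:
    """Keep only the fields the crawler needs from each herolist entry."""
    heroes = []
    for entry in json.loads(raw):
        heroes.append({
            "ename": entry["ename"],                  # page/image identifier
            "cname": entry["cname"],                  # hero name
            "skin_name": entry.get("skin_name", ""),  # empty if the key is missing (assumption)
        })
    return heroes
```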
Open any hero's page and inspect each of its skins, watching how the image URLs change.

The URLs change as follows:
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-1.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-2.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-3.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-4.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-5.jpg
Copy an image link into the browser to see the full-resolution image.

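For a given hero, the -{x}.jpg at the end of the skin image URL starts at 1 and increases by one for each skin. Next, compare skin image URLs across heroes: they differ only in the hero identifier ename, so the image returned is determined by the ename parameter.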
https://game.gtimg.cn/images/yxzj/img201606/heroimg/152/152-bigskin-1.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/150/150-bigskin-1.jpg
https://game.gtimg.cn/images/yxzj/img201606/heroimg/153/153-bigskin-1.jpg
# The image request URL can therefore be built as:
https://game.gtimg.cn/images/yxzj/img201606/heroimg/{ename}/{ename}-bigskin-{x}.jpg
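That template translates directly into a list of URLs for one hero. In this sketch the number of skins, count, is an assumed input (in practice it would come from the length of the hero's skin list), and skin_urls is an illustrative name:

```python
from typing import List

# Template from above; {ename} appears twice and str.format fills both
SKIN_URL = ("https://game.gtimg.cn/images/yxzj/img201606/heroimg/"
            "{ename}/{ename}-bigskin-{x}.jpg")


def skin_urls(ename, count: int) -> List[str]:
    """Return the bigskin URLs for skins 1..count of one hero."""
    return [SKIN_URL.format(ename=ename, x=x) for x in range(1, count + 1)]
```

skin_urls(152, 5) reproduces the five URLs listed earlier for hero 152.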
III. Crawler Implementation
# -*- coding: UTF-8 -*-
"""
@File    : 王者榮耀英雄皮膚壁紙.py
@Author  : 葉庭云
@Date    : 2020/10/2 11:40
@CSDN    : https://blog.csdn.net/fyfugoyfa
"""
import requests
import os
import json
from lxml import etree
from fake_useragent import UserAgent
import logging

# Basic logging configuration
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s: %(message)s')

SAVE_DIR = "./王者榮耀皮膚"


class GloryOfKing:
    def __init__(self):
        # Create the top-level folder; no error if it already exists
        os.makedirs(SAVE_DIR, exist_ok=True)
        # Pick a random User-Agent with fake_useragent to reduce the chance of being blocked.
        # verify_ssl/path are fake_useragent 0.1.x parameters; path points to a local copy
        # of the browser data (see the error fix at the end of the article)
        ua = UserAgent(verify_ssl=False, path='fake_useragent.json')
        self.headers = {'User-Agent': ua.random}

    def scrape_skin(self):
        # Send the request and get the response
        response = requests.get('https://pvp.qq.com/web201605/js/herolist.json',
                                headers=self.headers, timeout=10)
        # Parse the raw bytes as JSON instead of relying on a guessed text encoding
        data = json.loads(response.content)
        # Loop over the heroes, extract the needed fields and create a folder per hero
        for hero in data:
            hero_number = hero['ename']  # hero identifier (number)
            hero_name = hero['cname']    # hero name
            hero_dir = os.path.join(SAVE_DIR, hero_name)
            os.makedirs(hero_dir, exist_ok=True)  # folder named after the hero
            response_src = requests.get(
                "https://pvp.qq.com/web201605/herodetail/{}.shtml".format(hero_number),
                headers=self.headers, timeout=10)
            hero_content = response_src.content.decode('gbk')  # the detail page is GBK-encoded
            # Parse the HTML and extract the hero's skin names with XPath
            hero_data = etree.HTML(hero_content)
            hero_img = hero_data.xpath('//div[@class="pic-pf"]/ul/@data-imgname')
            if not hero_img:
                logging.warning(f"No skin list found for {hero_name}, skipped")
                continue
            # Skin names are separated by "|"
            hero_src = hero_img[0].split('|')
            logging.info(hero_src)
            # Skin number j (starting at 1) maps to ...-bigskin-j.jpg
            for j, raw_name in enumerate(hero_src, start=1):
                # Keep only the part before "&"; split() keeps the whole name if there is no "&"
                skin_name = raw_name.split('&')[0]
                # Request the image
                response_skin = requests.get(
                    "https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/{}/{}-bigskin-{}.jpg".format(
                        hero_number, hero_number, j),
                    headers=self.headers, timeout=10)
                if response_skin.status_code != 200:
                    logging.warning(f"{skin_name}: HTTP {response_skin.status_code}, skipped")
                    continue
                # Binary image data
                skin_img = response_skin.content
                # Save the skin image into the hero's folder
                with open(os.path.join(hero_dir, "{}.jpg".format(skin_name)), "wb") as f:
                    f.write(skin_img)
                logging.info(f"{skin_name}.jpg downloaded successfully!")

    def run(self):
        self.scrape_skin()


if __name__ == '__main__':
    spider = GloryOfKing()
    spider.run()
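The skin-name handling inside the loop can be pulled out into a small, testable function. This sketch assumes the data-imgname value looks like name&suffix|name&suffix (the sample values in the test are hypothetical); the 1-based index it returns is the number that goes into the -bigskin-{x}.jpg part of the URL. parse_skin_names is an illustrative name.

```python
from typing import List, Tuple


def parse_skin_names(data_imgname: str) -> List[Tuple[int, str]]:
    """Split a data-imgname value into (1-based index, skin name) pairs."""
    pairs = []
    for index, part in enumerate(data_imgname.split("|"), start=1):
        # Keep only the text before "&"; if there is no "&" the name stays intact
        pairs.append((index, part.split("&")[0]))
    return pairs
```

Using split('&')[0] rather than slicing at find('&') matters: find returns -1 when there is no '&', and s[:-1] would silently drop the last character of the name.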
Running it produces output like this:

After the program runs for a while, all the hero skin wallpapers are saved in the local folder. The result:
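A quick way to confirm the saved files are real images, and not, say, an HTML error page written under a .jpg name, is to check the JPEG signature: JPEG files start with the bytes FF D8 FF. A sketch that walks the download folder and lists suspicious files; the function names are illustrative.

```python
import os
from typing import List

JPEG_SIGNATURE = b"\xff\xd8\xff"


def is_jpeg(data: bytes) -> bool:
    """True if the bytes start with the JPEG signature."""
    return data.startswith(JPEG_SIGNATURE)


def non_jpeg_files(root: str) -> List[str]:
    """Return (sorted) paths under root whose first bytes are not a JPEG signature."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if not is_jpeg(f.read(3)):
                    bad.append(path)
    return sorted(bad)
```

Running non_jpeg_files("./王者榮耀皮膚") after a crawl should return an empty list if every download succeeded.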

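IV. Other Notes
Please don't scrape too much data, as it puts load on the server; a light touch is enough.
This crawler shows how to parse JSON data, extract the fields you need, and build request URLs by string concatenation.
This article used a Python crawler to download the Honor of Kings hero skin wallpapers in one go. You will likely run into a few problems while implementing it; thinking them through and debugging until they are solved will also deepen your understanding.
The code can be copied and run directly. If you found it useful, please give it a like, which is the best encouragement for the author, and feel free to point out any shortcomings in the comments.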
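One simple way to keep the load light is to space out requests. The class below is a minimal sketch; its name and the interval are illustrative choices, not part of the original script. Calling wait() before each requests.get ensures at least min_interval seconds pass between consecutive requests.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive wait() calls."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

For example, create throttle = Throttle(0.5) once and call throttle.wait() right before every image request; the first call returns immediately.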
Fixing the error: fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
# The error looks like this:
Error occurred during loading data. Trying to use cache server https://fake-useragent.herokuapp.com/browsers/0.1.11
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/python3/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/python3/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/python3/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/python3/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/local/python3/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/usr/local/python3/lib/python3.6/http/client.py", line 1392, in connect
    super().connect()
  File "/usr/local/python3/lib/python3.6/http/client.py", line 936, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/local/python3/lib/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/local/python3/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/fake_useragent/utils.py", line 67, in get
    context=context,
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/local/python3/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/fake_useragent/utils.py", line 154, in load
    for item in get_browsers(verify_ssl=verify_ssl):
  File "/usr/local/python3/lib/python3.6/site-packages/fake_useragent/utils.py", line 97, in get_browsers
    html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
  File "/usr/local/python3/lib/python3.6/site-packages/fake_useragent/utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
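Solution:
# Copy the contents of https://fake-useragent.herokuapp.com/browsers/0.1.11 and save them as a local JSON file named fake_useragent.json
# Then point UserAgent at that local file (verify_ssl/path are fake_useragent 0.1.x parameters)
from fake_useragent import UserAgent

ua = UserAgent(verify_ssl=False, path='fake_useragent.json')
print(ua.random)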
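If loading the User-Agent data still fails, the crawler does not have to stop. The sketch below is an alternative to the fix above, not the author's method: it tries a caller-supplied factory and falls back to a small hand-kept list on any exception (FakeUserAgentError is an ordinary Exception subclass). The function name and the second fallback string are illustrative assumptions; the first string is the one printed in this article.

```python
import random
from typing import Callable, Optional, Sequence

# Illustrative fallback list; the first entry is the UA printed in this article
FALLBACK_USER_AGENTS = (
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
)


def pick_user_agent(make_ua: Optional[Callable[[], str]] = None,
                    fallback: Sequence[str] = FALLBACK_USER_AGENTS) -> str:
    """Return make_ua() if it works, otherwise a random entry from fallback."""
    if make_ua is not None:
        try:
            return make_ua()
        except Exception:  # e.g. FakeUserAgentError when the data cannot be loaded
            pass
    return random.choice(fallback)
```

Usage with the library would look like pick_user_agent(lambda: UserAgent(verify_ssl=False, path='fake_useragent.json').random); passing the factory in, rather than importing fake_useragent inside the function, keeps the fallback logic testable without network access.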
Output:
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36
