
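Feihualing (飞花令, literally "flying-flower order") is a drinking game (行酒令) that people often played in ancient China. It is one of the classical Chinese drinking games and belongs to the refined, literary category (雅令). The term 飞花 comes from the line 春城无处不飞花 ("In the spring city, there is nowhere that blossoms do not fly") in the poem 《寒食》 by the Tang-dynasty poet Han Hong (韩翃). Players may quote shi and ci poems, or qu verse, but the chosen line is generally no longer than seven characters.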

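The TV show 《中国诗词大会》 (Chinese Poetry Conference) adapted the game: instead of using only the character 花 (flower), it added other characters that appear frequently in poetry, such as 云 (cloud), 春 (spring), 月 (moon) and 夜 (night). Contestants take turns reciting lines that contain the keyword until a winner emerges.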

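Today we will use Python to build a small Feihualing helper: given a keyword (a single character or a word), it returns many lines of poetry containing it, so you will never lose to your friends again!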

Web Page Analysis

To do this with a web scraper, we first need a suitable website. Here we use 古诗文网 (gushiwen): https://www.gushiwen.cn/

Type a keyword such as 酒 (wine) into the search box at the top right, and the site returns the matching results.

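Note that each result is an entire shi or ci poem, and the line containing the keyword is only one of its sentences. When scraping, we will therefore need to filter out the other sentences.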

Scrolling through the results, you will find that only the first 2 pages are available; on page 3 the site shows a notice instead of results.

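In other words, getting the complete results requires downloading the site's app. To keep things simple, this article scrapes only the first 2 pages; scraping the app may be covered in a future post. While paging, watch how the URL changes: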

Page 1: https://so.gushiwen.cn/search.aspx?value=酒
Page 2: https://so.gushiwen.cn/search.aspx?type=title&page=2&value=酒

Testing shows that type=title can be removed, and page=2 is clearly the page number. Does page=1 then return the first page?

It does. So there is no need for a POST request with requests; a plain GET on the following URL reaches any page: https://so.gushiwen.cn/search.aspx?page=<page number>&value=<keyword>
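The URL template above can be assembled in code. Below is a minimal stdlib-only sketch, assuming (as tested above) that `page` and `value` are the only query parameters the search page needs; `build_search_url` and `SEARCH_URL` are hypothetical names. `urlencode` percent-encodes the Chinese keyword as UTF-8, the same kind of encoding requests applies when you hand it a dict via `params=`.

```python
from urllib.parse import urlencode

# Base search endpoint taken from the URLs observed above
SEARCH_URL = "https://so.gushiwen.cn/search.aspx"


def build_search_url(page: int, keyword: str) -> str:
    """Return the GET URL for one page of search results (hypothetical helper)."""
    # urlencode converts the int page to a string and percent-encodes
    # non-ASCII characters in the keyword as UTF-8
    query = urlencode({"page": page, "value": keyword})
    return f"{SEARCH_URL}?{query}"


if __name__ == "__main__":
    for page in range(1, 3):  # only the first 2 pages are reachable without the app
        print(build_search_url(page, "酒"))
```

Running it prints the page-1 and page-2 URLs, with 酒 encoded as `%E9%85%92`.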

With the analysis done, we can start writing code.

Code Implementation

First, import the libraries and set the request headers:

import requests
from lxml import html

# Send a desktop-browser User-Agent so the site serves its normal page
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

Using the keyword 酒 as an example, let's try to fetch the full content of page 1:

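import requests
from lxml import html

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
# Fetch page 1 of the search results for 酒; fail loudly on HTTP errors instead of parsing an error page
response = requests.get('https://so.gushiwen.cn/search.aspx?page=1&value=酒', headers=headers, timeout=10)
response.raise_for_status()
html_data = response.text
print(html_data)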

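The returned text contains the content we need, so the constructed request works. Next we parse the text to extract the details. This article uses XPath, starting with each poem's title: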

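selector = html.fromstring(html_data)
# Each search result is a div with class "sons"; this absolute path depends on the page layout
poets = selector.xpath("/html/body/div[2]/div[1]/div[@class='sons']")
for poet in poets:
    # The title is the bold text inside the first paragraph of the result
    title = ''.join(poet.xpath("div[1]/p[1]/a/b//text()")).strip()
    print(title)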
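To see what these relative paths select without hitting the network, here is a stdlib-only sketch using `xml.etree.ElementTree`, which supports a subset of XPath (child paths, `[1]`, `[@class='...']`) but not `//text()`, so `itertext()` stands in for it. The markup in `SAMPLE` is hypothetical: a simplified imitation of one search result, assumed to mirror the `div[1]/p[1]/a/b`, `p[2]` and `contson` paths used in this article. Real pages are not well-formed XML, which is one reason the article parses them with `lxml.html`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup imitating a single search result
SAMPLE = """
<div class="main">
  <div class="sons">
    <div class="cont">
      <p><a><b>将进酒</b></a></p>
      <p><a>李白</a> <a>〔唐代〕</a></p>
      <div class="contson">人生得意须尽欢，莫使金樽空对月。</div>
    </div>
  </div>
</div>
"""


def parse_results(markup: str) -> list:
    """Extract title, source and body from each 'sons' block in the markup."""
    root = ET.fromstring(markup.strip())
    results = []
    for poem in root.findall(".//div[@class='sons']"):
        cont = poem.find("div[1]")  # first child div holds the poem
        title = "".join(cont.find("p[1]/a/b").itertext()).strip()
        # Strip each text node before joining so poet and dynasty end up on one line
        source = "".join(t.strip() for t in cont.find("p[2]").itertext())
        body = "".join(cont.find("div[@class='contson']").itertext()).strip()
        results.append({"title": title, "source": source, "body": body})
    return results


if __name__ == "__main__":
    for result in parse_results(SAMPLE):
        print(result)
```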

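Extracting the source line (poet and dynasty) the same way leaves the poet and the dynasty on two separate lines, because the text nodes contain newlines and spaces. Stripping each text node with .strip() in a list comprehension before joining removes them: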

for poet in poets:
    title = ''.join(poet.xpath("div[1]/p[1]/a/b//text()")).strip()
    # Strip the whitespace around every text node, then join them into one line
    source = ''.join([t.strip() for t in poet.xpath('div[1]/p[2]//text()')])
    print(title, source)

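Finally, we parse the poem text. To find the sentence that actually contains the keyword, we split the whole poem into complete sentences at full stops (。) and question marks (？), treating line breaks as full stops too: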

for poet in poets:
    title = ''.join(poet.xpath("div[1]/p[1]/a/b//text()")).strip()
    source = ''.join([t.strip() for t in poet.xpath('div[1]/p[2]//text()')])
    # Turn line breaks and question marks into full stops, then split into sentences
    contents = ''.join(poet.xpath('div[1]/div[@class="contson"]//text()')).strip().replace('\n', '。').replace('？', '。').split('。')
    print(title, source, contents)
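The chained `replace(...).split('。')` can also be written as a single regular-expression split. This is a sketch rather than the author's code: it uses the same separators and also drops the empty strings the chained version leaves behind (for example between a `。` and a following line break). `split_sentences` is a hypothetical name, and the sample text is Wang Han's quatrain 《凉州词》.

```python
import re

# The same separators as above: line breaks, full stops and question marks
SEPARATORS = re.compile(r"[\n。？]")


def split_sentences(text: str) -> list:
    """Split a poem into sentences, dropping empty pieces and stray whitespace."""
    return [s.strip() for s in SEPARATORS.split(text) if s.strip()]


if __name__ == "__main__":
    poem = "葡萄美酒夜光杯，欲饮琵琶马上催。\n醉卧沙场君莫笑，古来征战几人回？"
    print(split_sentences(poem))
    # → ['葡萄美酒夜光杯，欲饮琵琶马上催', '醉卧沙场君莫笑，古来征战几人回']
```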

Then check each sentence of each poem for the keyword:

for poet in poets:
    title = ''.join(poet.xpath("div[1]/p[1]/a/b//text()")).strip()
    source = ''.join([t.strip() for t in poet.xpath('div[1]/p[2]//text()')])
    contents = ''.join(poet.xpath('div[1]/div[@class="contson"]//text()')).strip().replace('\n', '。').replace('？', '。').split('。')
    content_lst = []
    for sentence in contents:
        if '酒' in sentence:
            # A poem may have several sentences containing the keyword; keep all of them
            content_lst.append(sentence.strip() + '。')
    if not content_lst:  # the keyword may appear only in the title; skip such poems
        continue
    for line in dict.fromkeys(content_lst):  # two matching sentences may be identical; deduplicate, keeping their order
        print(line, title, source)
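The filter-and-deduplicate step can be isolated into a small pure function that is easy to test offline. A sketch with a hypothetical helper name, `matching_lines`; it uses `dict.fromkeys`, which removes duplicates while keeping sentences in the order they appear in the poem (a `set`, as in `list(set(...))`, gives no ordering guarantee). An empty result corresponds to the "keyword only in the title" case that the loop skips. The sample sentences come from Wang Wei's 《送元二使安西》.

```python
def matching_lines(sentences, keyword):
    """Return sentences containing keyword, each ending in '。', deduplicated in original order."""
    hits = (s.strip() + "。" for s in sentences if keyword in s)
    # dict keys preserve insertion order (Python 3.7+), so repeats are dropped
    # without reordering the remaining lines
    return list(dict.fromkeys(hits))


if __name__ == "__main__":
    sentences = ["劝君更尽一杯酒", "西出阳关无故人", "劝君更尽一杯酒"]
    print(matching_lines(sentences, "酒"))  # → ['劝君更尽一杯酒。']
    print(matching_lines(sentences, "花"))  # → []
```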

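Most of the requirements are now met. All that remains is a loop that assembles the URL for each page so that multiple pages are covered, plus reading the keyword interactively with input(). The complete code: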

import requests
from lxml import html

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}

def poet_content(keyword, num, url):
    """Print every line containing keyword on one results page; return the next result number."""
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    selector = html.fromstring(response.text)
    poets = selector.xpath("/html/body/div[2]/div[1]/div[@class='sons']")
    for poet in poets:
        title = ''.join(poet.xpath("div[1]/p[1]/a/b//text()")).strip()
        source = ''.join([t.strip() for t in poet.xpath('div[1]/p[2]//text()')])
        contents = ''.join(poet.xpath('div[1]/div[@class="contson"]//text()')).strip().replace('\n', '。').replace('？', '。').split('。')
        content_lst = [s.strip() + '。' for s in contents if keyword in s]
        if not content_lst:  # the keyword appears only in the title
            continue
        for line in dict.fromkeys(content_lst):  # drop duplicates, keep original order
            print(num, line)
            print(f'<{title}>', source)
            print('')
            num += 1
    return num

if __name__ == '__main__':
    keyword = input('> Enter a keyword: ').strip()
    print('')
    num = 1
    for page in range(1, 3):  # only pages 1 and 2 are available without the app
        url = f'https://so.gushiwen.cn/search.aspx?page={page}&value={keyword}'
        num = poet_content(keyword, num, url)
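Passing `num` into `poet_content` and returning it keeps the numbering continuous across pages. An alternative design, sketched here with hypothetical names and sample data instead of live requests, is to have each page produce its hits and let `enumerate(..., start=1)` do the counting. This separates scraping from printing and makes the output logic testable offline. The `(line, title, source)` tuples and the source format are assumptions for illustration.

```python
from typing import Iterable, Iterator, Tuple

# One hit: (matching line, poem title, source); sample data, not scraped output
Hit = Tuple[str, str, str]


def format_hits(pages: Iterable[Iterable[Hit]]) -> Iterator[str]:
    """Number hits consecutively across all pages and format them like the script above."""
    all_hits = (hit for page in pages for hit in page)
    for num, (line, title, source) in enumerate(all_hits, start=1):
        # Trailing newline plus print()'s own newline reproduces the blank separator line
        yield f"{num} {line}\n<{title}> {source}\n"


if __name__ == "__main__":
    page1 = [("劝君更尽一杯酒。", "送元二使安西", "王维〔唐代〕")]
    page2 = [("葡萄美酒夜光杯，欲饮琵琶马上催。", "凉州词", "王翰〔唐代〕")]
    for block in format_hits([page1, page2]):
        print(block)
```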

With that, we have used a Python scraper to build a small Feihualing tool. Interested readers can give it a try!

Super detailed: a step-by-step guide to building a Feihualing program in 20 lines of Python!
