Sentiment Analysis of the Danmu (Bullet Comments) on 《演員請就位2》

By 某某白米飯
Source: Python 技術 (ID: pythonall)

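Lately the entertainment accounts I follow have been flooded with 《演員請就位2》. Ever since the show premiered, its directors have been a constant source of trending topics, so let's use Python to analyze the show's video danmu and see what viewers are griping about.
Scraping the danmu
Open both parts of the latest episode (episode 8) on Tencent Video, search for "danmu" in the Network panel of the browser developer tools, and locate the danmu request URL (https://mfm.video.qq.com/danmu?otype=json....)

Inspecting the request parameters shows that only the timestamp parameter changes, increasing by 30 on each request. My guess is that the player fetches one danmu packet per 30 seconds of video; the other request parameters can stay the same.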
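As a quick illustration of that paging pattern, here is a minimal sketch of how the timestamp values for one video could be generated. The function name `danmu_timestamps` is hypothetical, and the starting offset of 15 simply mirrors the loop used in the scraper in this article; the 30-second step is the author's guess from the observed requests, not documented API behaviour.

```python
def danmu_timestamps(video_length, step=30, start=15):
    """Return the timestamp values to request for a video of `video_length` seconds.

    `start` and `step` are assumptions taken from the observed requests.
    """
    return list(range(start, video_length, step))


# A 100-second clip would need three requests.
print(danmu_timestamps(100))        # [15, 45, 75]
# The first video in this article is 7478 seconds long.
print(len(danmu_timestamps(7478)))  # 249
```

Each value is then substituted into the timestamp parameter of the URL, one request per value.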
import csv
import json
import time
from pathlib import Path

import requests


def danmu():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.80 Safari/537.36'
    }
    # Each entry: [danmu URL template, video length in seconds]
    urls = [['https://mfm.video.qq.com/danmu?otype=json&callback=&target_id=6208914107%26vid%3Do0035t7199o&session_key=63761%2C673%2C1606144955&timestamp={}&_=1606144949402', 7478],
            ['https://mfm.video.qq.com/danmu?otype=json&callback=&target_id=6208234802%26vid%3Da00352eyo25&session_key=111028%2C1191%2C1606200649&timestamp={}&_=1606200643186', 8610]]
    for url in urls:
        # Step through the video in 30-second increments
        for page in range(15, url[1], 30):
            u = url[0].format(page)
            html = requests.get(u, headers=headers, timeout=10)
            # strict=False tolerates control characters inside JSON strings
            result = json.loads(html.text, strict=False)
            time.sleep(1)  # pause between requests
            danmu_list = []
            # Collect the target field from each comment
            for i in result['comments']:
                content = i['content']  # danmu text
                danmu_list.append([content])
                print(content)
            csv_write(danmu_list)


def csv_write(tablelist):
    tableheader = ['彈幕內容']
    csv_file = Path('danmu.csv')
    not_file = not csv_file.is_file()
    # Append to the CSV, writing the header only when the file is first created
    with open('danmu.csv', 'a', newline='', encoding='utf-8', errors='ignore') as f:
        writer = csv.writer(f)
        if not_file:
            writer.writerow(tableheader)
        for row in tablelist:
            writer.writerow(row)
This collected more than 70,000 danmu, in a file of about 3 MB.
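One detail in the scraper worth a closer look is `strict=False`. Viewer-typed text can contain raw control characters such as newlines, which Python's `json` module rejects inside strings by default. The sketch below shows the difference on a made-up payload; the helper name `parse_danmu` and the sample text are illustrative assumptions, while the `comments` and `content` field names come from the scraper code in this article.

```python
import json


def parse_danmu(text):
    """Extract the comment texts from one danmu response body."""
    # strict=False allows control characters (e.g. a raw newline) inside strings
    payload = json.loads(text, strict=False)
    return [c["content"] for c in payload.get("comments", []) if c.get("content")]


# Made-up payload: the first comment contains a raw newline character,
# which json.loads would reject in its default strict mode.
sample = '{"comments": [{"content": "line one\nline two"}, {"content": "好看"}]}'

print(parse_danmu(sample))
```

Using `.get` with defaults also keeps the loop from failing on a response that has no `comments` key, which is a defensive choice rather than something the API is known to require.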

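Sentiment analysis
With the danmu collected, use Tencent Cloud's sentiment analysis API to classify each one as positive, negative, or neutral.
Follow the Tencent Cloud page https://cloud.tencent.com/document/sdk/Python to obtain the SecretId and SecretKey credentials, then install the Tencent Cloud SDK with pip install tencentcloud-sdk-python. If you hit a certificate error, install the certificates with sudo "/Applications/Python 3.6/Install Certificates.command"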

import json
import ssl

from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.nlp.v20190408 import nlp_client, models

# Workaround for certificate errors: this disables HTTPS certificate verification
ssl._create_default_https_context = ssl._create_unverified_context


def nlp(text):
    try:
        # Replace "xxx" with your own SecretId and SecretKey
        cred = credential.Credential("xxx", "xxx")
        httpProfile = HttpProfile()
        httpProfile.endpoint = "nlp.tencentcloudapi.com"
        clientProfile = ClientProfile()
        clientProfile.httpProfile = httpProfile
        client = nlp_client.NlpClient(cred, "ap-guangzhou", clientProfile)
        req = models.SentimentAnalysisRequest()
        params = {
            "Text": text,
            "Mode": "3class"  # three-way result: positive / negative / neutral
        }
        req.from_json_string(json.dumps(params))
        resp = client.SentimentAnalysis(req)
        # Map the API result to the Chinese labels used in the charts
        sentiment = {'positive': '正面', 'negative': '負面', 'neutral': '中性'}
        return sentiment[resp.Sentiment]
    except TencentCloudSDKException as err:
        print(err)
        return None  # no label when the API call fails
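The function above labels a single text. The article does not show how it was applied to the whole CSV, so the following is only a sketch of one way to do it. It assumes that many danmu repeat verbatim (not something measured here) and therefore caches results so each distinct text triggers one API call. `label_all` and `fake_classify` are hypothetical names; in real use the `nlp` function would be passed in place of the stand-in classifier.

```python
def label_all(texts, classify):
    """Label every text with `classify`, calling it once per distinct text."""
    cache = {}
    labels = []
    for text in texts:
        if text not in cache:
            cache[text] = classify(text)
        labels.append(cache[text])
    return labels


# Stand-in classifier so the sketch runs without credentials or network access.
calls = []


def fake_classify(text):
    calls.append(text)
    return "正面" if "好" in text else "中性"


labels = label_all(["好看", "哈哈", "好看"], fake_classify)
print(labels)  # ['正面', '中性', '正面']
print(calls)   # ['好看', '哈哈'] (the repeated text was served from the cache)
```

Passing the classifier in as an argument keeps the batching logic testable offline and independent of the SDK.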
Sample results

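Director favorability
How do viewers feel about the directors who keep landing on the trending list? Convert the sentiment results into the percentage of positive, negative, and neutral danmu for each director, then chart them with pyecharts.

Negative danmu about 趙薇 (Zhao Wei) reach 30%. Positive danmu for 爾冬升 (Er Dongsheng), 趙薇, and 郭敬明 (Guo Jingming) are all close to 46%, while the host 大鵬 (Da Peng) surprisingly has the highest positive share at 59%. 趙薇 has the most danmu, 陳凱歌 (Chen Kaige) has the second most, and 爾冬升 has fewer than 2,000.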
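The article does not include the code for the percentage conversion, so here is a minimal sketch of how it might be done. It assumes a danmu is attributed to a person when that person's name appears in the text, which is a guess at the method rather than the author's confirmed approach. The function name `sentiment_share` and the four sample rows are made up for illustration.

```python
from collections import Counter, defaultdict


def sentiment_share(labelled, names):
    """Compute per-name sentiment percentages.

    labelled: iterable of (text, label) pairs
    names:    names to look for inside each text (substring match, an assumption)
    Returns {name: {label: percent}} with percentages rounded to one decimal.
    """
    counts = defaultdict(Counter)
    for text, label in labelled:
        for name in names:
            if name in text:
                counts[name][label] += 1
    shares = {}
    for name, counter in counts.items():
        total = sum(counter.values())
        shares[name] = {label: round(100 * n / total, 1)
                        for label, n in counter.items()}
    return shares


# Toy input, not real data from the show.
sample = [
    ("大鵬主持得好", "正面"),
    ("大鵬好搞笑", "正面"),
    ("大鵬今天話有點多", "中性"),
    ("大鵬這段不行", "負面"),
]
print(sentiment_share(sample, ["大鵬"]))
# {'大鵬': {'正面': 50.0, '中性': 25.0, '負面': 25.0}}
```

The resulting dictionary has one entry per name, which maps naturally onto one bar or pie series per director in a charting library such as pyecharts.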
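Danmu word cloud
Turn the danmu into a word cloud to see what everyone is griping about.

The first thing that jumps out is the word 秋褲 (long johns).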
import csv

import jieba
import wordcloud


def ciyun():
    # Dump the danmu column into one text file (overwritten on each run)
    with open('danmu.csv', encoding='utf-8') as f:
        with open('ciyun.txt', 'w', encoding='utf-8') as ciyun_file:
            csv_reader = csv.reader(f)
            next(csv_reader, None)  # skip the header row
            for row in csv_reader:
                if row:
                    # One danmu per line so adjacent comments do not run together
                    ciyun_file.write(row[0] + '\n')
    # Build and configure the WordCloud object w
    w = wordcloud.WordCloud(width=1000,
                            height=700,
                            background_color='white',
                            font_path="/System/Library/fonts/PingFang.ttc",
                            collocations=False,
                            stopwords={'的', '了', '啊', '我', '很', '是', '好', '這', '都', '不'})

    with open('ciyun.txt', encoding='utf-8') as f:
        txt = f.read()
    # Segment the Chinese text with jieba, then join with spaces for WordCloud
    txtlist = jieba.lcut(txt)
    result = " ".join(txtlist)

    w.generate(result)
    w.to_file('演員請就位2.png')
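A word cloud sizes each word by how often it occurs, so it can help to check the raw counts behind the picture. The sketch below counts already-segmented tokens while skipping stopwords. `top_words` is a hypothetical helper, and the token list is a toy example standing in for the output of `jieba.lcut`.

```python
from collections import Counter


def top_words(tokens, stopwords, n=3):
    """Count tokens, skipping stopwords and blank tokens; return the n most common."""
    counts = Counter(t for t in tokens if t.strip() and t not in stopwords)
    return counts.most_common(n)


# Toy tokens, as a segmenter might return them.
tokens = ["秋褲", "的", "秋褲", "好", "導演", "秋褲", "導演", "了"]
stop = {"的", "了", "好"}
print(top_words(tokens, stop, n=2))  # [('秋褲', 3), ('導演', 2)]
```

Printing the top entries this way is a quick sanity check that the stopword set is removing filler words before the image is generated.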
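Summary
Scraping Tencent Video danmu is fairly simple: send one request per 30-second step of the video to fetch each danmu packet. If you are interested, try scraping danmu from other video sites, and let's keep improving together.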