[Bilibili Video Tutorial] Scraping a User's Weibo Posts and Batch-Scraping Comments
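How to scrape all of a user's Weibo posts: the code for this part is in the article "A crawler that scrapes all of a user's Weibo posts and can even resume after a network disconnection". The video below demonstrates the process in detail.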
{
  "cookie": "replace with your cookie",
  "comments": [
    {
      "mid": "KCXTUah9W",
      "uid": "2656274875",
      "limit": 100000,
      "decs": "吳京說神州十三號太美了"
    },
    {
      "mid": "KCYA7jubh",
      "uid": "2803301701",
      "limit": 100000,
      "decs": "吳京說神州十三號太美了"
    }
  ]
}
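Before starting a long crawl, it can help to check that the config is well formed. The following is a minimal sketch, not part of the original project: the function name `validate_comment_config` is hypothetical, and treating `cookie`, `mid`, `uid`, and `limit` as required is an assumption based only on the fields visible in the config above.

```python
import json

# Assumption: these are the keys the comment crawler needs for each entry.
REQUIRED_KEYS = ("mid", "uid", "limit")


def validate_comment_config(config):
    """Return a list of problems found in a parsed config dict (empty if none)."""
    problems = []
    if not config.get("cookie"):
        problems.append("missing cookie")
    for i, entry in enumerate(config.get("comments", [])):
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"comments[{i}] is missing '{key}'")
    return problems


sample = json.loads(
    '{"cookie": "abc", "comments": '
    '[{"mid": "KCXTUah9W", "uid": "2656274875", "limit": 100000}]}'
)
print(validate_comment_config(sample))  # prints []
```

Returning a list of problems instead of raising on the first one makes it possible to fix every bad entry in a single pass.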
# -*- coding: utf-8 -*-
# author: inspurer(月小水長)
# create_time: 2021/10/17 10:31
# Runtime environment: Python 3.6+
# github https://github.com/inspurer
# WeChat official account: 月小水長

import json

import pandas as pd

limit = 10000
config_path = 'mac_comment_config.json'
data_path = './topic/小米.csv'


def drop_duplicate(path, col_index=0):
    df = pd.read_csv(path)
    first_column = df.columns.tolist()[col_index]
    # Remove duplicate rows, keyed on the chosen column
    df.drop_duplicates(keep='first', inplace=True, subset=[first_column])
    # Repeated header rows may still remain; drop them.
    # Use ~ to invert the boolean mask (unary minus fails on boolean Series in recent pandas).
    df = df[~df[first_column].isin([first_column])]
    df.to_csv(path, encoding='utf-8-sig', index=False)


drop_duplicate(data_path)

with open(config_path, 'r', encoding='utf-8-sig') as f:
    config_json = json.loads(f.read())

df = pd.read_csv(data_path)

# Clear the existing comments config; comment this out if not needed
config_json['comments'].clear()

for index, row in df.iterrows():
    print(f'{index + 1}/{df.shape[0]}')
    weibo_link = row['weibo_link']
    # Strip the query string, if any
    if '?' in weibo_link:
        weibo_link = weibo_link[:weibo_link.index('?')]
    # The uid sits between 'com/' and the last '/', the mid after the last '/'
    uid = weibo_link[weibo_link.index('com') + 4:weibo_link.rindex('/')]
    mid = weibo_link[weibo_link.rindex('/') + 1:]
    config_json['comments'].append({
        'mid': mid,
        'uid': uid,
        'limit': limit,
        'desc': row['user_name']
    })

with open(config_path, 'w', encoding='utf-8-sig') as f:
    f.write(json.dumps(config_json, indent=2, ensure_ascii=False))
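The link parsing in the script above relies on string offsets. As an alternative, here is a small sketch using the standard library's `urllib.parse`. The helper name `parse_weibo_link` is hypothetical, and it assumes links of the form `https://weibo.com/<uid>/<mid>`, which is what the slicing in the script implies.

```python
from urllib.parse import urlsplit


def parse_weibo_link(link):
    """Split a Weibo post link into (uid, mid).

    Assumes the path has exactly two segments: /<uid>/<mid>.
    """
    path = urlsplit(link).path  # the path excludes any query string or fragment
    parts = [p for p in path.split('/') if p]
    if len(parts) != 2:
        raise ValueError(f'unexpected Weibo link format: {link}')
    uid, mid = parts
    return uid, mid


print(parse_weibo_link('https://weibo.com/2656274875/KCXTUah9W?refer_flag=1'))
# prints ('2656274875', 'KCXTUah9W')
```

Raising on an unexpected path shape surfaces malformed rows in the CSV early, instead of silently writing a wrong `uid` or `mid` into the config.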
