
Tap 月小水長 above and star the account to get new posts as soon as they go out.

This is the 81st original post from 月小水長.

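The public account platform has changed how it pushes articles: readers who have tapped "Like" or "Wow", or who have starred the account, receive my posts first. So once you finish reading, remember to tap "Wow" and "Like".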

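Today's update is a Weibo user info crawler, which is not the same thing as a user crawler. A user crawler scrapes the posts published on a user's home page; for that, the cn-site crawler that fetches all of a user's posts, and can resume after a dropped connection, still works. The user info crawler instead takes a Weibo user id and fetches that user's Sunshine Credit, gender, region, school, company, and similar profile fields.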

All of the code is open source in the WeiboSuperSpider GitHub repository, in the standalone-features folder (功能独立版), under the name WeiboUserInfoSpider:

https://github.com/Python3Spiders/WeiboSuperSpider

Alternatively, the "Read the original" link at the end of the post goes straight to the source file.

Once you have the code, fill in the cookie in headers. Open any user's home page on weibo.com, for example

https://weibo.com/u/1764201374

or a URL of the form

https://weibo.com/xiena

For well-known accounts, the purely numeric uid in the URL is usually replaced by a custom uid made of letters and digits.
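To make the two URL forms concrete, here is a minimal sketch that pulls the last path segment out of a profile URL and reports whether it is purely numeric. The helper name extract_uid is hypothetical and is not part of the repository.

```python
from urllib.parse import urlparse


def extract_uid(profile_url):
    """Return (uid, is_numeric) for a Weibo profile URL (hypothetical helper)."""
    # The uid is the last segment of the URL path, for both URL forms.
    path = urlparse(profile_url).path.rstrip('/')
    uid = path.rsplit('/', 1)[-1]
    return uid, uid.isdigit()


print(extract_uid('https://weibo.com/u/1764201374'))  # ('1764201374', True)
print(extract_uid('https://weibo.com/xiena'))         # ('xiena', False)
```

A non-numeric result is the case that needs an extra lookup before the profile details can be requested.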

Then press F12 to open the developer tools, find a request whose path is either info or detail, and copy its cookie.

With the cookie filled in, the code is ready to run.
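As a rough sketch of what filling in the cookie might look like, the snippet below builds a headers dict from a pasted cookie string. The build_headers helper and the User-Agent value are assumptions for illustration; the repository's own headers may carry different or additional fields.

```python
def build_headers(cookie):
    """Build request headers carrying the cookie copied from the browser."""
    return {
        # Assumed desktop browser User-Agent; any recent browser value should do.
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        # Strip stray whitespace or newlines picked up while copying.
        'cookie': cookie.strip(),
    }


# Placeholder only: replace with the cookie copied from the info or detail request.
headers = build_headers('PASTE_YOUR_COOKIE_HERE')
```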

The core of the code fetches a user's info by uid:

import requests

def getUserInfo(uid):
    try:
        uid = int(uid)
    except (TypeError, ValueError):
        # Not purely numeric, i.e. a custom name such as xiena
        uid = parseUid(uid)
        if not uid:
            return None
    response = requests.get(url=f'https://weibo.com/ajax/profile/detail?uid={uid}', headers=headers)
    resp_json = response.json().get('data', None)
    if not resp_json:
        return None
    sunshine_credit = resp_json.get('sunshine_credit', None)
    if sunshine_credit:
        sunshine_credit_level = sunshine_credit.get('level', None)
    else:
        sunshine_credit_level = None
    education = resp_json.get('education', None)
    if education:
        school = education.get('school', None)
    else:
        school = None

    location = resp_json.get('location', None)
    gender = resp_json.get('gender', None)

    birthday = resp_json.get('birthday', None)
    created_at = resp_json.get('created_at', None)
    description = resp_json.get('description', None)
    # How many of the people I follow also follow this user
    followers = resp_json.get('followers', None)
    if followers:
        followers_num = followers.get('total_number', None)
    else:
        followers_num = None
    return {
        'sunshine_credit_level': sunshine_credit_level,
        'school': school,
        'location': location,
        'gender': gender,
        'birthday': birthday,
        'created_at': created_at,
        'description': description,
        'followers_num': followers_num
    }
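getUserInfo repeats the same pattern several times: fetch a nested dict, then fetch a key from it if it exists. One possible way to shorten that, sketched here and not taken from the repository, is a small helper that walks a chain of keys and returns None as soon as one is missing. The name dig and the sample payload are hypothetical, and the sample values are made up.

```python
def dig(data, *keys):
    """Follow keys through nested dicts; return None as soon as one is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


# Hypothetical payload shaped like the fields read in getUserInfo.
sample = {
    'sunshine_credit': {'level': 'good'},
    'education': None,
    'followers': {'total_number': 3},
}

print(dig(sample, 'sunshine_credit', 'level'))   # good
print(dig(sample, 'education', 'school'))        # None
print(dig(sample, 'followers', 'total_number'))  # 3
```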

If the uid is of the second form described above, that is, not purely numeric, it is resolved to its numeric form automatically:

def parseUid(uid):
    response = requests.get(url=f'https://weibo.com/ajax/profile/info?custom={uid}', headers=headers)
    try:
        return response.json()['data']['user']['id']
    except (KeyError, TypeError, ValueError):
        # Unexpected or non-JSON response: treat the uid as unresolved
        return None

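That only fetches the info of a single user. What about fetching in bulk? Suppose we used the crawler from the post on the 2021 Weibo comment and sub-comment crawler to scrape the comments under one post, and now want the userinfo of every commenter so we can analyze their regional distribution or gender ratio. The code below does exactly that: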

import json
from time import sleep

import pandas as pd

def dfAddUserInfo(file_path, user_col, user_info_col='user_info'):
    '''
    @params file_path path of the csv file
    @params user_col column holding the user's home page link, e.g. comment_user_link in the comment csv file
    @params user_info_col name of the new userinfo column, user_info by default
    '''
    df = pd.read_csv(file_path)
    user_info_init_value = 'init'
    columns = df.columns.values.tolist()
    if user_info_col not in columns:
        df[user_info_col] = [user_info_init_value for _ in range(df.shape[0])]
    for index, row in df.iterrows():
        print(f'   {index+1}/{df.shape[0]}   ')
        # Compare by value: after a reload from csv the placeholder is a new string object
        if row.get(user_info_col, user_info_init_value) != user_info_init_value:
            print('skip')
            continue
        user_link = row[user_col]
        user_id = user_link[user_link.rindex('/')+1:]
        user_info = getUserInfo(user_id)
        print(user_info)
        if user_info:
            # Store the link inside user_info under a uniform key, user_link
            user_info['user_link'] = user_link
            df.loc[index, user_info_col] = json.dumps(user_info)
            sleep(1)
        else:
            # Fetch failed: print the link and stop; progress so far is saved below
            print(user_link)
            break
    df.to_csv(file_path, index=False, encoding='utf-8-sig')

This function serializes each user_info dict to JSON and writes it into the original csv, adding a new column automatically; the column is named user_info by default.

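As for how to walk through the csv once user_info has been added and pull out the region, gender, school, and other fields, the code includes an example of that too. All of the source code for this post is available through the "Read the original" link.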
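The repository has its own example for this step. Independently of it, the following standard-library sketch shows one way to read the JSON column back and count the values of a single field. The function name count_field and the sample rows are hypothetical, and the sample values are made up.

```python
import csv
import io
import json
from collections import Counter


def count_field(csv_text, field, user_info_col='user_info'):
    """Count the values of one user_info field across all rows of a csv."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            info = json.loads(row.get(user_info_col))
        except (TypeError, ValueError):
            continue  # 'init' placeholder or empty cell: not fetched yet
        if isinstance(info, dict) and info.get(field) is not None:
            counts[info[field]] += 1
    return counts


# Build a small sample csv in memory, shaped like the output of dfAddUserInfo.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['comment_user_link', 'user_info'])
writer.writerow(['https://weibo.com/u/1', json.dumps({'gender': 'f', 'location': 'Beijing'})])
writer.writerow(['https://weibo.com/u/2', json.dumps({'gender': 'm', 'location': 'Beijing'})])
writer.writerow(['https://weibo.com/u/3', 'init'])

print(count_field(buf.getvalue(), 'location'))  # Counter({'Beijing': 2})
```

Rows that still hold the init placeholder are skipped, so the counts cover only users whose info was actually fetched.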

A Super Convenient Weibo User Info Crawler
