

Quote of the day: "Winding with the mountains through ten thousand turns, yet the road traveled is less than a hundred li."

Hi everyone, I'm Python進階者.

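Preface

A few days ago, group member 磐奚鳥 shared a script for scraping novels in our chat group. It works quite well, so I'm sharing it here for everyone to learn from.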

1. Downloading a novel

To download any novel from the site, click its link to open the book's page.

All you need is the number in the page's URL. In this example it is 951. That number is the book's ID, and the code below uses it.
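The book ID can also be pulled out of the URL programmatically instead of being copied by hand. The following is a minimal sketch, assuming the URL follows the `/book/<number>/` pattern that the script builds; `extract_book_id` is a hypothetical helper and is not part of the shared code.

```python
import re
from urllib.parse import urlparse


def extract_book_id(url):
    """Return the numeric book ID from a URL shaped like .../book/<digits>/, or None.

    Hypothetical helper: assumes the /book/<digits>/ path layout used by the script.
    """
    path = urlparse(url).path
    match = re.search(r'/book/(\d+)(?:/|$)', path)
    return match.group(1) if match else None


print(extract_book_id('http://www.biquw.com/book/951/'))
```

The ID is returned as a string, which matches how the script receives it from `input()`.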

2. Implementation

Here is the code as shared:

# coding: utf-8
'''
biquw.com novel downloader.
For studying the code only; do not use it commercially.
Please delete downloaded files within 24 hours.
'''
import os
import time

import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent shared by every request in this script.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36'}


def book_page_list(book_id):
    '''
    Fetch the full chapter list of the book with the given ID.
    :param book_id: book ID taken from the book's URL
    :return: list of <a> tags, one per chapter (title text and href)
    '''
    url = 'http://www.biquw.com/book/{}/'.format(book_id)
    # Pass headers by keyword: the second positional argument of requests.get is params.
    response = requests.get(url, headers=HEADERS)
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'lxml')
    booklist = soup.find('div', class_='book_list').find_all('a')
    return booklist


def book_page_text(bookid, booklist):
    '''
    Download every chapter in the chapter list and save each one to a text file.
    :param bookid: str, book ID
    :param booklist: chapter <a> tags returned by book_page_list
    :return: None
    '''
    try:
        for book_page in booklist:
            page_name = book_page.text.replace('*', '')
            page_id = book_page['href']
            time.sleep(3)  # pause between requests to go easy on the site
            url = 'http://www.biquw.com/book/{}/{}'.format(bookid, page_id)
            response_book = requests.get(url, headers=HEADERS)
            response_book.encoding = response_book.apparent_encoding
            soup = BeautifulSoup(response_book.text, 'lxml')
            book_content = soup.find('div', id="htmlContent")
            # Write as UTF-8 so the output does not depend on the platform's default encoding.
            with open("./{}/{}.txt".format(bookid, page_name), 'a', encoding='utf-8') as f:
                f.write(book_content.text.replace('\xa0', ''))
                print("Downloading chapter: {}".format(page_name))
    except Exception as e:
        print(e)
        print("Failed to fetch chapter content. Make sure the book ID is correct and the book has content.")


if __name__ == '__main__':
    bookid = input("Enter the book ID (digits): ")
    # Create a directory named after the book ID, if it does not exist, to hold the chapters.
    if not os.path.isdir('./{}'.format(bookid)):
        os.mkdir('./{}'.format(bookid))
    try:
        booklist = book_page_list(bookid)
        print("Chapter list fetched successfully!")
        time.sleep(5)
        book_page_text(bookid, booklist)
    except Exception as e:
        print(e)
        print("Failed to fetch the chapter list. Make sure the book ID is correct!")
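One weak spot worth noting: the script builds each file name from the chapter title and only strips `*`. A title containing characters such as `/`, `:` or `?` could make `open()` fail, which would end the whole download loop through the `except` branch. Below is a minimal sketch of a more defensive approach; `safe_filename` is a hypothetical helper that is not part of the shared code, and the set of replaced characters is an assumption based on what Windows and POSIX file systems commonly reject.

```python
import re


def safe_filename(title, replacement='_'):
    """Replace characters that are commonly invalid in file names.

    Hypothetical helper: the character set below is an assumption, not a
    complete list for every file system.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title)
    # Trailing dots and spaces are also problematic on Windows.
    cleaned = cleaned.strip(' .')
    return cleaned or 'untitled'


print(safe_filename('Chapter 1: What?'))
```

In `book_page_text`, this could stand in for the `replace('*', '')` call, for example `page_name = safe_filename(book_page.text)`.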