
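When scraping web pages with a Python crawler, you will often run into responses that contain special characters. In this case, copying the link into a browser and opening it shows that each node in the returned HTML carries an extra backslash (\), so calling response.xpath() directly cannot locate any elements. To avoid this, filter the response content first, then use response.replace() to build a response from the filtered HTML and assign it back to response. This article walks through the process, using the product links of a Tmall shop as the example.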

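Approach

1. Use response.text to get the HTML text and remove the \ characters from it;

2. Use response.replace() to assign the HTML, with the \ characters removed, back to response;

3. Use response.xpath() to locate the elements and extract the product links.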
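The three steps can be tried outside Scrapy first. The sketch below is only an illustration under stated assumptions: SAMPLE is a made-up payload shaped like the JSONP response described above, unwrap_jsonp and extract_links are hypothetical helper names, and the standard library's xml.etree.ElementTree stands in for response.xpath(). ElementTree only accepts well-formed markup, so real pages are still better handled by Scrapy's own selectors.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the JSONP response: the HTML sits inside
# jsonpNNN("...") and every double quote in it is escaped with a backslash.
SAMPLE = (
    r'jsonp314("<div class=\"item5line1\"><dl>'
    r'<dd class=\"detail\"><a href=\"//detail.tmall.com/item.htm?id=1\">A</a></dd>'
    r'</dl><dl>'
    r'<dd class=\"detail\"><a href=\"//detail.tmall.com/item.htm?id=2\">B</a></dd>'
    r'</dl></div>")'
)


def unwrap_jsonp(text):
    """Steps 1 and 2: take the quoted payload and drop every backslash."""
    match = re.search(r'jsonp\d+\("(.*?)"\)', text, re.S)
    if match is None:
        raise ValueError("no JSONP payload found")
    return match.group(1).replace("\\", "")


def extract_links(html):
    """Step 3: locate the product anchors and return their href values."""
    # Wrap the fragment so the outer div is a descendant that ".//" can match.
    root = ET.fromstring("<root>" + html + "</root>")
    path = ".//div[@class='item5line1']/dl/dd[@class='detail']/a"
    return [a.get("href") for a in root.findall(path)]


if __name__ == "__main__":
    print(extract_links(unwrap_jsonp(SAMPLE)))
```

Without the call to unwrap_jsonp, the escaped quotes would keep the class attributes from matching, which is the failure described in the introduction.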

Code

# -*- coding: utf-8 -*-
import re

import scrapy


class TmallSpider(scrapy.Spider):
    name = 'tmall'
    allowed_domains = ['tmall.com']
    start_urls = [
        'https://wogeliou.tmall.com/i/asynSearch.htm?_ksTS=1611910763284_313&callback='
        'jsonp314&mid=w-22633333039-0&wid=22633333039&path=/search.htm&search=y&spm=a220o.1000855.0.0.7fcc367fsdZyLF'
    ]

    custom_settings = {
        'ITEM_PIPELINES': {
            'tn_scrapy.pipelines.TnScrapyPipeline': 300,
        },
        'DEFAULT_REQUEST_HEADERS': {
            "user-agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                          'Chrome/78.0.3904.70 Safari/537.36',
            # Replace with the cookie string from a logged-in session
            'cookie': 'cookie obtained after logging in'
        }
    }

    def parse(self, response):
        # The body looks like jsonp314("<escaped html>"); capture the quoted payload
        html = re.findall(r'jsonp\d+\("(.*?)"\)', response.text)[0]
        # Remove the backslashes
        html = re.sub(r'\\', '', html)
        # print('html:', html)
        # Build a response from the cleaned HTML so that xpath() can match
        response = response.replace(body=html)
        link = response.xpath('//div[@class="item5line1"]/dl/dd[@class="detail"]/a/@href').extract()
        print('link: ', link)
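Removing every backslash is a blunt filter: it would also mangle escape sequences such as \n or \u00e9 if the payload contained them. A possible alternative, which is not the method used in this article, is to treat the payload as a JSON string literal and decode it with json.loads. The sketch below assumes the payload is valid JSON, which has not been verified for the Tmall endpoint; decode_jsonp_string and SAMPLE are hypothetical names and data.

```python
import json
import re

# Hypothetical payload containing several kinds of JSON escapes.
SAMPLE = r'jsonp314("<a href=\"\/\/x.com\/1\">caf\u00e9<\/a>\n")'


def decode_jsonp_string(text):
    """Decode the quoted JSONP argument as a JSON string literal."""
    # Greedy match: take everything between the first '("' and the last '")'.
    match = re.search(r'jsonp\d+\((".*")\)', text, re.S)
    if match is None:
        raise ValueError("no JSONP payload found")
    # json.loads resolves \" \/ \n and \uXXXX in one pass.
    return json.loads(match.group(1))


if __name__ == "__main__":
    print(decode_jsonp_string(SAMPLE))
```

Inside parse(), the decoded string could be passed to response.replace(body=...) in the same way as the backslash-stripped HTML.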

That covers how to scrape the product links of a Tmall shop with Python.


