
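Preface

A scraper can parse data in many ways, and different targets return different data types.

Common formats include:

html
json
xml
text (strings)

With the four parsing approaches below you can handle nearly any of these formats.

The four approaches are: 1. XPath, 2. bs4, 3. json, 4. regular expressions

The sections below show how to use each one through a worked example.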

XPath

1. Request the data

The example uses the following URL:

http://www.xbiquge.la/xuanhuanxiaoshuo/

Import the required libraries

import requests
from lxml import etree

Send the request

headers = {
            'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36',
        }
url="http://www.xbiquge.la/xuanhuanxiaoshuo/"
res = requests.get(url,headers=headers)
res.encoding = 'utf-8'
text = res.text

2. Parse the data

Suppose we want to extract the following data (novel titles)

Inspect the page's tags

The data sits under class="l" -> ul -> li

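selector = etree.HTML(text)
# Avoid naming the result "list", which would shadow the built-in type
items = selector.xpath('//*[@class="l"]/ul/li')

Parse the data inside each li

The data sits under li -> span -> a

for item in items:
    # xpath() returns a list of matches, even when there is only one
    title = item.xpath('.//span/a/text()')
    href = item.xpath('.//span/a/@href')
    print(title)
    print(href)
    print("--------")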
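The two XPath steps above can be tried offline before pointing them at the live site. The sketch below is only an illustration: `SAMPLE_HTML` is a hypothetical fragment that imitates the structure described above (it is not copied from the real page), and `extract_books` is a helper name introduced here.

```python
from lxml import etree

# Hypothetical fragment imitating class="l" -> ul -> li -> span -> a;
# the titles and paths are made up for illustration.
SAMPLE_HTML = """
<div class="l">
  <ul>
    <li><span class="s2"><a href="/book/1/">Title One</a></span></li>
    <li><span class="s2"><a href="/book/2/">Title Two</a></span></li>
  </ul>
</div>
"""


def extract_books(html):
    """Return (title, href) pairs for every li under class="l"."""
    selector = etree.HTML(html)
    books = []
    for li in selector.xpath('//*[@class="l"]/ul/li'):
        # xpath() always returns a list, so take the first match if present
        titles = li.xpath('.//span/a/text()')
        hrefs = li.xpath('.//span/a/@href')
        if titles and hrefs:
            books.append((titles[0].strip(), hrefs[0]))
    return books


books = extract_books(SAMPLE_HTML)
for title, href in books:
    print(title, href)
```

Taking the first element of each result turns the one-item lists printed by the loop above into plain strings, which is usually what later processing needs.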

Bs4

1. Request the data

The example uses the same URL as before:

http://www.xbiquge.la/xuanhuanxiaoshuo/

Import the required libraries

import requests
from bs4 import BeautifulSoup

Send the request

headers = {
            'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36',
        }
url="http://www.xbiquge.la/xuanhuanxiaoshuo/"
res = requests.get(url,headers=headers)
res.encoding = 'utf-8'
text = res.text

2. Parse the data

Suppose we want to extract the following data (novel titles)

Inspect the page's tags

The data sits in span tags with class="s2"

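Method 1

### Method 1
# Build the soup object from the response text fetched above
soup = BeautifulSoup(text, 'html.parser')
spans = soup.find_all(attrs={'class': 's2'})
for span in spans:
    print(span.a.get_text())
    print(span.a.get("href"))
    print("--------")
print(len(spans))

Method 2

#### Method 2
# Collect every link; skip li elements that contain no a tag
all_link = [(li.a['href'], li.a.get_text()) for li in soup.find_all('li') if li.a]
for link in all_link:
    print(link)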
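As a possible third variant, the same links can be selected with a CSS selector through `select()`. This is a sketch under assumptions: `SAMPLE_HTML` is a hypothetical fragment modeled on the span class="s2" structure described above, and `extract_links` is a helper name introduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the last li deliberately has no a tag.
SAMPLE_HTML = """
<ul>
  <li><span class="s2"><a href="/book/1/">Title One</a></span></li>
  <li><span class="s2"><a href="/book/2/">Title Two</a></span></li>
  <li><span class="s2">No link here</span></li>
</ul>
"""


def extract_links(html):
    """Return (href, title) pairs for each a tag directly inside span.s2."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # "span.s2 > a" only matches spans that actually contain a link,
    # so entries without an a tag are skipped without extra checks
    for a in soup.select("span.s2 > a"):
        links.append((a.get("href"), a.get_text(strip=True)))
    return links


links = extract_links(SAMPLE_HTML)
for href, title in links:
    print(href, title)
```

Selecting the a tags directly avoids the `None` access that occurs when a matched element has no link inside it.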

json

1. Request the data

The example looks up the location of an IP address through the following URL:

https://restapi.amap.com/v3/ip?key=0113a13c88697dcea6a445584d535837&ip=123.123.123.123

Import the required libraries

import requests
import json

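Send the request

# Same request headers as in the earlier sections
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36',
}
ip = "123.123.123.123"
url = "https://restapi.amap.com/v3/ip?key=0113a13c88697dcea6a445584d535837&ip=" + ip
res = requests.get(url, headers=headers)
res.encoding = 'utf-8'
text = res.text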

2. Parse the data

Suppose we want to extract the following data (province and city)

print(text)
# res.text is a JSON string; json.loads converts it into a dict
data = json.loads(text)
print("province=" + data['province'] + ", city=" + data['city'])
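Indexing the parsed dict directly raises an exception when the body is not valid JSON or a key is missing. A more defensive version might look like the sketch below; `SAMPLE_TEXT` is a hypothetical response body written for illustration (not captured from the real API), and `parse_location` is a helper name introduced here.

```python
import json

# Hypothetical response body with the two fields used above
SAMPLE_TEXT = '{"status": "1", "province": "Beijing", "city": "Beijing"}'


def parse_location(text):
    """Return (province, city), or None if the body cannot be used."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # The body was not valid JSON (for example, an HTML error page)
        return None
    if not isinstance(data, dict):
        return None
    province = data.get("province")
    city = data.get("city")
    # Reject missing keys and non-string values instead of failing later
    if not isinstance(province, str) or not isinstance(city, str):
        return None
    return province, city


print(parse_location(SAMPLE_TEXT))
print(parse_location("not json"))
```

Returning `None` for unusable input keeps the calling loop simple when many IPs are queried in a row.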

Regular expressions

1. Request the data

The example uses the following URL:

http://www.xbiquge.la/xuanhuanxiaoshuo/

Import the required libraries

import requests
import re

Send the request

headers = {
            'user-agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36',
        }
url="http://www.xbiquge.la/xuanhuanxiaoshuo/"
res = requests.get(url,headers=headers)
res.encoding = 'utf-8'
text = res.text

2. Parse the data

Suppose we want to extract the following data (novel titles)

Inspect the page's HTML

The data sits under li -> span -> a, and each a tag is preceded by "《" and followed by "》"

pattern = re.compile('《.*?》')
items = re.findall(pattern, text)

for i in items:
    print(i)
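Because the brackets surround the whole a tag, each match of the pattern above still contains the tag markup. Capture groups can pull out just the link and the title, as in this sketch; `SAMPLE_HTML` is a hypothetical fragment following the layout described above, and `PATTERN` and `extract_titles` are names introduced here.

```python
import re

# Hypothetical fragment: each a tag is wrapped in 《 and 》
SAMPLE_HTML = (
    '<li><span class="s2">《<a href="/book/1/">Title One</a>》</span></li>'
    '<li><span class="s2">《<a href="/book/2/">Title Two</a>》</span></li>'
)

# Group 1 captures the href value, group 2 the link text
PATTERN = re.compile(r'《<a\s+href="([^"]*)"[^>]*>(.*?)</a>》')


def extract_titles(html):
    """Return a list of (href, title) tuples found between 《 and 》."""
    # With two groups, findall() returns one tuple per match
    return PATTERN.findall(html)


titles = extract_titles(SAMPLE_HTML)
for href, title in titles:
    print(href, title)
```

A regular expression like this is tied to the exact markup, so it tends to break sooner than XPath or bs4 when the page layout changes.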


4 Python scraper data-parsing methods you need to know!
