
Playing with Python Crawlers: Scraping Anjuke Rental Data

2022-05-13 06:20

Preface

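I recently decided to rent a place in Chang'an District, Xi'an, so I opened the Anjuke website and started browsing. Clicking the 航天一小 (Hangtian Yixiao) area returned a long list of listings. To compare them side by side and find the one that suits me best, I wrote a short script that scrapes the rental data in bulk. The hard part was working out how to scrape across multiple pages. I puzzled over it all morning, then remembered that 螞蟻老師's course 《零基礎學Python簡單爬蟲》 shows how to crawl his blog page by page. I went back to NetEase Cloud Classroom to review that material and solved the problem quickly.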


1. Import the Python libraries

import requests
import parsel
import csv

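2. Write the request headers

Open the Anjuke home page, choose rentals in Chang'an District, Xi'an, and then click the area of interest, 航天一小. Press F12 to open the developer tools and search the captured traffic for the text of one listing title, "航天城 航天一小一中 航天城二期主臥 神州六路 領包入住". Once the matching request shows up, copy its cookie, referer and user-agent. Sending these headers makes the request look like it comes from an ordinary browser, which lowers the chance of being blocked by the site's anti-scraping checks.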

# The cookie below was copied from the author's browser session; copy your own from the developer tools.
headers = {
    "cookie": "SECKEY_ABVK=TgvSiy4m0B6i5M/GnutrL4MCQAxmauku4aF+LqjfcV8%3D; BMAP_SECKEY=E59uL36Kjw3iB4ZVJ1TzktcDRrwsPacuhFPlFvVySikxHflBOrAIKvnT_TWYzT5URefDsASa2VWCiaIX9be4V3Lnrh_Nos62tZXNwvoSq4VTzasiJ1egkq-odC5S4rdqxoaD8o2T1GJGf2QcMzz9qmHUmR4U12vFef9JGFoyWvQ; aQQ_ajkguid=480A83B5-6C24-CA9A-70C4-SX0705130713; isp=true; id58=e87rkGDik4M4TQ8rCOH9Ag==; 58tj_uuid=646ff6e2-1cdd-40ea-a598-19787fe938e3; als=0; _gid=GA1.2.1430887075.1652243486; _ga=GA1.2.365181464.1652243486; ajk-appVersion=; ctid=31; fzq_h=66195fd3068b1f875345633f8c10fa4a_1652243654423_6c16e54157d8446097cc7e59ed9481e0_2102187861; cmctid=483; wmda_visited_projects=%3B6289197098934; wmda_new_uuid=1; wmda_uuid=82b87a341ccf0eac3db311b6b0810093; sessid=89F34445-4127-728C-6A4A-B58CD71D8F23; twe=2; init_refer=https%253A%252F%252Fcn.bing.com%252F; new_uv=5; lps=https%3A%2F%2Fxa.zu.anjuke.com%2F%3Ffrom%3Dnavigation%7Chttps%3A%2F%2Fxa.anjuke.com%2F; wmda_session_id_6289197098934=1652272846315-4adb7df7-8ce6-9b96; obtain_by=2; new_session=0; xxzl_cid=ef3f8b7b219341ee8f8b5d044cccd51f; xzuid=eb4ac9b2-89df-4969-9661-9e99a843bd43",
    "referer": "https://xa.zu.anjuke.com/fangyuan/changanb/",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36 Edg/99.0.1150.39",
}
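
If several requests share the same headers, one option is to attach them to a `requests.Session` once instead of passing `headers=` on every call. The sketch below is only an illustration: `build_session` is a hypothetical helper name, and the header values are shortened placeholders rather than the full values from the article.

```python
import requests


def build_session(headers):
    """Return a Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers)
    return session


# Placeholder values (assumptions): substitute the cookie, referer and
# user-agent copied from your own browser's developer tools.
example_headers = {
    "referer": "https://xa.zu.anjuke.com/fangyuan/changanb/",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

session = build_session(example_headers)
# A later call such as session.get(url, timeout=10) would carry these headers.
```

Header names on a session are matched case-insensitively, so `session.headers["User-Agent"]` and `session.headers["user-agent"]` refer to the same entry.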

3. Create the CSV file

Open the CSV file, then write the header row like this:

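# mode='a' appends, so rerunning the script adds another header row; delete the old file first or use mode='w'.
f = open('安居客租房數據.csv', mode='a', encoding='utf-8', newline='')
# Columns: title, layout, floor, price, community, address, details
csv_writer = csv.DictWriter(f, fieldnames=['標題','房子類型','樓層','價格','小區(qū)','地址','詳情'])
csv_writer.writeheader()  # write the header row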
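
A variant worth considering is to open the file in a `with` block, so that it is flushed and closed even if the script stops early. This is a minimal sketch, not the article's code: `write_rows`, `FIELDNAMES` (English column names) and the demo row are hypothetical and only there to make the example runnable.

```python
import csv
import os
import tempfile

# Hypothetical English column names, used only in this sketch.
FIELDNAMES = ['title', 'layout', 'floor', 'price', 'community', 'address', 'details']


def write_rows(path, rows):
    """Write a header row followed by the given dict rows; the file closes on exit."""
    with open(path, mode='w', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Made-up row written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
write_rows(demo_path, [{
    'title': 'demo listing', 'layout': '2 rooms', 'floor': 'middle',
    'price': '1500', 'community': 'demo community',
    'address': 'demo road', 'details': 'whole flat',
}])
```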

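4. Parse the data and scrape page by page

In the developer tools, use the element picker to click the piece of data you want, find it in the page source, then right-click it and choose "Copy selector". That way you can parse the data without knowing CSS selector syntax well.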

for page in range(1, 3):  # the results span only two pages, so loop over p1 and p2
    url = f'https://xa.zu.anjuke.com/fangyuan/changanb-q-htyxxa/p{page}/'
    r = requests.get(url=url, headers=headers, timeout=10)
    selector = parsel.Selector(r.text)
    zu_infos = selector.css('.zu-itemmod')  # CSS selector: one element per listing on the page
    for info in zu_infos:
        try:
            title = info.css('.strongbox::text').get()  # listing title

            # text of every <b> tag: the numbers in the layout description
            house_num = info.css('div.zu-info > p.details-item.tag > b::text').getall()
            # text nodes of the <p> tags; p is a list
            p = info.css('div.zu-info > p::text').getall()
            p = p[1:5]  # drop the entries we do not need
            height = p[-1].strip()  # floor the home is on
            house_unit = p[:3]  # units that follow each layout number
            # join each layout number with its unit, then join the pieces with spaces
            house_type = [num + unit for num, unit in zip(house_num, house_unit)]
            house_type = ' '.join(house_type)

            price = info.css('div.zu-side > p > strong > b::text').get() + '元/月'  # monthly rent

            community = info.css('div.zu-info > address > a::text').get()  # community name
            address = info.css('div.zu-info > address::text').getall()[1].strip('\xa0\n ')  # street address

            # rental type, orientation, elevator, nearby metro line
            detail = info.css('div.zu-info > p.details-item.bot-tag > span::text').getall()
            detail = ' '.join(detail)
            print(title, house_type, height, price, community, address, detail)
            dit = {
                '標題': title,
                '房子類型': house_type,
                '樓層': height,
                '價格': price,
                '小區(qū)': community,
                '地址': address,
                '詳情': detail,
            }
            csv_writer.writerow(dit)  # write one row to the CSV
        except (IndexError, TypeError) as exc:
            # a listing with a missing field raises one of these; report it and move on
            print(f'Skipped one listing: {exc}')

f.close()  # flush and close the CSV file
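
The loop above hard-codes two pages because that is how many this search returned. One possible alternative is to keep requesting pages until one comes back with no listings. The sketch below assumes (without having verified it against the site) that a page past the last one contains no `.zu-itemmod` elements. `crawl_pages`, `page_url`, `fetch_listings`, `max_pages` and `delay` are hypothetical names, and the fetcher is passed in as a function so the loop can be tried without touching the network.

```python
import time

BASE_URL = 'https://xa.zu.anjuke.com/fangyuan/changanb-q-htyxxa/p{page}/'


def page_url(page):
    """Build the URL of one result page."""
    return BASE_URL.format(page=page)


def crawl_pages(fetch_listings, max_pages=50, delay=1.0):
    """Collect listings page by page until a page is empty or max_pages is reached.

    fetch_listings(url) must return a list of listings for that URL.
    """
    results = []
    for page in range(1, max_pages + 1):
        items = fetch_listings(page_url(page))
        if not items:  # an empty page is treated as the end of the results
            break
        results.extend(items)
        time.sleep(delay)  # pause between requests to go easy on the site
    return results


def fake_fetch(url):
    """Stand-in fetcher with made-up data: two pages of results, then nothing."""
    pages = {page_url(1): ['a', 'b'], page_url(2): ['c']}
    return pages.get(url, [])


demo = crawl_pages(fake_fetch, delay=0)
```

In a real run, `fetch_listings` would wrap the `requests.get` call and the `.zu-itemmod` parsing from the loop above.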

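Once all the data has been collected, Excel sometimes shows garbled characters when it opens the CSV directly. If that happens, open the file in Notepad, choose Save As, change the encoding to "ANSI", and then open the saved file in Excel. Now I can go through the listings in Excel at my own pace and check whether each one meets my requirements. I hope to rent a place I am happy with safely, without hassle, and at the lowest transaction cost, and I hope you got something out of this too!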
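
The Notepad step can also be done in code. Excel generally recognizes UTF-8 when the file starts with a byte order mark, which Python writes when the encoding is `utf-8-sig`. The sketch below re-saves an existing UTF-8 CSV that way; `add_bom` and the demo file names are hypothetical.

```python
import os
import tempfile


def add_bom(src_path, dst_path):
    """Re-save a UTF-8 text file as UTF-8 with a byte order mark."""
    # 'utf-8-sig' skips a BOM on read if one is already there, so this is safe to repeat.
    with open(src_path, encoding='utf-8-sig', newline='') as src:
        text = src.read()
    with open(dst_path, mode='w', encoding='utf-8-sig', newline='') as dst:
        dst.write(text)


# Demo with a made-up two-line file in a temporary directory.
demo_dir = tempfile.mkdtemp()
plain_path = os.path.join(demo_dir, 'plain.csv')
bom_path = os.path.join(demo_dir, 'with_bom.csv')
with open(plain_path, mode='w', encoding='utf-8', newline='') as fh:
    fh.write('標題,價格\r\ndemo,1500元/月\r\n')
add_bom(plain_path, bom_path)
```

Passing `encoding='utf-8-sig'` to `open()` when the CSV is first created would have the same effect without a second file.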


Finally, I recommend 螞蟻老師's video courses. Buying a course includes Q&A support and access to a paid group chat.


