Tutorial: Scraping 51job (前程无忧) Job Listings with Python 3
This article was contributed by group member 易某某. Many thanks!
Original link: https://blog.csdn.net/weixin_42572590/article/details/103443213
1. Background
2. Scraping the data and saving it to a .txt file
(1) Page analysis



pat='(.*?).*?(.*?).*?(.*?)'
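To make the pattern above more concrete, here is a minimal sketch of how non-greedy groups `(.*?)` combined with `re.S` and `findall` pull fields out of HTML. The sample markup, its class names, and the helper name `extract_rows` are hypothetical stand-ins invented for illustration; they are not 51job's actual page structure.

```python
import re

# Hypothetical markup, invented for illustration; the real page differs.
SAMPLE_HTML = """
<div class="row">
  <span class="title">Data Engineer</span>
  <span class="company">Example Co</span>
  <span class="salary">10-15k</span>
</div>
<div class="row">
  <span class="title">Python Developer</span>
  <span class="company">Sample Ltd</span>
  <span class="salary">12-20k</span>
</div>
"""

# Each (.*?) captures as little as possible up to the next literal tag;
# the bare .*? pieces skip over the markup between two fields.
PATTERN = (
    r'<span class="title">(.*?)</span>.*?'
    r'<span class="company">(.*?)</span>.*?'
    r'<span class="salary">(.*?)</span>'
)


def extract_rows(html):
    # re.S lets "." match newlines, so one pattern can span several lines.
    return re.compile(PATTERN, re.S).findall(html)


rows = extract_rows(SAMPLE_HTML)
for title, company, salary in rows:
    print(title, company, salary, sep="\t")
```

With several capture groups, `findall` returns one tuple per match, which is why the scraper code can index each result as `j[0]`, `j[1]`, and so on.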
(2) Writing the code
# Scrape 51job listings and write them to a .txt file
import re
import urllib.request

# Fetch the HTML source of one results page
def get_content(page):
    url = 'https://search.51job.com/list/000000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,' + str(page) + '.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
    # Decode the response as GBK; "ignore" drops any bytes that fail to decode
    html = urllib.request.urlopen(url).read().decode("GBK", "ignore")
    return html


# Extract the fields matched by the regular expression from the page source
def get(html):
    # Note: the rows written below read j[0]..j[4], so the full pattern
    # must define five capture groups.
    pat = '(.*?).*?(.*?).*?(.*?)'
    rst = re.compile(pat, re.S).findall(html)
    return rst


# Process pages 1 to 9 and append the results to a file
# (the file is opened once; the with block closes it automatically)
with open("D:/Test/data/data1.txt", "a", encoding="utf-8") as f:
    for i in range(1, 10):
        print("Scraping page " + str(i) + "...")
        html = get_content(i)  # fetch the page source
        rst = get(html)
        for j in rst:
            # One tab-separated line per listing
            f.write(j[0] + '\t' + j[1] + '\t' + j[2] + '\t' + j[3] + '\t' + j[4] + '\t' + '\n')
print('Done!')
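The search keyword in the URL above is percent-encoded twice: the segment `%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE` decodes to 大数据 ("big data"). The sketch below shows one way to build such a URL for another keyword using the standard library. The helper names `encode_keyword` and `build_url` are hypothetical, and whether the site still accepts this URL layout is an assumption.

```python
from urllib.parse import quote, unquote

# Keyword segment copied from the URL used in this article.
ENCODED = "%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE"

BASE = "https://search.51job.com/list/000000,000000,0000,00,9,99,"
QUERY = (
    "lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99"
    "&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare="
)


def encode_keyword(keyword):
    # First pass: percent-encode the UTF-8 bytes of the keyword.
    # Second pass: encode again, so every "%" becomes "%25".
    return quote(quote(keyword, safe=""), safe="")


def build_url(keyword, page):
    # Hypothetical helper: same layout as the article's URL,
    # with the keyword and page number filled in.
    return BASE + encode_keyword(keyword) + ",2," + str(page) + ".html?" + QUERY


print(unquote(unquote(ENCODED)))
print(build_url("大数据", 1))
```

Keeping the keyword as a parameter makes it easy to search for a different job title without hand-editing the encoded string.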
(3) Final result

3. Scraping the data and saving it to an Excel file
(1) Writing the code
# Scrape 51job listings, then create an Excel file and write them to it
import re
import urllib.request
import xlwt  # creates Excel (.xls) workbooks and writes data to them

# Fetch the HTML source of one results page
def get_content(page):
    url = 'https://search.51job.com/list/000000,000000,0000,00,9,99,%25E5%25A4%25A7%25E6%2595%25B0%25E6%258D%25AE,2,' + str(page) + '.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare='
    # Decode the response as GBK; "ignore" drops any bytes that fail to decode
    html = urllib.request.urlopen(url).read().decode("GBK", "ignore")
    return html

# Extract the fields matched by the regular expression from the page source
def get(html):
    # Note: excel_write reads item[0]..item[4], so the full pattern
    # must define five capture groups.
    pat = '(.*?).*?(.*?).*?(.*?)'
    rst = re.compile(pat, re.S).findall(html)
    return rst

# Write the scraped rows into the worksheet, starting at row `index`
def excel_write(rst, index):
    for item in rst:
        for i in range(0, 5):
            ws.write(index, i, item[i])  # row, column, value
        print(index)
        index += 1

newTable = "D:/Test/data/data1.xls"  # output file name
wb = xlwt.Workbook(encoding='utf-8')  # create the workbook and declare its encoding
ws = wb.add_sheet('sheet1')  # create the worksheet
headData = ['Job title', 'Company', 'Location', 'Salary', 'Date']  # header row
for colnum in range(0, 5):
    ws.write(0, colnum, headData[colnum], xlwt.easyxf('font:bold on'))  # row, column, value, style

for each in range(1, 10):
    index = (each - 1) * 50 + 1  # first row for this page, assuming 50 listings per page
    excel_write(get(get_content(each)), index)
wb.save(newTable)
print('Done!')
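The script above computes each page's first row as `(each - 1) * 50 + 1`, which assumes every page yields exactly 50 listings. If a page returns fewer matches, blank rows are left in the sheet. As a hedged alternative, the sketch below compares that fixed offset with a running row counter. The helper names `fixed_offset_rows` and `running_rows` and the sample records are hypothetical; the code only computes row numbers and does not touch `xlwt`.

```python
def fixed_offset_rows(pages, per_page=50):
    """Row numbers as in the script: each page starts at (page - 1) * per_page + 1."""
    rows = []
    for page_number, records in enumerate(pages, start=1):
        index = (page_number - 1) * per_page + 1
        for record in records:
            rows.append((index, record))
            index += 1
    return rows


def running_rows(pages, start_row=1):
    """Row numbers from a single counter, so short pages leave no gaps."""
    rows = []
    index = start_row
    for records in pages:
        for record in records:
            rows.append((index, record))
            index += 1
    return rows


# Hypothetical data: page 1 matched two records, page 2 matched one.
pages = [["job A", "job B"], ["job C"]]
print(fixed_offset_rows(pages))
print(running_rows(pages))
```

With the running counter, the header stays in row 0 and the data rows are contiguous regardless of how many records each page returns.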
(2) Final result

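At this point, many readers will say: I have the dataset now, but I don't know how to analyze it! 严小样儿 has a suggestion for you:
See the Toutiao article, which includes an Excel + Tableau tutorial.
The link is below. Good luck!
Link 1: 基于Python|“数据分析岗位”招聘情况分析! (A Python-Based Analysis of Hiring for Data Analyst Positions)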
