
By 某某白米饭

Source: Python 技术 (ID: pythonall)

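Most of us have put a little money into a fund, or opened a brokerage account to buy a few stocks. We are ordinary investors with no inside information, so we often get fleeced. If you don't want that to happen, you have to dig through the data yourself. This article scrapes, from the Tiantian Fund site (天天基金网), the stocks that each fund has bought.

Stocks carry risk; enter the market with caution. Funds carry risk too; enter the market with caution.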

Modules

Without further ado, here are the modules we need.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from lxml import etree
import requests
import re
import threading
import os

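Scraping the front page

On Tiantian Fund, go to the open-end funds list. As the figure below shows, there are 9,340 funds in total.

Open the browser's developer tools and find the request to http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=6yzf&st=desc&sd=2020-11-18&... The response to this request is the fund data shown in the table.

The response looks like a JSON string. From what I can see, fund codes are all 6-digit numbers, so a regular expression is enough to pull them out.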

def crawler_front_page():
    # Headers copied from a browser session; replace 'xxxx' with your own cookie.
    headers = {
        'Referer': 'http://fund.eastmoney.com/data/fundranking.html',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
        'Cookie': 'xxxx'
    }

    # pi=1 and pn=10000 appear to request page 1 with up to 10000 rows,
    # so one request should cover every fund.
    response = requests.get('http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=6yzf&st=desc&sd=2020-11-18&ed=2021-11-18&qdii=&tabSubtype=,,,,,&pi=1&pn=10000&dx=1&v=0.6791917206798068', headers=headers, timeout=30)

    response.encoding = 'utf-8'
    return response.text

def parse_front_page(html):
    # Treat every run of six digits in the response as a fund code.
    return re.findall(r"\d{6}", html)
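
One caveat with a bare `\d{6}` pattern is that it matches any six consecutive digits, including the tail of a longer number, and it returns repeats as often as they occur. The sketch below shows a slightly stricter variant on a made-up sample. The sample string, the assumption that each record is a quoted string beginning with the code and a comma, and the name `extract_fund_codes` are all hypothetical; check them against the real response before relying on this.

```python
import re

# Hypothetical sample shaped like a JSON-ish ranking response.
# The real payload may be laid out differently.
SAMPLE = (
    'var rankData = {datas:["000689,Fund A,2021-11-18,3.2340",'
    '"004390,Fund B,2021-11-18,1.123456"],allRecords:2};'
)


def extract_fund_codes(text):
    """Return unique 6-digit codes that open a quoted record, in first-seen order."""
    # Only accept six digits that directly follow a double quote and precede a comma.
    codes = re.findall(r'"(\d{6}),', text)
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(codes))


loose = re.findall(r"\d{6}", SAMPLE)   # also picks up "123456" from 1.123456
strict = extract_fund_codes(SAMPLE)    # only the two leading codes

print(loose)
print(strict)
```

If the real records do start with the fund code, the stricter pattern avoids sending the crawler to holdings pages for numbers that were never fund codes.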

Scraping stock holdings

Open any fund's detail page and scroll down to the stock holdings section.

Click through, and you will see that the page's URL is http://fundf10.eastmoney.com/ccmx_ followed by the fund code and .html.

前海开源新经济混合A: http://fundf10.eastmoney.com/ccmx_000689.html
平安转型创新混合A: http://fundf10.eastmoney.com/ccmx_004390.html

So all we need to do is take the fund codes parsed from the front page and put http://fundf10.eastmoney.com/ccmx_ in front of each one to get the URL of its stock holdings detail page. That gives a little over 9,000 entries.

def get_stock_url(codes):
    # Build the holdings-detail URL for each fund code.
    url = []
    for code in codes:
        url.append("http://fundf10.eastmoney.com/ccmx_{}.html".format(code))

    return url

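Open a stock holdings page and you will find that its data is loaded by JavaScript. What we need to scrape here is the fund name and the stock names.

I used selenium to fetch the content and XPath to parse the page. selenium is somewhat slower than requests, so the scraping runs in multiple threads: 10 threads in total, each handling 1,000 entries.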

def thread_test(*args):
    # args[0] is the full list of holdings URLs; each thread gets one "start,end" slice.
    threads = []
    for crawler_count in ["0,1000", "1000,2000", "2000,3000", "3000,4000", "4000,5000", "5000,6000", "6000,7000", "7000,8000", "8000,9000", "9000,10000"]:
        t = threading.Thread(target=crawler_stock_page, args=(crawler_count, args[0]))
        threads.append(t)

    # Start every thread, then wait for all of them to finish.
    for t in threads:
        t.start()
    for t in threads:
        t.join()
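
The ten slices above are hard-coded, so anything past index 10000 would be skipped if the list ever grew. As an alternative, here is a rough sketch that derives the slices from the list length and hands them to a thread pool from the standard library. `chunk_ranges`, `run_in_chunks`, and the `worker` callable are hypothetical names for illustration, not part of the original script; in the real crawler each worker would still need to open its own browser.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total, chunk_size):
    """Yield (start, end) index pairs that together cover range(total)."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def run_in_chunks(items, worker, chunk_size=1000, max_workers=10):
    """Call worker(items[start:end]) for each chunk on a thread pool.

    Results are returned in chunk order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(worker, items[start:end])
            for start, end in chunk_ranges(len(items), chunk_size)
        ]
        # result() re-raises any exception that occurred inside the worker.
        return [future.result() for future in futures]


# Demo with a stand-in worker that just reports how many items it received.
sizes = run_in_chunks(list(range(25)), len, chunk_size=10)
print(sizes)
```

With 9,340 URLs and a chunk size of 1,000, `chunk_ranges` produces ten slices, the last one being (9000, 9340).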

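The scraped and parsed content is written to text files, which are read back and processed at the end. Writing straight to a database would of course be better, since there would be no text files to parse afterwards.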

def crawler_stock_page(c, stock_url_list):
    # c is a "start,end" string selecting this thread's slice of stock_url_list.
    count = c.split(",")
    # Raw string so the backslashes in the Windows path are not read as escapes.
    # Passing the driver path positionally works on older Selenium releases;
    # Selenium 4 prefers a Service object.
    driver = webdriver.Chrome(r'D:\personal\gitpython\chromedriver.exe')
    file = "D:/fund/fund_{}.txt".format(count[0])

    for url in stock_url_list[int(count[0]):int(count[1])]:
        stock_result = []
        title = "没有数据"  # "no data" sentinel, skipped later when parsing

        try:
            driver.get(url)

            # is_element is a helper that is not shown in this article.
            element_result = is_element(driver, By.CLASS_NAME, "tol")
            if element_result:
                wait = WebDriverWait(driver, 3)
                wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'tol')))

                # If the link at this XPath exists, click it and wait for the table again.
                link_xpath = '//*[@id="cctable"]/div[1]/div/div[3]/font/a'
                if is_element(driver, By.XPATH, link_xpath):
                    driver.find_element(By.XPATH, link_xpath).click()
                    wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'tol')))

                stock_xpath = etree.HTML(driver.page_source)
                # Third cell of each row in the first holdings box: the stock names.
                stock_result = stock_xpath.xpath("//div[@id='cctable']//div[@class='box'][1]//td[3]//text()")
                title = stock_xpath.xpath('//*[@id="cctable"]/div[1]/div/h4/label[1]/a')[0].text

            # One dict literal per line; repr() takes care of quoting.
            with open(file, 'a+') as f:
                f.write(repr({'name': title, 'stock': stock_result}) + "\n")
        except Exception:
            # Skip pages that time out or fail to parse.
            continue

    driver.quit()
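
The crawler calls `is_element`, but the post never shows that helper. Below is a minimal guess at what it might look like, assuming it only needs to report whether a locator matches anything on the current page. It relies on the WebDriver method `find_elements(by, value)`, which returns an empty list when nothing matches. This is not the author's code, and `FakeDriver` is a hypothetical stand-in used here only so the sketch can run without a browser.

```python
def is_element(driver, by, value):
    """Return True if at least one element matches the locator, False otherwise."""
    # find_elements returns an empty list instead of raising when nothing matches.
    return len(driver.find_elements(by, value)) > 0


class FakeDriver:
    """Stand-in for a WebDriver, used only to exercise is_element without a browser."""

    def __init__(self, present):
        self.present = set(present)

    def find_elements(self, by, value):
        return ["<element>"] if (by, value) in self.present else []


# By.CLASS_NAME and By.XPATH are the plain strings "class name" and "xpath".
driver = FakeDriver({("class name", "tol")})
found = is_element(driver, "class name", "tol")
missing = is_element(driver, "xpath", "//missing")
print(found, missing)
```

With a real driver the call stays the same as in the crawler, for example `is_element(driver, By.CLASS_NAME, "tol")`.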

Sample result

Parsing the files

This step feels a bit redundant: had the data gone into a database, a single query would do. The code reads every file in the fund folder and turns each line into a dict with ast.literal_eval(). In the end it works out, for each stock, how many of the 9,000-plus funds hold it.

import ast

def parse_data():
    result = {}  # fund name -> list of stock names
    stock = {}   # stock name -> number of funds holding it

    files = os.listdir('D:/fund/')

    for file in files:
        with open('D:/fund/' + file) as f:
            for line in f:
                # Each line is a dict literal; literal_eval parses it without running code.
                data = ast.literal_eval(line.strip())
                key = data['name']
                # Skip pages with no data and funds already seen.
                if key == '没有数据' or key in result:
                    continue

                result[key] = data['stock']

                for value in data['stock']:
                    stock[value] = stock.get(value, 0) + 1

    # Write once, after all files are read, so totals are not appended repeatedly.
    with open('D:/fund_result/stock.csv', 'w') as f:
        for key in stock:
            f.write(key + "," + str(stock[key]) + "\n")

    with open('D:/fund_result/fund.csv', 'w') as f:
        for key in result:
            values = []
            for value in result[key]:
                # Each stock is written as name(number of funds holding it).
                values.append('{}({})'.format(value, stock[value]))
            f.write(key + ',' + ','.join(values) + '\n')
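
For comparison, here is roughly what the database route mentioned above could look like, using the standard-library sqlite3 module. The table name `holding`, its columns, the function name, and the sample records are all made up for illustration; the in-memory database is only there so the sketch runs anywhere.

```python
import sqlite3


def count_funds_per_stock(records):
    """records: iterable of (fund_name, [stock_name, ...]) pairs.

    Returns (stock, number_of_funds) rows, most widely held first.
    """
    conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
    conn.execute("CREATE TABLE holding (fund TEXT, stock TEXT, UNIQUE (fund, stock))")
    rows = [(fund, stock) for fund, stocks in records for stock in stocks]
    # INSERT OR IGNORE silently drops rows that repeat an existing (fund, stock) pair.
    conn.executemany("INSERT OR IGNORE INTO holding (fund, stock) VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT stock, COUNT(*) AS funds FROM holding "
        "GROUP BY stock ORDER BY funds DESC, stock"
    ).fetchall()
    conn.close()
    return result


sample = [
    ("Fund A", ["Stock X", "Stock Y"]),
    ("Fund B", ["Stock X"]),
    ("Fund A", ["Stock X", "Stock Y"]),  # repeated scrape of the same fund
]
counts = count_funds_per_stock(sample)
print(counts)
```

The UNIQUE constraint plays the role of the `key in result` check in parse_data, and the GROUP BY query replaces the manual counting loop.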