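A Midway Tutorial
1. Preface
First, some background: this is a Node.js back-end interview question I ran into, so I wrote it up. By reading this article you will get a feel for the following topics:
- creating and using a midway project
- using TypeScript in a Node project
- wrapping HTTP requests with Node's own API
- using cheerio in a project
- using regular expressions in a project
- unit testing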
2. Creating and using a midway project
Step 1: Run **npm init midway** to initialize a midway project.
Step 2: Select **koa-v3 - A web application boilerplate with midway v3(koa)** and press Enter.
➜  www npm init midway
npx: installed 1 in 4.755s
? Hello, traveller.
  Which template do you like? …
 ⊙ v3
  koa-v3 - A web application boilerplate with midway v3(koa)
  egg-v3 - A web application boilerplate with midway v3(egg)
  faas-v3 - A serverless application boilerplate with midway v3(faas)
  component-v3 - A midway component boilerplate for v3
 ⊙ v2
  web - A web application boilerplate with midway and Egg.js
  koa - A web application boilerplate with midway and koa
Step 3: Enter the name of the project you want to create, for example **midway-project**, at the prompt **What name would you like to use for the new project?**
Step 4: Follow the prompts and run **cd midway-project** and then **npm run dev**. Unless you have changed the defaults, you can now open **http://localhost:7001** to see the result.
➜  www npm init midway
npx: installed 1 in 4.755s
✔ Hello, traveller.
  Which template do you like? · koa-v3 - A web application boilerplate with midway v3(koa)
✔ What name would you like to use for the new project? · midway-project
Successfully created project midway-project
Get started with the following commands:
$ cd midway-project
$ npm run dev
Thanks for using Midway
Document ❤ Star: https://github.com/midwayjs/midway
   ╭────────────────────────────────────────────────────────────────╮
   │                                                                │
   │      New major version of npm available! 6.14.15 → 8.12.1      │
   │   Changelog: https://github.com/npm/cli/releases/tag/v8.12.1   │
   │               Run npm install -g npm to update!                │
   │                                                                │
   ╰────────────────────────────────────────────────────────────────╯
➜  www
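The official Midway documentation already covers the details, so I will not repeat them here.
3. How to fetch the Baidu home page
3.1 Wrapping a request with Node's own API
The Node.js https module provides a get method that can fetch the page; see the Node.js documentation for details. I wrapped it like this: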
import { get } from 'https';

// Fetch a page over HTTPS and resolve with the whole response body as text.
async function getPage(url = 'https://www.baidu.com/'): Promise<string> {
  return new Promise((resolve, reject) => {
    let data = '';
    get(url, res => {
      res.on('data', chunk => {
        data += chunk;
      });
      res.on('error', err => reject(err));
      res.on('end', () => {
        resolve(data);
      });
    }).on('error', err => reject(err)); // connection-level failures (DNS, TLS, refused)
  });
}
If you want to try this function in a Node environment, that is simple too. Write something like this:
(async () => {
  const ret = await getPage();
  console.log('ret:', ret);
})();
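4. How to read an attribute from the target tag
The task: from the fetched HTML source text, find the img tag inside the div with id=lg and return the value of that img tag's src attribute.
4.1 Going all in with cheerio
If you missed the jQuery era, cheerio is worth learning. It offers a jQuery-like API: a fast, flexible, and lean implementation of core jQuery designed specifically for the server. For details, see the cheerio documentation and its GitHub repository.
With that in place the rest is easy: call the API and read off the result. The code below selects the div whose id is lg, takes its img children, and calls map on the selection. Note that this is cheerio's own jQuery-style map, not Array.prototype.map: the callback receives each element as this, and the return values are collected into a new cheerio object. Calling get turns that object into a plain array, and join turns the array into a string.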
  // Requires: import { load } from 'cheerio';
  @Get('/useCheerio')
  async useCheerio(): Promise<IPackResp<{ func: string; imgSrc: string }>> {
    const ret = await getPage();
    const $ = load(ret);
    // Collect the src of every img that is a direct child of the div with id "lg".
    const imgSrc = $('div[id=lg]')
      .children('img')
      .map(function () {
        return $(this).attr('src');
      })
      .get()
      .join(',');
    return packResp({ func: 'useCheerio', imgSrc });
  }
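4.2 Going all in with regular expressions
Faced with one big blob of string, regular expressions should also come to mind. My regex skills are not great, and I could not write a single expression that does everything in one go. So I first match the div whose id is lg, and then match the src attribute of the img tag inside it. If one step will not do, take two; the end result is the same.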
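  @Get('/useRegExp')
  async useRegExp(): Promise<IPackResp<{ func: string; imgSrc: string }>> {
    const ret = await getPage();
    // Match the inner HTML of the div whose id is lg (quoted or unquoted attribute)
    const reDivLg = /(?<=<div[^>]*\bid=["']?lg["']?[^>]*>)([\s\S]*?)(?=<\/div>)/gi;
    // Match the src attribute of the img tag
    const reSrc = /<img[^>]*?\bsrc=["']?([^"'\s>]+)/i;
    // Fall back to an empty string when either pattern finds nothing
    const imgSrc = ret.match(reDivLg)?.[0].match(reSrc)?.[1] ?? '';
    return packResp({ func: 'useRegExp', imgSrc });
  }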
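To see the two-step idea in isolation, here is a minimal sketch that runs against a hard-coded HTML fragment instead of the live page. The fragment, the function name extractLogoSrc, and the exact patterns are assumptions made for illustration; the real Baidu markup may well differ.

```typescript
// Hypothetical sample markup; this is not the real Baidu response.
const sampleHtml = `
<div id="head">
  <div id="lg" class="s-p-top">
    <img hidefocus="true" src="//www.example.com/img/logo.png" width="270">
  </div>
</div>`;

// Step 1: capture the inner HTML of the div whose id is "lg".
// The lazy match stops at the first </div>, so nested divs are not handled.
const reDivLg = /<div[^>]*\bid=["']?lg["']?[^>]*>([\s\S]*?)<\/div>/i;
// Step 2: capture the src value of the first img tag (quoted or unquoted).
const reSrc = /<img[^>]*?\bsrc=["']?([^"'\s>]+)/i;

function extractLogoSrc(html: string): string | null {
  const divMatch = html.match(reDivLg);
  if (!divMatch) return null;
  const srcMatch = divMatch[1].match(reSrc);
  return srcMatch ? srcMatch[1] : null;
}

console.log(extractLogoSrc(sampleHtml));
```

Returning null instead of indexing the match result directly keeps a changed page layout from turning into a TypeError inside the controller.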
5. Unit testing
Two checks are required here:
1. If the request takes longer than 1 second, the assertion fails.
2. If the returned value is not equal to "http://www.baidu.com/img/bd_logo1.png", the assertion fails.
Midway ships with jest integration for unit testing, and the official documentation covers it in detail.
For the 1-second check, we can take timestamps around the request, like this:
const startTime = Date.now();
// make request
const result: any = await createHttpRequest(app).get('/useRegExp');
const cost = Date.now() - startTime;
Then assert on the elapsed time: expect(cost).toBeLessThanOrEqual(1000);
The final code looks like this:
  // Requires: createHttpRequest from '@midwayjs/mock';
  // deepStrictEqual and notDeepStrictEqual from 'assert'.
  it.only('should GET /useRegExp', async () => {
    const startTime = Date.now();
    // make request
    const result: any = await createHttpRequest(app).get('/useRegExp');
    const cost = Date.now() - startTime;
    const {
      data: { imgSrc },
    } = result.body as IPackResp;
    expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
    notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
    // 1. The assertion fails if the request took longer than 1 second
    expect(cost).toBeLessThanOrEqual(1000);
    // 2. The assertion fails if the returned value is not the expected logo URL
    expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
    deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
  });
  it.only('should GET /useCheerio', async () => {
    const startTime = Date.now();
    // make request
    const result: any = await createHttpRequest(app).get('/useCheerio');
    const cost = Date.now() - startTime;
    const {
      data: { imgSrc },
    } = result.body as IPackResp;
    expect(imgSrc).not.toBe('//www.baidu.com/img/bd_logo1.png');
    notDeepStrictEqual(imgSrc, '//www.baidu.com/img/bd_logo1.png');
    // 1. The assertion fails if the request took longer than 1 second
    expect(cost).toBeLessThanOrEqual(1000);
    // 2. The assertion fails if the returned value is not the expected logo URL
    expect(imgSrc).toBe('//www.baidu.com/img/flexible/logo/pc/index.png');
    deepStrictEqual(imgSrc, '//www.baidu.com/img/flexible/logo/pc/index.png');
  });
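Both tests repeat the same timing steps, so the measurement could be pulled into a small helper. The sketch below is only an illustration: the names timed and fakeRequest are hypothetical, and fakeRequest stands in for the createHttpRequest call so the snippet runs without a Midway app.

```typescript
// Run an async task and report how long it took, in milliseconds.
async function timed<T>(task: () => Promise<T>): Promise<{ result: T; cost: number }> {
  const startTime = Date.now();
  const result = await task();
  return { result, cost: Date.now() - startTime };
}

// Hypothetical stand-in for a real HTTP call: resolves after `ms` milliseconds.
function fakeRequest(ms: number): Promise<string> {
  return new Promise(resolve => setTimeout(() => resolve('ok'), ms));
}

async function main(): Promise<void> {
  const { result, cost } = await timed(() => fakeRequest(50));
  console.log(result, cost <= 1000 ? 'within 1 second' : 'too slow');
}

main();
```

In a test, the same helper would wrap the request, and the assertion on cost would stay exactly as it is above.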
6. Closing notes
If you look closely, you will notice something interesting. Open the Baidu home page in a browser, run the task above in the console, and the output looks like this:
const lg = document.getElementById('lg');
undefined
lg.childNodes.forEach((node) => { if(node.nodeName.toLowerCase() === 'img') { console.log(node.src) } })
2 VM618:1 https://dss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/img/logo/logo_white-d0c9fe2af5.png
VM618:1 https://www.baidu.com/img/PCfb_5bf082d29588c07f842ccde3f97243ea.png
undefined
With Node's built-in https module, however, what comes back is //www.baidu.com/img/flexible/logo/pc/index.png.
Surprising. What happened? Is Baidu doing something on its side? To check, I ran wget -O baidu.html https://www.baidu.com and found that a plain request returns this:
➜  tmp wget -O baidu.html https://www.baidu.com
--2022-06-10 00:36:17--  https://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 182.61.200.6, 182.61.200.7
Connecting to www.baidu.com (www.baidu.com)|182.61.200.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2443 (2.4K) [text/html]
Saving to: ‘baidu.html’
baidu.html          100%[===================>]   2.39K  --.-KB/s    in 0s
2022-06-10 00:36:18 (48.3 MB/s) - ‘baidu.html’ saved [2443/2443]
➜  tmp cat baidu.html
 百度一下,你就知道 ©2017 Baidu 使用百度前必讀 意見反饋 京ICP證030173號
➜  tmp
But when I send a browser-like User-Agent with wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" https://www.baidu.com, the response changes:
➜  tmp wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16"  https://www.baidu.com
--2022-06-10 00:38:53--  https://www.baidu.com/
Resolving www.baidu.com (www.baidu.com)... 182.61.200.7, 182.61.200.6
Connecting to www.baidu.com (www.baidu.com)|182.61.200.7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’
index.html              [ <=>               ] 350.76K  --.-KB/s    in 0.01s
2022-06-10 00:38:53 (35.1 MB/s) - ‘index.html’ saved [359175]
➜  tmp
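This matches what the browser gets, and the output contains three img tags.
I did not dig into how the Node.js https module behaves here. Judging from the experiment above, my guess is that Baidu's server inspects the client and returns different HTML accordingly. If so, sending the User-Agent above from Node.js should produce the same result as on a PC, though I have not verified this. I leave that as an exercise for the reader; feel free to discuss it in the comments below!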
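As a starting point for that exercise, here is a minimal sketch of passing a User-Agent header through the options argument of https.get. The names buildOptions and getPageWithUA are hypothetical, and whether Baidu then returns the same markup as a desktop browser is untested here.

```typescript
import { get } from 'node:https';
import type { RequestOptions } from 'node:https';

// The same User-Agent string used in the wget experiment above.
const BROWSER_UA =
  'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 ' +
  '(KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';

// Build request options that carry a browser-like User-Agent header.
function buildOptions(userAgent: string = BROWSER_UA): RequestOptions {
  return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical variant of getPage that sends the header with the request.
function getPageWithUA(url = 'https://www.baidu.com/'): Promise<string> {
  return new Promise((resolve, reject) => {
    let data = '';
    get(url, buildOptions(), res => {
      res.setEncoding('utf8');
      res.on('data', chunk => {
        data += chunk;
      });
      res.on('error', reject);
      res.on('end', () => resolve(data));
    }).on('error', reject); // connection-level failures
  });
}
```

Swapping getPage for getPageWithUA in either controller method would be enough to compare the two responses; expect the test expectations for imgSrc to need updating if the markup changes.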
Project repository: https://github.com/ataola/play-baidu-midway-crawler
Live demo: http://106.12.158.11:8090/
