C# Web Scraping: Setting a Proxy for Selenium ChromeDriver

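Background
When you develop a crawler without a proxy, the machine's public IP is easily banned by the target site, and data collection cannot continue. Selenium is a powerful tool for scraping dynamic pages, so it is worth understanding how to configure a proxy for it and still load pages normally.
Solution
1. First, obtain a proxy IP. Paid services are generally the more reliable choice. The provider gives you, among other details, an account name and a password.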
private string proxy_Host = "proxy host name";
private int proxy_Post = 0;                  // proxy port
private string proxy_UserName = "account";
private string proxy_PassWord = "password";
private string proxy_CheckURL = "URL used to check that the proxy works";
private string Ex_Proxy_Name = "proxy.zip";
2. Build the Chrome extension files background.js and manifest.json
// Requires: using System; using System.IO; using System.IO.Compression;
private bool Rebuild_Extension_Proxy(string proxy_UserName, string proxy_PassWord)
{
    // background.js: answers every proxy authentication challenge with the stored credentials.
    // Note: requestHandler is not defined in this snippet; the message listener is kept as in the original.
    string background = @"var Global = {
    currentProxyAouth: { username: '', password: '' }
};
Global.currentProxyAouth = {
    username: '" + proxy_UserName + @"',
    password: '" + proxy_PassWord + @"'
};
chrome.webRequest.onAuthRequired.addListener(
    function (details, callbackFn) {
        console.log('onAuthRequired >>>: ', details, callbackFn);
        callbackFn({ authCredentials: Global.currentProxyAouth });
    },
    { urls: [""<all_urls>""] },
    [""asyncBlocking""]
);
chrome.runtime.onMessage.addListener(function (request, sender, sendResponse) {
    console.log('Background received a message: ', request);
    POPUP_PARAMS = {};
    if (request.command && requestHandler[request.command])
        requestHandler[request.command](request);
});";

    // manifest.json: a Manifest V2 extension with the permissions the listener needs.
    string manifest = @"{
    ""version"": ""1.0.0"",
    ""manifest_version"": 2,
    ""name"": ""Chrome Proxy"",
    ""permissions"": [""proxy"", ""tabs"", ""unlimitedStorage"", ""storage"", ""<all_urls>"", ""webRequest"", ""webRequestBlocking""],
    ""background"": { ""scripts"": [""background.js""] },
    ""minimum_chrome_version"": ""22.0.0""
}";

    try
    {
        // Write both files into proxy.zip in the current directory.
        string zipPath = Path.Combine(Environment.CurrentDirectory, Ex_Proxy_Name);
        using (FileStream zipToOpen = new FileStream(zipPath, FileMode.Create))
        using (ZipArchive archive = new ZipArchive(zipToOpen, ZipArchiveMode.Update))
        {
            // In Update mode only one entry stream may be open at a time,
            // so each writer is disposed before the next entry is created.
            using (StreamWriter writer = new StreamWriter(archive.CreateEntry("background.js").Open()))
            {
                writer.WriteLine(background);
            }
            using (StreamWriter writer = new StreamWriter(archive.CreateEntry("manifest.json").Open()))
            {
                writer.WriteLine(manifest);
            }
        }
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
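The method above splices the account and password directly into single-quoted JavaScript strings, so a quote or backslash in either value would break background.js. The following is a minimal sketch of one way to avoid that, assuming System.Text.Json is available (.NET Core 3.0 or later); the class name ProxyAuthScript and the method Build are hypothetical and not part of the original code, and the generated script keeps only the authentication listener.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not in the original article): builds the
// authentication part of background.js with safely escaped credentials.
public static class ProxyAuthScript
{
    public static string Build(string userName, string password)
    {
        // JsonSerializer.Serialize on a string returns a double-quoted literal
        // with quotes, backslashes and control characters escaped, which can
        // be pasted into JavaScript source as a string literal.
        string user = JsonSerializer.Serialize(userName);
        string pass = JsonSerializer.Serialize(password);

        return
            "var credentials = { username: " + user + ", password: " + pass + " };\n" +
            "chrome.webRequest.onAuthRequired.addListener(\n" +
            "    function (details, callbackFn) {\n" +
            "        callbackFn({ authCredentials: credentials });\n" +
            "    },\n" +
            "    { urls: [\"<all_urls>\"] },\n" +
            "    [\"asyncBlocking\"]\n" +
            ");\n";
    }
}
```

The returned string could be written to the background.js entry of the zip in place of the concatenated version.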
3. Use the proxy in Chrome Driver
// Chrome Driver extension proxy setup
bool isproxysetting = true;
if (_isuseproxy)
{
    isproxysetting = Rebuild_Extension_Proxy(proxy_UserName, proxy_PassWord);
}
if (isproxysetting)
{
    // Driver setup
    options = new ChromeOptions();
    if (_isuseproxy)
    {
        options.Proxy = null;
        options.AddArguments("--proxy-server=" + proxy_Host + ":" + proxy_Post.ToString());
        options.AddExtension(Ex_Proxy_Name);
    }
    // ... (the rest of the driver setup is not shown in the original)
}
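The snippet above stops before the driver is created. As a rough sketch of how the options might be used end to end, assuming the Selenium.WebDriver package and a matching chromedriver are installed: the class ProxyDriverSketch and its two methods are hypothetical names introduced here, not the author's code.

```csharp
using System;
using OpenQA.Selenium.Chrome;

// Hypothetical helper (not in the original article).
public static class ProxyDriverSketch
{
    // Builds ChromeOptions that route traffic through host:port and,
    // when a path is given, load the credential extension zip.
    public static ChromeOptions BuildOptions(string host, int port, string extensionZipPath)
    {
        var options = new ChromeOptions();
        options.AddArgument("--proxy-server=" + host + ":" + port);
        if (!string.IsNullOrEmpty(extensionZipPath))
        {
            // AddExtension throws if the file does not exist.
            options.AddExtension(extensionZipPath);
        }
        return options;
    }

    // Opens the check URL through the proxy and returns the page source.
    public static string FetchPageSource(ChromeOptions options, string checkUrl)
    {
        // Disposing the driver closes the browser and stops chromedriver.
        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(checkUrl);
            return driver.PageSource;
        }
    }
}
```

With the fields from step 1, a call might look like FetchPageSource(BuildOptions(proxy_Host, proxy_Post, Ex_Proxy_Name), proxy_CheckURL), and the returned HTML is what the check in step 4 takes as input.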
4. Test the configuration
// Requires: using System; using Newtonsoft.Json;
// Parses the page source returned by the check URL into a ProxyIPInfo, or returns null on failure.
private Proxy_Unit.ProxyIPInfo Get_ProxyIPInfo(string Html_Content)
{
    try
    {
        // Strip the HTML wrapper Chrome puts around a plain JSON response.
        Html_Content = Html_Content.Replace("<html><head></head><body><pre style=\"word-wrap: break-word; white-space: pre-wrap;\">", "");
        Html_Content = Html_Content.Replace("</pre></body></html>", "");
        if (Html_Content.Contains("proxy error"))
        {
            return null;
        }
        return JsonConvert.DeserializeObject<Proxy_Unit.ProxyIPInfo>(Html_Content);
    }
    catch (Exception)
    {
        return null;
    }
}
Test result
Success: the check URL returns the proxy's IP information, as expected.
{"ip":"213.182.205.185","country":"IS","asn":{"asnum":9009,"org_name":"M247 Ltd"},"geo":{"city":"Reykjavik","region":"1","region_name":"Capital Region","postal_code":"105","latitude":64.1369,"longitude":-21.9139,"tz":"Atlantic/Reykjavik","lum_city":"reykjavik","lum_region":"1"}}
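The article does not show the Proxy_Unit.ProxyIPInfo type that step 4 deserializes into. A minimal sketch, assuming it simply mirrors a subset of the fields in the response above and that Newtonsoft.Json is used as in step 4, could look like this; every class and property name here is an assumption.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical model classes; JSON fields without a matching
// property are ignored by Json.NET's default settings.
public class AsnInfo
{
    [JsonProperty("asnum")] public int AsNum { get; set; }
    [JsonProperty("org_name")] public string OrgName { get; set; }
}

public class GeoInfo
{
    [JsonProperty("city")] public string City { get; set; }
    [JsonProperty("region_name")] public string RegionName { get; set; }
    [JsonProperty("latitude")] public double Latitude { get; set; }
    [JsonProperty("longitude")] public double Longitude { get; set; }
    [JsonProperty("tz")] public string TimeZone { get; set; }
}

public class ProxyIPInfo
{
    [JsonProperty("ip")] public string Ip { get; set; }
    [JsonProperty("country")] public string Country { get; set; }
    [JsonProperty("asn")] public AsnInfo Asn { get; set; }
    [JsonProperty("geo")] public GeoInfo Geo { get; set; }
}

public static class ProxyIPInfoDemo
{
    // Deserializes the check URL's JSON body into the model above.
    public static ProxyIPInfo Parse(string json)
    {
        return JsonConvert.DeserializeObject<ProxyIPInfo>(json);
    }
}
```

Comparing the parsed Ip with the machine's own public IP is one simple way to confirm that requests really leave through the proxy.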
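Summary
When we first tried to set a proxy for ChromeDriver we ran into many difficulties. Chrome's --proxy-server argument carries only the host and port, not the account and password, so the proxy only worked once the credentials were supplied through a Chrome extension. We hope the steps above make the approach easier to understand.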
