
Comparing the server-side performance of Go's standard net/http and fasthttp


          2021-09-02 16:14

1. Background

After writing the classic "hello, world" program, Go beginners are often eager to try out Go's powerful standard library, for example by writing a fully functional web server like the one below in just a few lines of code:

// from https://tip.golang.org/pkg/net/http/#example_ListenAndServe
package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    helloHandler := func(w http.ResponseWriter, req *http.Request) {
        io.WriteString(w, "Hello, world!\n")
    }
    http.HandleFunc("/hello", helloHandler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Go's net/http package is a well-balanced, general-purpose implementation that covers more than 90% of the scenarios most gophers face, and it offers the following advantages:

• It is a standard library package, so no third-party dependencies are needed;
• It conforms well to the HTTP specification;
• It delivers relatively high performance without any tuning;
• It supports HTTP proxies;
• It supports HTTPS;
• It supports HTTP/2 seamlessly.

However, precisely because net/http is a "balanced" general-purpose implementation, its performance may fall short in domains with strict performance requirements, and it leaves little room for tuning. In such cases we turn to third-party HTTP server frameworks.

Among third-party HTTP server frameworks, fasthttp[1], a framework that lives up to its name, is the one mentioned and adopted most often. Its website claims it is ten times faster than net/http (based on go test benchmark results).

fasthttp adopts many performance-optimization best practices[2]. In particular, it reuses memory objects aggressively, making heavy use of sync.Pool[3] to reduce pressure on the Go GC.

So in a real-world environment, how much faster is fasthttp than net/http? I happen to have two reasonably powerful servers at hand, so in this post we will look at their actual performance in that real environment.

2. Performance testing

We implemented two nearly "zero-business-logic" programs under test, one with net/http and one with fasthttp:

• nethttp:
// github.com/bigwhite/experiments/blob/master/http-benchmark/nethttp/main.go
package main

import (
    _ "expvar"
    "log"
    "net/http"
    _ "net/http/pprof"
    "runtime"
    "time"
)

func main() {
    go func() {
        for {
            log.Println("current number of goroutines:", runtime.NumGoroutine())
            time.Sleep(time.Second)
        }
    }()

    http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello, Go!"))
    }))

    log.Fatal(http.ListenAndServe(":8080", nil))
}
• fasthttp:
// github.com/bigwhite/experiments/blob/master/http-benchmark/fasthttp/main.go

package main

import (
    "fmt"
    "log"
    "net/http"
    "runtime"
    "time"

    _ "expvar"

    _ "net/http/pprof"

    "github.com/valyala/fasthttp"
)

type HelloGoHandler struct {
}

func fastHTTPHandler(ctx *fasthttp.RequestCtx) {
    fmt.Fprintln(ctx, "Hello, Go!")
}

func main() {
    go func() {
        http.ListenAndServe(":6060", nil)
    }()

    go func() {
        for {
            log.Println("current number of goroutines:", runtime.NumGoroutine())
            time.Sleep(time.Second)
        }
    }()

    s := &fasthttp.Server{
        Handler: fastHTTPHandler,
    }
    s.ListenAndServe(":8081")
}

For the client that applies load to the targets, we use the HTTP load-testing tool hey[4]. To make it easy to adjust the load level, we wrap hey in the shell script below (Linux only):

# github.com/bigwhite/experiments/blob/master/http-benchmark/client/http_client_load.sh

# ./http_client_load.sh 3 10000 10 GET http://10.10.195.181:8080
echo "$0 task_num count_per_hey conn_per_hey method url"
task_num=$1
count_per_hey=$2
conn_per_hey=$3
method=$4
url=$5

start=$(date +%s%N)
for((i=1; i<=$task_num; i++)); do {
    tm=$(date +%T.%N)
    echo "$tm: task $i start"
    hey -n $count_per_hey -c $conn_per_hey -m $method $url > hey_$i.log
    tm=$(date +%T.%N)
    echo "$tm: task $i done"
} & done
wait
end=$(date +%s%N)

count=$(( $task_num * $count_per_hey ))
runtime_ns=$(( $end - $start ))
runtime=`echo "scale=2; $runtime_ns / 1000000000" | bc`
echo "runtime: "$runtime
speed=`echo "scale=2; $count / $runtime" | bc`
echo "speed: "$speed

Here is a sample run of the script:

bash http_client_load.sh 8 1000000 200 GET http://10.10.195.134:8080
http_client_load.sh task_num count_per_hey conn_per_hey method url
16:58:09.146948690: task 1 start
16:58:09.147235080: task 2 start
16:58:09.147290430: task 3 start
16:58:09.147740230: task 4 start
16:58:09.147896010: task 5 start
16:58:09.148314900: task 6 start
16:58:09.148446030: task 7 start
16:58:09.148930840: task 8 start
16:58:45.001080740: task 3 done
16:58:45.241903500: task 8 done
16:58:45.261501940: task 1 done
16:58:50.032383770: task 4 done
16:58:50.985076450: task 7 done
16:58:51.269099430: task 5 done
16:58:52.008164010: task 6 done
16:58:52.166402430: task 2 done
runtime: 43.02
speed: 185960.01

Judging by the arguments passed in, the script starts 8 tasks in parallel (each task runs one hey); each task opens 200 concurrent connections to http://10.10.195.134:8080 and sends 1,000,000 HTTP GET requests.

We used two servers, one hosting the programs under test and one hosting the load-generation script:

• Server hosting the target programs: 10.10.195.181 (physical machine, Intel x86-64 CPU, 40 cores, 128 GB RAM, CentOS 7.6)
$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               800.000
CPU max MHz:           2201.0000
CPU min MHz:           800.0000
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              14080K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp flush_l1d

• Server hosting the load tool: 10.10.195.133 (physical machine, Kunpeng arm64 CPU, 96 cores, 80 GB RAM, CentOS 7.9)
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (AltArch)

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                96
On-line CPU(s) list:   0-95
Thread(s) per core:    1
Core(s) per socket:    48
Socket(s):             2
NUMA node(s):          4
Model:                 0
CPU max MHz:           2600.0000
CPU min MHz:           200.0000
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              49152K
NUMA node0 CPU(s):     0-23
NUMA node1 CPU(s):     24-47
NUMA node2 CPU(s):     48-71
NUMA node3 CPU(s):     72-95
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

I used dstat to monitor resource usage on the host running the target program (dstat -tcdngym), paying particular attention to CPU load, and used expvarmon to watch memstats[5] to see which resources the target program consumed most.

Here is a table compiled from multiple test runs:

[Figure: test data (results table image)]

3. A brief analysis of the results

Constrained by the specific scenario, the precision of the test tool and script, and the load-testing environment, the results above have their limitations, but they do reflect the real performance trend of the programs under test. We can see that under the same load, fasthttp is not 10x faster than net/http; in this particular scenario it does not even reach 2x: in the test cases where CPU usage on the target host approached 70%, fasthttp outperformed net/http by only about 30% to 70%.

So why did fasthttp fall short of expectations? To answer that, we need to look at how net/http and fasthttp are each implemented. Let's start with a diagram of how net/http works:

[Figure: how net/http works]

As a server, the http package's model is simple: once a connection (conn) is accepted, it is handed off to a worker goroutine, which lives until the conn's lifecycle ends, i.e. until the connection is closed.
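
Conceptually, the per-connection-goroutine model looks roughly like the hand-rolled sketch below (an illustration of the pattern only, not net/http's actual source; the handler and the canned response are placeholders):

// Sketch of the "one goroutine per connection" model.
package main

import (
    "log"
    "net"
)

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept() // accept a new connection
        if err != nil {
            continue
        }
        // hand the conn to a dedicated goroutine that lives until the connection closes
        go func(c net.Conn) {
            defer c.Close()
            buf := make([]byte, 4096)
            for {
                if _, err := c.Read(buf); err != nil {
                    return // conn closed or errored: the goroutine exits
                }
                // canned response; stands in for the user's handler
                c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nHello, Go!"))
            }
        }(conn)
    }
}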

And here is a diagram of how fasthttp works:

[Figure: how fasthttp works]

fasthttp, by contrast, is designed to reuse goroutines as much as possible instead of creating a new one each time. After fasthttp's Server accepts a conn, it tries to take a channel out of the ready slice in its workerpool; each such channel corresponds to one worker goroutine. Once a channel is obtained, the accepted conn is written into it, and the worker goroutine on the other end handles all reads and writes on that conn. When it is done with the conn, the worker goroutine does not exit; instead, it puts its channel back into the workerpool's ready slice, waiting to be taken out again.
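
The core of that reuse mechanism can be sketched roughly as follows (a heavily simplified illustration of the channel-per-worker idea; fasthttp's real workerPool also enforces a concurrency limit, cleans up idle workers, and so on, and all names here are illustrative):

// Rough sketch of a channel-per-worker goroutine pool.
package main

import (
    "log"
    "net"
    "sync"
)

type workerPool struct {
    mu     sync.Mutex
    ready  []chan net.Conn // channels of idle workers, one channel per worker goroutine
    handle func(net.Conn)  // per-connection handler
}

func (wp *workerPool) serve(conn net.Conn) {
    ch := wp.getCh()
    ch <- conn // hand the conn to the worker on the other end of the channel
}

func (wp *workerPool) getCh() chan net.Conn {
    wp.mu.Lock()
    defer wp.mu.Unlock()
    if n := len(wp.ready); n > 0 {
        ch := wp.ready[n-1] // reuse an idle worker
        wp.ready = wp.ready[:n-1]
        return ch
    }
    // no idle worker: spawn a new one that keeps serving conns sent over its channel
    ch := make(chan net.Conn, 1)
    go func() {
        for c := range ch {
            wp.handle(c) // blocks until this conn is fully served
            c.Close()
            wp.mu.Lock()
            wp.ready = append(wp.ready, ch) // put the channel back for reuse
            wp.mu.Unlock()
        }
    }()
    return ch
}

func main() {
    wp := &workerPool{handle: func(c net.Conn) {
        c.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nHello, Go!"))
    }}
    ln, err := net.Listen("tcp", ":8082")
    if err != nil {
        log.Fatal(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            continue
        }
        wp.serve(conn) // reuses an idle worker goroutine if one is available
    }
}

The key point is that a worker goroutine outlives any single connection: as long as conns keep arriving through its channel, no new goroutine has to be created for them.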

fasthttp's goroutine-reuse strategy is well intentioned, but in this test scenario its effect is limited, as the results show: under the same client concurrency and load, net/http uses roughly the same number of goroutines as fasthttp. This is a consequence of the test model: each hey in each task opens a fixed number of persistent (keep-alive) connections to the target and then issues "saturating" requests on each connection. Once a fasthttp workerpool goroutine picks up a conn, it can only return its channel to the pool after all communication on that conn has finished, and the conn is not closed until the test ends. This scenario therefore effectively "degrades" fasthttp into net/http's model, and it inherits net/http's "weakness": once the number of goroutines grows large, the overhead of the Go runtime's own scheduling becomes non-negligible and can even exceed the share of resources spent on the business logic. Below are fasthttp's CPU profiles with 200, 8,000, and 16,000 persistent connections respectively:

200 persistent connections:

(pprof) top -cum
Showing nodes accounting for 88.17s, 55.35% of 159.30s total
Dropped 150 nodes (cum <= 0.80s)
Showing top 10 nodes out of 60
      flat  flat%   sum%        cum   cum%
     0.46s  0.29%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*Server).serveConn
         0     0%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0%  0.29%    101.46s 63.69%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.04s 0.025%  0.31%     89.46s 56.16%  internal/poll.ignoringEINTRIO (inline)
    87.38s 54.85% 55.17%     89.27s 56.04%  syscall.Syscall
     0.12s 0.075% 55.24%     60.39s 37.91%  bufio.(*Writer).Flush
         0     0% 55.24%     60.22s 37.80%  net.(*conn).Write
     0.08s  0.05% 55.29%     60.21s 37.80%  net.(*netFD).Write
     0.09s 0.056% 55.35%     60.12s 37.74%  internal/poll.(*FD).Write
         0     0% 55.35%     59.86s 37.58%  syscall.Write (inline)
(pprof)

8,000 persistent connections:

(pprof) top -cum
Showing nodes accounting for 108.51s, 54.46% of 199.23s total
Dropped 204 nodes (cum <= 1s)
Showing top 10 nodes out of 66
      flat  flat%   sum%        cum   cum%
         0     0%     0%    119.11s 59.79%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0%     0%    119.11s 59.79%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.69s  0.35%  0.35%    119.05s 59.76%  github.com/valyala/fasthttp.(*Server).serveConn
     0.04s  0.02%  0.37%    104.22s 52.31%  internal/poll.ignoringEINTRIO (inline)
   101.58s 50.99% 51.35%    103.95s 52.18%  syscall.Syscall
     0.10s  0.05% 51.40%     79.95s 40.13%  runtime.mcall
     0.06s  0.03% 51.43%     79.85s 40.08%  runtime.park_m
     0.23s  0.12% 51.55%     79.30s 39.80%  runtime.schedule
     5.67s  2.85% 54.39%     77.47s 38.88%  runtime.findrunnable
     0.14s  0.07% 54.46%     68.96s 34.61%  bufio.(*Writer).Flush

16,000 persistent connections:

(pprof) top -cum
Showing nodes accounting for 239.60s, 87.07% of 275.17s total
Dropped 190 nodes (cum <= 1.38s)
Showing top 10 nodes out of 46
      flat  flat%   sum%        cum   cum%
     0.04s 0.015% 0.015%    153.38s 55.74%  runtime.mcall
     0.01s 0.0036% 0.018%   153.34s 55.73%  runtime.park_m
     0.12s 0.044% 0.062%       153s 55.60%  runtime.schedule
     0.66s  0.24%   0.3%    152.66s 55.48%  runtime.findrunnable
     0.15s 0.055%  0.36%    127.53s 46.35%  runtime.netpoll
   127.04s 46.17% 46.52%    127.04s 46.17%  runtime.epollwait
         0     0% 46.52%       121s 43.97%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0% 46.52%       121s 43.97%  github.com/valyala/fasthttp.(*workerPool).workerFunc
     0.41s  0.15% 46.67%    120.18s 43.67%  github.com/valyala/fasthttp.(*Server).serveConn
   111.17s 40.40% 87.07%    111.99s 40.70%  syscall.Syscall
(pprof)

Comparing these profiles, we see that as the number of persistent connections grows (i.e. as the number of goroutines in the workerpool grows), the share of time spent in Go runtime scheduling climbs steadily; at 16,000 connections the runtime scheduling functions already occupy the top four spots.

4. Possible optimizations

From the test results above, fasthttp's model is not a great fit for scenarios where connections are established and then subjected to continuous "saturating" requests. It is better suited to short-lived connections, or to persistent connections without sustained saturating traffic; only in those scenarios can its goroutine-reuse model really pay off.

But even when it "degrades" into net/http's model, fasthttp still performs slightly better than net/http. Why? The gain comes mainly from fasthttp's memory-allocation-level optimization tricks, such as heavy use of sync.Pool and avoiding conversions between []byte and string.
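
As a flavor of the first trick, here is a minimal, generic sketch of the sync.Pool buffer-reuse pattern (illustrative only, not fasthttp's actual code; names are made up):

// Sketch of buffer reuse via sync.Pool.
package main

import (
    "bytes"
    "fmt"
    "sync"
)

var bufPool = sync.Pool{
    // New is called only when the pool has no idle buffer to hand out.
    New: func() interface{} { return new(bytes.Buffer) },
}

func buildResponse(body string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()            // reuse the buffer's backing memory
    defer bufPool.Put(buf) // return it to the pool instead of leaving it to the GC
    buf.WriteString("HTTP/1.1 200 OK\r\n\r\n")
    buf.WriteString(body)
    return buf.String()
}

func main() {
    fmt.Println(buildResponse("Hello, Go!"))
}

Reusing the buffer's backing array this way avoids a fresh allocation per request and thus reduces GC pressure.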

So, in a scenario of continuous "saturating" requests, how can we keep the number of goroutines in fasthttp's workerpool from growing linearly with the number of conns? fasthttp does not give an official answer, but one path worth considering is to use the OS's I/O multiplexing (implemented on Linux as epoll), i.e. the same mechanism the Go runtime's netpoll uses. With multiplexing, each workerpool goroutine could handle multiple connections at once, which would let us size the workerpool according to business scale instead of letting the number of goroutines grow almost without bound as it does now. Of course, bringing epoll into user space may also increase the share of system calls and the response latency. Whether this path is viable ultimately depends on a concrete implementation and its test results.

Note: the Concurrency field of fasthttp.Server can be used to cap the number of goroutines in the workerpool that serve requests concurrently, but since each goroutine handles only one connection, setting Concurrency too low means later connections may be refused by fasthttp. That is why fasthttp's default Concurrency is:

const DefaultConcurrency = 256 * 1024
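
If you do want an explicit cap, it is set on the server value; for example, a small tweak to the fasthttp program above (the value 10000 here is arbitrary, chosen only for illustration):

s := &fasthttp.Server{
    Handler:     fastHTTPHandler,
    Concurrency: 10000, // serve at most 10000 connections concurrently (default is 256 * 1024)
}
s.ListenAndServe(":8081")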

The source code covered in this post can be downloaded from here[6]: github.com/bigwhite/experiments/blob/master/http-benchmark.


References

[1] fasthttp: https://github.com/valyala/fasthttp
[2] fasthttp best practices: https://github.com/valyala/fasthttp#fasthttp-best-practices
[3] sync.Pool: https://www.imooc.com/read/87/article/2432
[4] hey: https://github.com/rakyll/hey
[5] monitoring memstats with expvarmon: https://mp.weixin.qq.com/s/cr2JeUq5HOYQC0qji_Ip5g
[6] here: github.com/bigwhite/experiments/blob/master/http-benchmark



