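This article tells the story of using off-CPU flame graphs to analyze a program's latency (mostly spent acquiring a lock), finding the bottleneck, and eliminating it. It is well worth reading, but 閱碼場 did not have enough time to translate it into Chinese, so we hope readers will read the English original directly.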
The Setup
As a performance engineer at MemSQL, one of my primary responsibilities is to ensure that customer Proof of Concepts (POCs) run smoothly. I was recently asked to assist with a big POC, where I was surprised to encounter an uncommon Linux performance issue. I was running a synthetic workload of 16 threads (one for each CPU core). Each one simultaneously executed a very simple query (select count(*) from t where i > 5) against a columnstore table.
In theory, this ought to be a CPU-bound operation, since it would be reading from a file that was already in the disk buffer cache. In practice, our cores were spending about 50% of their time idle.
In this post, I’ll walk through some of the debugging techniques and reveal exactly how we reached resolution.
What were our threads doing?
After confirming that our workload was indeed using 16 threads, I looked at the state of our various threads. In every refresh of my htop window, I saw that a handful of threads were in the D state, corresponding to “Uninterruptible sleep”.
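The same check can be scripted instead of watched in htop. The following is a minimal sketch, not something from the original investigation: it assumes a Linux /proc filesystem, and the helper names parse_thread_state and count_thread_states are made up for illustration. It tallies the one-letter state of every thread in a process, so a persistent count under D would point to the same symptom.

```python
from collections import Counter
from pathlib import Path


def parse_thread_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/task/<tid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_thread_states(pid: int) -> Counter:
    """Tally the states of every thread of a process (Linux only)."""
    states = Counter()
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            states[parse_thread_state(stat_file.read_text())] += 1
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listing the directory and reading it.
            continue
    return states


if __name__ == "__main__":
    import os

    # Hypothetical usage: inspect this script's own process.
    print(count_thread_states(os.getpid()))
```

Sampling this a few times per second approximates what the htop refreshes showed, without needing an interactive terminal.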
Why were we going off CPU?
At this point, I generated an off-cpu flamegraph using Linux perf_events to see why we entered this state. Off-CPU means that instead of looking at what is keeping the CPU busy, you look at what is keeping it from being busy because of things happening elsewhere (e.g. waiting for IO or a lock). The normal way to generate these visualizations is to use perf inject -s, but the machine I tested on did not have a new enough version of perf. Instead I had to use an awk script I had previously written:
$ sudo perf record --call-graph=fp -e 'sched:sched_switch' -e 'sched:sched_stat_sleep' -e 'sched:sched_stat_blocked' --pid $(pgrep memsqld | head -n 1) -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.343 MB perf.data (~58684 samples) ]
$ sudo perf script -f time,comm,pid,tid,event,ip,sym,dso,trace -i sched.data | ~/FlameGraph/stackcollapse-perf-sched.awk | ~/FlameGraph/flamegraph.pl --color=io --countname=us > off-cpu.svg
Note: recording scheduler events via perf record can have a very large overhead and should be used cautiously in production environments. This is why I wrap perf record around sleep 1 to limit the duration.
In an off-cpu flamegraph, the width of a bar is proportional to the total time spent off cpu. Here we see a lot of time is spent in rwsem_down_write_failed.
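To make the width rule concrete, here is a rough sketch of the aggregation that a stack-collapse step performs before flamegraph.pl draws anything. It is not the author's awk script: the function name fold_off_cpu and the sample data are invented for illustration, and it assumes the off-CPU duration of each blocked interval has already been computed. It merges identical stacks by summing their microseconds and emits the folded "frame;frame;frame value" lines that flamegraph.pl reads.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def fold_off_cpu(samples: Iterable[tuple[list[str], int]]) -> list[str]:
    """Collapse (stack, off-CPU microseconds) samples into folded-stack lines.

    Each stack is listed root first. Identical stacks are merged by summing
    their durations, which is what makes a bar's width reflect total time.
    """
    totals: dict[str, int] = defaultdict(int)
    for stack, off_cpu_us in samples:
        totals[";".join(stack)] += off_cpu_us
    return [f"{stack} {total}" for stack, total in sorted(totals.items())]


if __name__ == "__main__":
    # Made-up samples; the frame names only echo those seen in the article.
    example = [
        (["memsqld", "mmap", "rwsem_down_write_failed"], 1500),
        (["memsqld", "read"], 200),
        (["memsqld", "mmap", "rwsem_down_write_failed"], 500),
    ]
    print("\n".join(fold_off_cpu(example)))
```

Because durations are summed per stack, many short waits on one code path and a single long wait produce equally wide bars, which is why the graph highlights contended locks so clearly.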
From the repeated calls to rwsem_down_read_failed and rwsem_down_write_failed, we see that the culprit was mmap contending in the kernel on the mm->mmap_sem lock:
down_write(&mm->mmap_sem);
ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff, &populate);
up_write(&mm->mmap_sem);
This was causing every mmap syscall to take 10-20ms (almost half the latency of the query itself). MemSQL was so fast that we had inadvertently written a benchmark for Linux mmap!
The fix was simple: we switched from using mmap to using the traditional file read interface. After this change, we nearly doubled our throughput and became CPU bound as we expected.
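For readers unfamiliar with the two access styles, the sketch below contrasts them. It is not MemSQL's code: the function names, the chunk size, and the byte-counting task (loosely modeled on the "i > 5" filter) are assumptions chosen for illustration. The first function creates a new mapping on every call, which is the step that takes mm->mmap_sem for writing in the kernel excerpt above. The second streams the file through plain read calls and creates no mapping.

```python
import mmap
import os

CHUNK = 1 << 16  # arbitrary chunk size chosen for this sketch


def count_with_mmap(path: str, threshold: int) -> int:
    """Count bytes greater than threshold by mapping the file into memory."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            total = 0
            for offset in range(0, size, CHUNK):
                # Slicing the mapping yields bytes, whose items are ints.
                chunk = mapped[offset:offset + CHUNK]
                total += sum(1 for b in chunk if b > threshold)
            return total


def count_with_read(path: str, threshold: int) -> int:
    """Count bytes greater than threshold using plain read calls."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:  # an empty result means end of file
                break
            total += sum(1 for b in chunk if b > threshold)
    return total
```

Both functions return the same answer for the same file. The difference that mattered in this investigation is that only the mapped version changes the process's address space on each call, and that address space is shared by every thread in the process.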
For more information and discussion around Linux performance, check out the original post on my personal blog.
Download MemSQL Community Edition to run your own performance tests for free today: memsql.com/download
Alex Reece is a systems and performance engineer. He believes in active benchmarking, root cause analysis, and fast code.
Original title: Investigating Linux Performance Problems with Off-CPU Flame Graphs
Source: WeChat ID LinuxDev, WeChat official account Linux閱碼場. Please credit the source when reposting.