
Linux Kernel TCP/IP Parameter Analysis and Tuning

Date: 2017/2/28 13:54:51   Editor: Linux內核

A TCP connection passes through three phases, as shown in the original figure: 1. the three-way handshake; 2. data transfer; 3. the four-way close.

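SYN (Synchronize Sequence Numbers): this flag is only meaningful while the three-way handshake is establishing a connection. It signals a new TCP connection request.

ACK (Acknowledgement Number): the flag that acknowledges a TCP request. It also tells the peer that all data sent so far has been received successfully.

FIN (Finish): ends a TCP session. The sender has no more data to send, but its side of the connection stays open and can still receive data until the peer also sends a FIN.
The following list walks through the 11 TCP states as seen on the server and on the client.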

1) LISTEN: the server first opens a socket and listens on it; the socket state is LISTEN. /* The socket is listening for incoming connections. It waits for connection requests from remote TCP ports. */

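2) SYN_SENT: the client application calls connect to perform an active open. The client TCP sends a SYN to request a connection and then sets its state to SYN_SENT. /* The socket is actively attempting to establish a connection. After sending a connection request, it waits for a matching connection request. */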

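3) SYN_RECV: the server sends an ACK to acknowledge the client's SYN and at the same time sends its own SYN to the client, then sets its state to SYN_RECV. /* A connection request has been received from the network. After receiving and sending a connection request, it waits for the acknowledgement of that request. */ (This state is short-lived and is hard to catch with netstat.)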

4) ESTABLISHED: an open connection; the two sides can exchange data or are already doing so. /* The socket has an established connection. Data can be delivered to the user. */

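5) FIN_WAIT1: the application on the active-close side calls close, so its TCP sends a FIN to close the connection and enters FIN_WAIT1. /* The socket is closed, and the connection is shutting down. It waits for a connection termination request from the remote TCP, or for the acknowledgement of the termination request it sent earlier. */ (FIN_WAIT1 only appears on the side that closes actively. In both FIN_WAIT_1 and FIN_WAIT_2 the socket has sent its own FIN and has not yet received the peer's FIN. The difference is this: a socket in ESTABLISHED that wants to close the connection sends a FIN and enters FIN_WAIT_1; when the peer replies with an ACK, it moves to FIN_WAIT_2. Under normal conditions the peer acknowledges the FIN immediately, whatever its own situation, so FIN_WAIT_1 is rarely visible, while FIN_WAIT_2 can often be seen with netstat.)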

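6) CLOSE_WAIT: when the TCP on the passive-close side receives the FIN, it sends an ACK in response (the FIN is also passed up to the application as an end-of-file) and enters CLOSE_WAIT. /* The remote end has shut down, waiting for the socket to close. It waits for a connection termination request from the local user. */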

7) FIN_WAIT2: when the active-close side receives the ACK, it enters FIN_WAIT2. /* Connection is closed, and the socket is waiting for a shutdown from the remote end. It waits for a connection termination request from the remote TCP. */

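8) LAST_ACK: some time later, the application on the passive-close side, having received the end-of-file, calls close. Its TCP then sends a FIN as well and enters LAST_ACK while it waits for the peer's ACK. /* The remote end has shut down, and the socket is closed. Waiting for acknowledgement. It waits for the acknowledgement of the termination request it sent to the remote TCP. */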

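9) TIME_WAIT: when the active-close side receives the FIN, its TCP sends an ACK and enters TIME_WAIT. /* The socket is waiting after close to handle packets still in the network. It waits long enough to be sure the remote TCP received the acknowledgement of its termination request. */ (This state occurs on the active-close side: it has received the peer's FIN and sent the ACK, and after 2MSL it returns to CLOSED and can be used again.)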

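10) CLOSING: rarely seen. It occurs when both sides send a FIN at about the same time (a simultaneous close). /* Both sockets are shut down but we still don't have all our data sent. It waits for the remote TCP to acknowledge the connection termination. */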

11) CLOSED: when the passive-close side receives the ACK, it enters the CLOSED state and the connection is over. /* The socket is not being used. There is no connection state at all. */
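
To see how many sockets are in each of these states without netstat, one option on Linux is to read /proc/net/tcp, whose "st" column holds a hexadecimal state code. The following is a minimal sketch under that assumption; the function name and the two-row sample are made up for illustration, and the sample only mimics the first columns of the real file.

```python
from collections import Counter

# State codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSED", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


# Hypothetical sample: one listening socket and one socket in TIME_WAIT.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:0016 00000000:0000 0A 00000000:00000000
   1: 0100007F:0050 0100007F:C350 06 00000000:00000000
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

On a Linux host the same function can be fed the real file, for example count_tcp_states(open("/proc/net/tcp").read()).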

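The TIME_WAIT state only arises on the side that actively closes the connection.
After the active closer receives the passive closer's FIN and successfully sends back an ACK, it changes its state from FIN_WAIT2 to TIME_WAIT. It must then wait for twice the MSL (Maximum Segment Lifetime, the longest time a segment can exist in the internetwork) before it moves to CLOSED; the passive closer moves to CLOSED as soon as it receives that ACK. On RHEL, a socket currently stays in TIME_WAIT for 60 seconds.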

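State changes during the TCP three-way handshake:
1. Client: SYN -> Server
The client sends a SYN to the server, and the client state becomes SYN_SENT.
2. Server: SYN + ACK -> Client
The server receives the SYN and replies with SYN + ACK. Server state: LISTEN -> SYN_RECV
3. Client: ACK -> Server
The client receives the server's SYN + ACK and replies with an ACK. Once the server receives it, the server state is: LISTEN -> SYN_RECV -> ESTABLISHED
Client state: SYN_SENT -> ESTABLISHED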

Kernel parameter involved in the first handshake step:

net.ipv4.tcp_syn_retries=5
· (The maximum number of times initial SYNs for an active TCP connection attempt will be retransmitted. This value should not be higher than 255. The default value is 5, which corresponds to approximately 180 seconds.)
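
The "approximately 180 seconds" figure can be reproduced with a small calculation. This is only a sketch: it assumes the retransmission timeout starts at 3 seconds and doubles after every attempt, which is an assumption chosen to match the quoted text (kernels that start from a shorter initial timeout give up sooner). The function name is hypothetical.

```python
def syn_timeout_seconds(retries, initial_rto=3.0):
    """Total wait before a connection attempt is abandoned.

    Assumes the initial SYN plus `retries` retransmissions each wait one
    timeout, and that the timeout doubles after every attempt.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


if __name__ == "__main__":
    for retries in (1, 2, 5):
        print(retries, syn_timeout_seconds(retries))
```

Under these assumptions tcp_syn_retries=5 gives 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds, while tcp_syn_retries=1 gives 9 seconds, which shows why lowering the value makes failed connection attempts return much faster.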

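Parameters involved in the second handshake step:


First, during this step the kernel keeps a queue for the SYNs that clients have sent and that are still waiting for the final ACK. The parameter below sets how many such pending requests may wait, provided there is enough memory; once the queue is full, new requests are no longer accepted. The parameter is: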

net.ipv4.tcp_max_syn_backlog
· (The maximum number of queued connection requests which have still not received an acknowledgement from the connecting client. If this number is exceeded, the kernel will begin dropping requests. The default value of 256 is increased to 1024 when the memory present in the system is adequate or greater (>= 128Mb), and reduced to 128 for those systems with very low memory (<= 32Mb). It is recommended that if this needs to be increased above 1024, TCP_SYNQ_HSIZE in include/net/tcp.h be modified to keep TCP_SYNQ_HSIZE*16<=tcp_max_syn_backlog, and the kernel be recompiled.)
The default is 1024. On a high-concurrency server with plenty of memory, consider raising it, for example to net.ipv4.tcp_max_syn_backlog = 16384.

Second, SYN-ACK retransmission. When the server sends a SYN + ACK to the client and gets no response, it retransmits. The parameter that controls this is:

tcp_synack_retries
· (The maximum number of times a SYN/ACK segment for a passive TCP connection will be retransmitted. This number should not be higher than 255.)
The default is 5, which corresponds to about 180 seconds. The suggested setting is
tcp_synack_retries = 1

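Third, SYN cookies. SYN cookies modify the server side of the TCP three-way handshake specifically to defend against SYN flood attacks. The idea is that when the TCP server receives a SYN and returns a SYN + ACK, it does not allocate a dedicated data area; instead it computes a cookie value from the SYN. When the ACK arrives, the server uses that cookie value to check whether the ACK is legitimate. Only if it is does the server allocate a data area to handle the rest of the TCP connection. The corresponding kernel parameter is: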

net.ipv4.tcp_syncookies = {0|1}
· (Enable TCP syncookies. The kernel must be compiled with CONFIG_SYN_COOKIES. Send out syncookies when the syn backlog queue of a socket overflows. The syncookies feature attempts to protect a socket from a SYN flood attack. This should be used as a last resort, if at all. This is a violation of the TCP protocol, and conflicts with other areas of TCP such as TCP extensions. It can cause problems for clients and relays. It is not recommended as a tuning mechanism for heavily loaded servers to help with overloaded or misconfigured conditions. For recommended alternatives see tcp_max_syn_backlog, tcp_synack_retries, and tcp_abort_on_overflow.)
tcp_syncookies and tcp_max_syn_backlog are used together to defend against SYN flood attacks.
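
The stateless idea behind SYN cookies can be illustrated with a toy model. This is a simplified sketch, not the algorithm the Linux kernel uses: the HMAC construction, the secret, the time counter and both function names are assumptions made for illustration. The point it shows is that the server stores nothing after the SYN and can still recognise a legitimate ACK by recomputing the cookie.

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # hypothetical per-server secret


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, counter):
    """Derive a 32-bit cookie from the 4-tuple, the client's ISN and a time counter.

    The server would use this value as its own initial sequence number
    in the SYN + ACK instead of storing per-connection state.
    """
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{client_isn}|{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def check_ack(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number, counter):
    """Accept the ACK if it acknowledges cookie + 1 for the current or previous counter."""
    for c in (counter, counter - 1):
        cookie = make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, c)
        if ack_number == (cookie + 1) % 2**32:
            return True
    return False


if __name__ == "__main__":
    cookie = make_cookie("10.0.0.1", 40000, "10.0.0.2", 80, 1000, counter=7)
    print(check_ack("10.0.0.1", 40000, "10.0.0.2", 80, 1000, (cookie + 1) % 2**32, 7))
```

Accepting the previous counter value as well gives the client a short window to answer; an ACK that does not match either value is simply dropped, so a flood of spoofed SYNs costs the server computation but no queue slots.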


Kernel parameters involved during data transfer:

net.ipv4.tcp_keepalive_intvl=15
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_time=120

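With these three parameters, if the server and the client exchange no data for 120 seconds, the first keepalive probe is sent; further probes follow at 15-second intervals, and after 3 unanswered probes the connection is dropped.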
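
The timeline implied by these three values can be worked out with a few lines of arithmetic. This is a sketch under the assumption that every probe goes unanswered and that the connection is dropped one interval after the last probe; the function name is hypothetical.

```python
def keepalive_timeline(keepalive_time, keepalive_intvl, keepalive_probes):
    """Return (probe send times, drop time) in seconds since the last data exchange."""
    probe_times = [keepalive_time + i * keepalive_intvl
                   for i in range(keepalive_probes)]
    drop_time = keepalive_time + keepalive_probes * keepalive_intvl
    return probe_times, drop_time


if __name__ == "__main__":
    probes, drop = keepalive_timeline(120, 15, 3)
    print(probes, drop)
```

With the values above the probes go out at 120, 135 and 150 seconds and a dead peer is detected after 165 seconds, compared with more than two hours when keepalive_time is left at the common default of 7200.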
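State changes during the four-way close:
The client initiates the close:
1. Client: FIN(M) -> Server
The client sends a FIN to the server to request the close. Client: ESTABLISHED -> FIN_WAIT1

2. Server: ACK -> Client
The server receives the FIN and sends an ACK to confirm it. Server: ESTABLISHED -> CLOSE_WAIT
The client receives the server's ACK and moves FIN_WAIT1 -> FIN_WAIT2, where it keeps waiting for the server's FIN; the server may still send data in the meantime.

3. Server: FIN(N) -> Client
Server state: ESTABLISHED -> CLOSE_WAIT -> LAST_ACK

4. Client: ACK(N+1) -> Server
The client receives the FIN and sends the ACK. Client state: ESTABLISHED -> FIN_WAIT1 -> FIN_WAIT2 -> TIME_WAIT [2MSL timeout] -> CLOSED
Server state: ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED.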
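
The two state sequences above can be written down as a small transition table, which makes it easy to check a sequence of events against the expected path. This is a minimal sketch covering only the normal close described here (no simultaneous close, no resets); the event names are made up for illustration.

```python
# (current state, event) -> next state, for a normal four-way close.
CLOSE_TRANSITIONS = {
    # Active closer (the client in the walkthrough above).
    ("ESTABLISHED", "send_fin"): "FIN_WAIT1",
    ("FIN_WAIT1", "recv_ack"): "FIN_WAIT2",
    ("FIN_WAIT2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "2msl_timeout"): "CLOSED",
    # Passive closer (the server in the walkthrough above).
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]  # KeyError on an invalid step
        path.append(state)
    return path


if __name__ == "__main__":
    print(" -> ".join(run(["send_fin", "recv_ack", "recv_fin", "2msl_timeout"])))
    print(" -> ".join(run(["recv_fin", "send_fin", "recv_ack"])))
```

The first call reproduces the client path and the second the server path; only the active closer passes through TIME_WAIT.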

The sequence above involves the term 2MSL (MSL: Maximum Segment Lifetime):
· The TIME_WAIT state is also called the 2MSL wait state.
· Every implementation must choose a value for the maximum segment lifetime (MSL). It is the maximum amount of time any segment can exist in the network before being discarded.
· RFC 793 specifies the MSL as 2 minutes. Common implementation values, however, are 30 seconds, 1 minute, or 2 minutes. Recall that the limit on lifetime of the IP datagram is based on the number of hops, not a timer.
· Given an MSL for an implementation, the rule is: when TCP performs an active close, and sends the final ACK, that connection must stay in the TIME_WAIT state for twice the MSL.
· This lets TCP resend the final ACK in case this ACK is lost (in which case the other end will time out and retransmit its final FIN).
· An effect of this 2MSL wait is that while the TCP connection is in the 2MSL wait, the socket pair defining that connection cannot be reused.
· Any delayed segments that arrive for a connection while it is in the 2MSL wait are discarded. Since the connection defined by the socket pair in the 2MSL wait cannot be reused, when we do establish a valid connection we know that delayed segments from an earlier incarnation of this connection cannot be misinterpreted as being part of the new connection.
· The client, who performs the active close, enters the 2MSL wait. The server does not. This means if we terminate a client, and restart the client immediately, the new client cannot reuse the same local port number.
· Servers, however, use well-known ports. If we terminate a server that has a connection established, and immediately try to restart the server, the server cannot assign its well-known port number to its end point.

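Put simply, this is the period that the side which sent the first FIN must wait after sending its final ACK to the peer. The purpose of TIME_WAIT (the 2MSL wait) is to cover the case where the client's final ACK is lost and the server, still in LAST_ACK, times out and retransmits its FIN. When tuning a server around the 2MSL wait, what we want is for connections in TIME_WAIT to be reusable and to be cleaned up quickly.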

The parameters that control fast recycling and reuse are:

net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=1
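Note: enabling the parameters above is not recommended on an LVS-NAT server. tcp_tw_recycle in particular is known to break connections from clients behind NAT, and it was removed from the kernel in Linux 4.12.
If the server shows a large number of TIME_WAIT connections, you can lower tcp_fin_timeout (default 60); note that the kernel documentation describes this parameter as the time an orphaned connection stays in FIN_WAIT2, so its effect on TIME_WAIT itself may be limited. When this problem appears, the local ports are usually exhausted as well, so the port range should also be widened: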

net.ipv4.tcp_fin_timeout=20
· How many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service (DoS) attacks. The default value in 2.4 kernels is 60, down from 180 in 2.2.
net.ipv4.ip_local_port_range=1024 65534
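
A rough calculation shows why TIME_WAIT and the local port range are linked. The sketch below assumes that every closed outbound connection keeps its local port in TIME_WAIT for the 60 seconds mentioned earlier, that all connections go to one destination address and port, and that ports are not reused early; the function name is hypothetical.

```python
def outbound_capacity(port_low, port_high, time_wait_seconds=60):
    """Return (usable local ports, sustainable new connections per second)."""
    ports = port_high - port_low + 1
    return ports, ports / time_wait_seconds


if __name__ == "__main__":
    ports, rate = outbound_capacity(1024, 65534)
    print(ports, round(rate))
```

With the range 1024 to 65534 there are 64511 local ports, so under these assumptions a host can open roughly 1075 new connections per second to a single destination before it runs out of ports.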

and the maximum number of TIME_WAIT sockets:

net.ipv4.tcp_max_tw_buckets=20000
· The maximum number of sockets in TIME_WAIT state allowed in the system. This limit exists only to prevent simple denial-of-service attacks. The default value of NR_FILE*2 is adjusted depending on the memory in the system. If this number is exceeded, the socket is closed and a warning is printed.
TIME_WAIT sockets beyond this limit are closed immediately.

TCP buffer parameters
net.ipv4.tcp_mem='873800 8388608 8388608'

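Defines the memory the TCP stack may use, as three thresholds counted in memory pages: low, pressure, and high.

· low: while TCP uses fewer pages than this value, it does not consider freeing memory; below this value there is no memory pressure. (Ideally this value should match the second value given to tcp_wmem, converted to pages: the buffer size multiplied by the maximum number of concurrent requests, divided by the page size (131072 * 300 / 4096).)
· pressure: when TCP uses more pages than this value, it tries to stabilize its memory use and enters pressure mode; it leaves pressure mode once consumption falls below low. (Ideally this value should be the maximum total buffer size TCP may use (204800 * 300 / 4096).)
· high: the number of pages all TCP sockets together may use to queue buffered segments. (Above this value TCP connections are refused, which is why it should not be set too conservatively (512000 * 300 / 4096). The value given in this case is large: it can handle 2.5 times as many connections as expected, or let the existing connections carry 2.5 times as much data.)
· These values are normally computed at boot time from the amount of system memory.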
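
The arithmetic in the three bullets, and the size of the example tcp_mem setting, can be checked with a short sketch. It assumes a page size of 4096 bytes, which matches the divisor used in the text but differs on some architectures; both function names are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; assumed, matches the divisor used in the text


def pages_for(buffer_bytes, connections, page_size=PAGE_SIZE):
    """Pages needed when `connections` sockets each hold `buffer_bytes`."""
    return buffer_bytes * connections // page_size


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a tcp_mem value (in pages) to MiB."""
    return pages * page_size / 2**20


if __name__ == "__main__":
    print(pages_for(131072, 300), pages_for(204800, 300), pages_for(512000, 300))
    print(pages_to_mib(873800), pages_to_mib(8388608))
```

The three formulas come out at 9600, 15000 and 37500 pages, so the high value is 2.5 times the pressure value, which is where the "2.5 times" in the text comes from. The example setting 873800 8388608 8388608 corresponds to roughly 3.3 GiB for low and 32 GiB for pressure and high, so it only makes sense on a machine with that much memory to spare.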

net.ipv4.tcp_rmem='4096 87380 8388608'
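Defines the memory the TCP stack uses for receive buffers, in bytes:
The first value is the minimum: even when the host is short of memory, a TCP socket is guaranteed at least this much receive buffer.
The second value is the default; it overrides the receive buffer size that net.core.rmem_default defines for all protocols.
The third value is the maximum receive buffer a TCP socket may use.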

net.ipv4.tcp_wmem='4096 65536 8388608'

Defines the memory the TCP stack uses for send buffers; the three values have the same meaning as for tcp_rmem.
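
One way to observe the effect of these settings from an application is to ask a freshly created TCP socket for its buffer sizes. This is a sketch using only the standard socket module; on Linux the numbers would be expected to reflect the default (second) values of tcp_rmem and tcp_wmem, but that mapping is an assumption here and other systems report their own defaults. The function name is hypothetical.

```python
import socket


def tcp_buffer_sizes():
    """Return (receive buffer, send buffer) in bytes for a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    rcv, snd = tcp_buffer_sizes()
    print("receive buffer:", rcv, "send buffer:", snd)
```

Running it before and after changing the sysctl values is a quick check that new sockets actually pick up the new defaults.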

Some other parameters
net.ipv4.tcp_max_orphans=262144
· The maximum number of orphaned (not attached to any user file handle) TCP sockets allowed in the system. When this number is exceeded, the orphaned connection is reset and a warning is printed. This limit exists only to prevent simple denial-of-service attacks. Lowering this limit is not recommended. Network conditions might require you to increase the number of orphans allowed, but note that each orphan can eat up to ~64K of unswappable memory. The default initial value is set equal to the kernel parameter NR_FILE. This initial default is adjusted depending on the memory in the system.
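In other words, this is the maximum number of TCP sockets in the system that are not attached to any user file handle. Once the number is exceeded, orphaned connections are reset immediately and a warning is printed. The limit exists purely to fend off simple DoS attacks; do not rely on it too heavily or lower it artificially. If it has to be changed, raise it, and only after making sure enough memory is available.
# Within the limits of available memory, a larger value gives better resistance to attacks.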
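
The memory cost of a given tcp_max_orphans value can be estimated from the "up to ~64K per orphan" figure quoted above. The sketch below takes 64 KiB per orphan as the worst case; that figure is an upper bound from the quoted text rather than a measured value, and the function name is hypothetical.

```python
def orphan_memory_gib(max_orphans, per_orphan_bytes=64 * 1024):
    """Worst-case unswappable memory, in GiB, if every orphan slot is used."""
    return max_orphans * per_orphan_bytes / 2**30


if __name__ == "__main__":
    for limit in (32768, 262144):
        print(limit, orphan_memory_gib(limit))
```

In the worst case 262144 orphans could pin 16 GiB of memory and 32768 orphans 2 GiB, which is why the value should only be raised when that much memory is really available.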

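In an incident at a previous company involving the backend servers of an ad service, the network was dropping packets and the connection tracking table was full. On newer kernels the equivalent setting is net.netfilter.nf_conntrack_max. The parameters adjusted at the time were the following (the default is 65536):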

net.ipv4.ip_conntrack_max= 196608
net.ipv4.netfilter.ip_conntrack_max= 196608


The parameters listed here are the ones the instructor 老男孩 commonly uses in production:
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl =15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_wmem = 8192 131072 16777216
net.ipv4.tcp_rmem = 32768 131072 16777216
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_conntrack_max = 65536
net.ipv4.netfilter.ip_conntrack_max=65536
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=180
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
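
Before applying a list like this, it can help to compare it with what the running kernel currently uses. The sketch below parses "key = value" lines in the format shown above and maps each key to its file under /proc/sys; the function names are hypothetical, and reading /proc/sys assumes a Linux host where the parameter exists.

```python
def parse_sysctl(text):
    """Parse sysctl-style 'key = value' lines into {key: [values]}."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        settings[key.strip()] = value.split()
    return settings


def proc_path(key):
    """Map a sysctl key such as net.ipv4.tcp_wmem to its /proc/sys file."""
    return "/proc/sys/" + key.replace(".", "/")


if __name__ == "__main__":
    sample = "net.ipv4.tcp_keepalive_intvl =15\nnet.ipv4.tcp_wmem = 8192 131072 16777216\n"
    for key, wanted in parse_sysctl(sample).items():
        print(key, wanted, proc_path(key))
```

On a Linux host, open(proc_path(key)).read().split() gives the current value in the same list form, so the two can be compared directly; parameters that no longer exist on the running kernel show up as missing files.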

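Kernel parameter tuning still has to be adjusted to the workload and the hardware. The parameters listed here are only the commonly tuned ones; understand what each one means, then decide on values for your own production environment.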
