<aside> 📘 TL;DR
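This article explores three unexpected situations that can arise between the client and the server of a TCP connection. Here alice and bob stand for the two ends of the connection, and we assume they have already established it.
… ECONNRESET
…, bob closes the connection.
After alice sends a FIN, bob can ACK it without sending a FIN of his own. alice then enters FIN_WAIT_2 and can only receive, not send, while bob sits in CLOSE_WAIT (the two sides' states are asymmetric at this point, but data can still flow in one direction), so bob can keep sending messages to alice. Once the gap between messages exceeds TIMEOUT, alice closes the connection. If bob sends again after alice has closed the connection, this is the same as case 1: alice's side replies with a RST, which bob sees as ECONNRESET.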
</aside>
It's been said that we don't really understand a system until we understand how it fails. Despite having written a (toy) TCP implementation in college and then working for several years in industry, I'm continuing to learn more deeply how TCP works, and how it fails. What's been most surprising is how basic some of these failures are. They're not at all obscure. I'm presenting them here as puzzlers, in the fashion of Car Talk and the old Java puzzlers. Like the best of those puzzlers, these are questions that are very simple to articulate, but the solutions are often surprising. And rather than focusing on arcane details, they hopefully elucidate some deep principles about how TCP works.
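For readers who want to see the half-closed state from the TL;DR in action, here is a minimal Python sketch over the loopback interface. The function name half_close_demo and the use of 127.0.0.1 with an OS-assigned port are assumptions made for illustration; they are not part of the original experiments, which used nc(1) across two machines.

```python
import socket


def half_close_demo():
    """alice sends a FIN; bob ACKs it but keeps his own direction open."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    listener.listen(1)

    alice = socket.create_connection(listener.getsockname())
    bob, _addr = listener.accept()

    alice.shutdown(socket.SHUT_WR)           # alice sends FIN and can no longer write
    eof = bob.recv(1024)                     # bob reads end-of-stream (empty bytes)

    bob.sendall(b"still here\n")             # bob has sent no FIN, so he can still write
    reply = alice.recv(1024)                 # alice can still read

    for s in (alice, bob, listener):
        s.close()
    return eof, reply
```

Calling half_close_demo() should return an empty byte string for bob's read and bob's message for alice's read, which is the one-directional transfer the summary describes. The timeout behaviour is not modelled here.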
These puzzlers assume some basic knowledge about working with TCP on Unix-like systems, but you don't have to have mastered any of this before diving in. As a refresher:
- The socket calls that matter here are read, write, connect, bind, listen, and accept. There's also send and recv, but for our purposes, these work the same way as read and write.
- The examples here are built around poll. Although most systems use something more efficient like kqueue, event ports, or epoll, these are all equivalent for our purposes. As for applications that use blocking operations instead of any of these mechanisms: once you understand how TCP failure modes affect poll, it's pretty easy to understand how they affect blocking operations as well.
You can try all of these examples yourself. I used two virtual machines running under VMware Fusion. The results match my experiences in our production systems. I'm testing using the nc(1) tool on SmartOS, and I don't believe any of the behavior shown here is OS-specific. I'm using the illumos-specific truss(1) tool to trace system calls and to get some coarse timing information. You may be able to get similar information using dtruss(1m) on OS X or strace(1) on GNU/Linux.
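If you'd rather poke at the calls from the refresher without two machines, the following is a rough Python sketch that exercises bind, listen, accept, connect, send, recv, and poll on a single host. It is an illustration only: loopback_demo is a hypothetical helper, the loopback address and OS-assigned port are assumptions, and Python's select.poll is only available on Unix-like systems.

```python
import select
import socket


def loopback_demo():
    # Server side: bind and listen, as a listening nc would.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    # Client side: connect to the listening socket.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _addr = server.accept()        # hand back the established connection

    client.sendall(b"hello\n")           # send behaves like write for our purposes

    # poll: wait (up to 1000 ms) for the accepted socket to become readable.
    poller = select.poll()
    poller.register(conn, select.POLLIN)
    events = poller.poll(1000)
    data = conn.recv(1024) if events else b""

    for s in (client, conn, server):
        s.close()
    return events, data
```

The events list holds (file descriptor, event mask) pairs; a POLLIN bit in the mask is what tells the program that recv will not block.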
nc(1) is a pretty simple tool. We'll use it in two modes:
- As a server, nc will set up a listening socket, call accept, and block until a connection is received.
- As a client, nc will create a socket and establish a connection to a remote server.
In both modes, once connected, each side uses poll to wait for either stdin or the connected socket to have data ready to be read. Incoming data is printed to the terminal. Data you type into the terminal is sent over the socket. Upon CTRL-C, the socket is closed and the process exits.
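The connected-mode loop described above can be approximated in a few lines of Python. This is a simplified sketch, not nc's actual source: the relay function and its return strings are made up for illustration, it takes plain file descriptors in place of the terminal, and it returns on end-of-file rather than waiting for CTRL-C.

```python
import os
import select
import socket


def relay(sock, in_fd, out_fd):
    """Copy bytes between a connected socket and a pair of file descriptors."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    poller.register(in_fd, select.POLLIN)
    while True:
        # Block until the socket or the input descriptor has something to read.
        for fd, _event in poller.poll():
            if fd == sock.fileno():
                data = sock.recv(4096)
                if not data:                 # empty read: the peer closed its side
                    return "peer closed"
                os.write(out_fd, data)       # incoming data goes to the "terminal"
            else:
                data = os.read(in_fd, 4096)
                if not data:                 # empty read: end of input
                    return "input closed"
                sock.sendall(data)           # typed data goes over the socket
```

In interactive use one might call relay(sock, 0, 1) so that descriptor 0 (stdin) feeds the socket and descriptor 1 (stdout) shows what arrives. The point to notice is that the program only learns about the connection's state when poll wakes it up and it tries to read or write.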
In these examples, my client is called kang and my server is called kodos.
This one demonstrates a very basic case just to get the ball rolling. Suppose we set up a server on kodos:
[root@kodos ~]# truss -d -t bind,listen,accept,poll,read,write nc -l -p 8080
Base time stamp: 1464310423.7650 [ Fri May 27 00:53:43 UTC 2016 ]
0.0027 bind(3, 0x08065790, 32, SOV_SOCKBSD) = 0
0.0028 listen(3, 1, SOV_DEFAULT) = 0
accept(3, 0x08047B3C, 0x08047C3C, SOV_DEFAULT, 0) (sleeping...)
(Remember, in these examples, I'm using truss to print out the system calls that nc makes. The -d flag prints a relative timestamp and the -t flag selects which system calls we want to see.)
Now on kang, I establish a connection: