TCP congestion control

Transmission Control Protocol (TCP) uses a network congestion-avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start to achieve congestion avoidance.

The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet.[1][2][3][4][5]

Operation

To avoid congestive collapse, TCP uses a multi-faceted congestion-control strategy. For each connection, TCP maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP's sliding window used for flow control. TCP uses a mechanism called slow start[6] to increase the congestion window after a connection is initialized and after a timeout. It starts with a window of two times the maximum segment size (MSS). Although the initial rate is low, the rate of increase is very rapid: for every packet acknowledged, the congestion window increases by 1 MSS so that the congestion window effectively doubles for every round-trip time (RTT). When the congestion window exceeds the ssthresh threshold, the algorithm enters a new state, called congestion avoidance. In some implementations (e.g., Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.

In congestion avoidance state as long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time. When a packet is lost, the likelihood of duplicate ACKs being received is very high (it's possible though unlikely that the stream just underwent extreme packet reordering, which would also prompt duplicate ACKs).

Congestion window

In the Transmission Control Protocol (TCP), the congestion window is one of the factors that determines the number of bytes that can be outstanding at any time. The congestion window is maintained by the sender. Note that this is not to be confused with the TCP window size which is maintained by the receiver. The congestion window is a means of stopping a link between the sender and the receiver from getting overloaded with too much traffic. It is calculated by estimating how much congestion there is between the two places.

When a connection is set up, the congestion window, a value maintained independently at each host, is set to a small multiple of the maximum segment size (MSS) allowed on that connection. Further variance in the congestion window is dictated by an Additive Increase/Multiplicative Decrease approach.

This means that if all segments are received and the acknowledgments reach the sender on time, some constant is added to the window size. The window keeps growing exponentially until a timeout occurs or the receiver reaches its limit (a threshold value "ssthresh"). After this the congestion window increases linearly at the rate of 1/(congestion window)packets on each new acknowledgement received.

On timeout:

  1. Congestion window is reset to 1 MSS
  2. "ssthresh" is set to half the congestion window size before packet loss started
  3. "slow start" is initiated.

A system administrator may adjust the maximum window size limit, or adjust the constant added during additive increase, as part of TCP tuning.

The flow of data over a TCP connection is also controlled by the use of the receiver advertised TCP Receive Window. By comparing its own congestion window with the receive window of the receiver, a sender can determine how much data it may send at any given time.

Slow start

Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network congestion. The algorithm is specified by RFC 5681.

Slow-start begins initially with a congestion window Size (cwnd) of 1, 2 or 10.[7] The value of the Congestion Window will be increased with each acknowledgement (ACK) received, effectively doubling the window size each round trip time ("although it is not exactly exponential because the receiver may delay its ACKs, typically sending one ACK for every two segments that it receives"[8]). The transmission rate will be increased with slow-start algorithm until either a loss is detected, or the receiver's advertised window (rwnd) is the limiting factor, or the slow start threshold (ssthresh) is reached. If a loss event occurs, TCP assumes that it is due to network congestion and takes steps to reduce the offered load on the network. These measurements depend on the used TCP congestion avoidance algorithm. Once ssthresh is reached, TCP changes from slow-start algorithm to the linear growth (congestion avoidance) algorithm. At this point, the window is increased by 1 segment for each RTT.

Although the strategy is referred to as "Slow-Start", its congestion window growth is quite aggressive, more aggressive than the congestion avoidance phase.[9] Before slow-start was introduced in TCP, the initial pre-congestion avoidance phase was even faster.

The behavior upon packet loss depends on the TCP congestion avoidance algorithm that is used.

TCP Tahoe
In TCP Tahoe, when a loss occurs, fast retransmit is sent, half of the current CWND is saved as a Slow Start Threshold (SSThresh) and slow start begins again from its initial CWND. Once the CWND reaches the SSThresh, TCP changes to congestion avoidance algorithm where each new ACK increases the CWND by SS + SS / CWND. This results in a linear increase of the CWND.
TCP Reno
TCP Reno implements an algorithm called Fast recovery. A fast retransmit is sent, half of the current CWND is saved as Slow Start Threshold (SSThresh) and as new CWND, thus skipping slow-start and going directly to Congestion Avoidance algorithm

Slow-start assumes that unacknowledged segments are due to network congestion. While this is an acceptable assumption for many networks, segments may be lost for other reasons, such as poor data link layer transmission quality. Thus, slow-start can perform poorly in situations with poor reception, such as wireless networks.

The slow-start protocol performs badly for short-lived connections. Older web browsers would create many consecutive short-lived connections to the web server, and would open and close the connection for each file requested. This kept most connections in the slow start mode, which resulted in poor response time. To avoid this problem, modern browsers either open multiple connections simultaneously or reuse one connection for all files requested from a particular web server. However, connections cannot be reused for the multiple third-party servers used by web sites to implement web advertising, sharing features of social networking services,[10] and counter scripts of web analytics.

Additive increase/multiplicative decrease

The additive increase/multiplicative decrease (AIMD) algorithm is a feedback control algorithm. AIMD combines linear growth of the congestion window with an exponential reduction when a congestion takes place. Multiple flows using AIMD congestion control will eventually converge to use equal amounts of a contended link.[11]

Fast retransmit

Fast Retransmit is an enhancement to TCP which reduces the time a sender waits before retransmitting a lost segment.

A TCP sender uses a timer to recognize lost segments. If an acknowledgement is not received for a particular segment within a specified time (a function of the estimated Round-trip delay time), the sender will assume the segment was lost in the network, and will retransmit the segment.

Duplicate acknowledgement is the basis for the fast retransmit mechanism which works as follows: after receiving a packet (e.g. with sequence number 1), the receiver sends an acknowledgement by adding 1 to the sequence number (i.e., acknowledgement number 2) which means that the receiver receives the packet number 1 and it expects packet number 2 from the sender. Let's assume that three subsequent packets have been lost. In the meantime the receiver receives packet numbers 5 and 6. After receiving packet number 5, the receiver sends an acknowledgement, but still only for sequence number 2. When the receiver receives packet number 6, it sends yet another acknowledgement value of 2. Because the sender receives more than one acknowledgement with the same sequence number (2 in this example) this is called duplicate acknowledgement.

The fast retransmit enhancement works as follows: if a TCP sender receives a specified number of acknowledgements which is usually set to three duplicate acknowledgements with the same acknowledge number (that is, a total of four acknowledgements with the same acknowledgement number), the sender can be reasonably confident that the segment with the next higher sequence number was dropped, and will not arrive out of order. The sender will then retransmit the packet that was presumed dropped before waiting for its timeout.

Algorithms

The "TCP Foo" names for the algorithms appear to have originated in a 1996 paper by Kevin Fall and Sally Floyd.[12]

The following is one possible classification according to the following properties:

  1. The type and amount of feedback received from the network
  2. Incremental deployability on the current Internet
  3. The aspect of performance it aims to improve: high bandwidth-delay product networks (B); lossy links (L); fairness (F); advantage to short flows (S); variable-rate links (V); speed of convergence (C)
  4. The fairness criterion it uses

Some well-known congestion avoidance mechanisms are classified by this scheme as follows:

Variant Feedback Required changes Benefits Fairness
(New)Reno Loss - - Delay
Vegas Delay Sender Less loss Proportional
High Speed Loss Sender High bandwidth
BIC Loss Sender High bandwidth
CUBIC Loss Sender High bandwidth
H-TCP Loss Sender High bandwidth
FAST Delay Sender High bandwidth Proportional
Compound TCP Loss/Delay Sender High bandwidth Proportional
Westwood Loss/Delay Sender L
Jersey Loss/Delay Sender L
CLAMP Multi-bit signal Receiver, Routers V Max-min
TFRC Loss Sender, Receiver No Retransmission Minimum delay
XCP Multi-bit signal Sender, Receiver, Router BLFC Max-min
VCP 2 bit signal Sender, Receiver, Router BLF Proportional
MaxNet Multi-bit signal Sender, Receiver, Router BLFSC Max-min
JetMax Multi-bit signal Sender, Receiver, Router High bandwidth Max-min
RED Loss Router Smaller delay
ECN Single-bit signal Sender, Receiver, Router Less loss

TCP Tahoe and Reno

The two algorithms were retrospectively named after the 4.3BSD operating system in which each first appeared (which were themselves named after Lake Tahoe and the nearby city of Reno, Nevada). The "Tahoe" algorithm first appeared in 4.3BSD-Tahoe (which was made to support the CCI Power 6/32 Tahoe minicomputer), and was made available to non-AT&T licensees as part of the “4.3BSD Networking Release 1”; this ensured its wide distribution and implementation. Improvements were made in 4.3BSD-Reno and subsequently released to the public as "Networking Release 2" and later 4.4BSD-Lite.

While both consider retransmission timeout (RTO) and duplicate ACKs as packet loss events, the behavior of Tahoe and Reno differ primarily in how they react to duplicate ACKs:

In both Tahoe and Reno, if an ACK times out (RTO timeout), slow start is used, and both algorithms reduce congestion window to 1 MSS.

Fast Recovery (Reno only): In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.

TCP Vegas

Main article: TCP Vegas

Until the mid-1990s, all of TCP's set timeouts and measured round-trip delays were based upon only the last transmitted packet in the transmit buffer. University of Arizona researchers Larry Peterson and Lawrence Brakmo introduced TCP Vegas, named after the largest Nevada city, in which timeouts were set and round-trip delays were measured for every packet in the transmit buffer. In addition, TCP Vegas uses additive increases in the congestion window. This variant was not widely deployed outside Peterson's laboratory. In a comparison study of various TCP congestion control algorithms, TCP Vegas appeared to be the smoothest followed by TCP CUBIC.[14]

TCP Vegas was deployed as the default congestion control method for DD-WRT firmware v24 SP2.[15]

TCP New Reno

TCP New Reno, defined by RFC 6582 (which obsoletes previous definitions in RFC 3782 and RFC 2582), improves retransmission during the fast-recovery phase of TCP Reno. During fast recovery, for every duplicate ACK that is returned to TCP New Reno, a new unsent packet from the end of the congestion window is sent, to keep the transmit window full. For every ACK that makes partial progress in the sequence space, the sender assumes that the ACK points to a new hole, and the next packet beyond the ACKed sequence number is sent.

Because the timeout timer is reset whenever there is progress in the transmit buffer, this allows New Reno to fill large holes, or multiple holes, in the sequence space – much like TCP SACK. Because New Reno can send new packets at the end of the congestion window during fast recovery, high throughput is maintained during the hole-filling process, even when there are multiple holes, of multiple packets each. When TCP enters fast recovery it records the highest outstanding unacknowledged packet sequence number. When this sequence number is acknowledged, TCP returns to the congestion avoidance state.

A problem occurs with New Reno when there are no packet losses but instead, packets are reordered by more than 3 packet sequence numbers. When this happens, New Reno mistakenly enters fast recovery, but when the reordered packet is delivered, ACK sequence-number progress occurs and from there until the end of fast recovery, every bit of sequence-number progress produces a duplicate and needless retransmission that is immediately ACKed.

New Reno performs as well as SACK at low packet error rates, and substantially outperforms Reno at high error rates.

TCP Hybla

TCP Hybla[16] aims to eliminate penalization of TCP connections that incorporate a high-latency terrestrial or satellite radio link, due to their longer round trip times. It stems from an analytical evaluation of the congestion window dynamics, which suggests the necessary modifications to remove the performance dependence on RTT.

TCP BIC

Main article: BIC TCP

Binary Increase Congestion control is an implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency (called LFN, long fat networks, in RFC 1072[17]). BIC is used by default in Linux kernels 2.6.8 through 2.6.18.

TCP CUBIC

Main article: CUBIC TCP

CUBIC is a less aggressive and more systematic derivative of BIC, in which the window is a cubic function of time since the last congestion event, with the inflection point set to the window prior to the event. CUBIC is used by default in Linux kernels from version 2.6.19 to 3.1.

Agile-SD TCP

Agile-SD is a Linux-based Congestion Control Algorithm (CCA) which is designed for the real Linux kernel. It is a receiver-side algorithm employs a loss-based approach using a novel mechanism, called Agility Factor (AF). It has been proposed by Mohamed A. Alrshah et. al. to increase the bandwidth utilization over high-speed and short-distance networks (low-BDP networks) such as local area networks or fiber-optic network, especially when the applied buffer size is small. It has been evaluated by comparing its performance to Compound-TCP (the default CCA in MS Windows) and CUBIC (the default of Linux) using NS-2 simulator. It improves the total performance up to 55% in term of average throughput.

TCP Westwood+

Main article: TCP Westwood plus

Westwood+ is a sender-side only modification of the TCP Reno protocol stack that optimizes the performance of TCP congestion control over both wireline and wireless networks. TCP Westwood+ is based on end-to-end bandwidth estimation to set congestion window and slow start threshold after a congestion episode, that is, after three duplicate acknowledgments or a timeout. The bandwidth is estimated by properly low-pass filtering the rate of returning acknowledgment packets. The rationale of this strategy is simple: in contrast with TCP Reno, which blindly halves the congestion window after three duplicate ACKs, TCP Westwood+ adaptively sets a slow start threshold and a congestion window which takes into account the bandwidth used at the time congestion is experienced. TCP Westwood+ significantly increases throughput over wireless links and fairness compared to TCP Reno/New Reno in wired networks.

Compound TCP

Main article: Compound TCP

Compound TCP is a Microsoft implementation of TCP which maintains two different congestion windows simultaneously, with the goal of achieving good performance on LFNs while not impairing fairness. It has been widely deployed in Windows versions since Microsoft Windows Vista and Windows Server 2008 and has been ported to older Microsoft Windows versions as well as Linux.

TCP Proportional Rate Reduction

TCP Proportional Rate Reduction (PRR)[18] is an algorithm designed to improve the accuracy of data sent during recovery. The algorithm ensures that the window size after recovery is as close as possible to the slow-start threshold. In tests performed by Google, PRR resulted in a 3–10% reduction in average latency and recovery timeouts reduced by 5%.[19] PRR is used by default in Linux kernels since version 3.2.[20]

Other TCP congestion avoidance algorithms

TCP New Reno was the most commonly implemented algorithm, SACK support is very common and is an extension to Reno/New Reno. Most others are competing proposals which still need evaluation. Starting with 2.6.8 the Linux kernel switched the default implementation from New Reno to BIC. The default implementation was again changed to CUBIC in the 2.6.19 version. FreeBSD uses New Reno as the default algorithm. However, it supports a number of other choices.[29]

When the per-flow product of bandwidth and latency increases, regardless of the queuing scheme, TCP becomes inefficient and prone to instability. This becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links.

TCP Interactive (iTCP)[30] allows applications to subscribe to TCP events and respond accordingly enabling various functional extensions to TCP from outside TCP layer. Most TCP congestion schemes work internally. iTCP additionally enables advanced applications to directly participate in congestion control such as to control the source generation rate.

Zeta-TCP detects the congestions from both the latency and loss rate measures, and applies different congestion window backoff strategies based on the likelihood of the congestions to maximize the goodput. It also has a couple of other improvements to accurately detect the packet losses, avoiding retransmission timeout retransmission; and accelerate/control the inbound (download) traffic.

Classification by network awareness

Congestion control algorithms can be categorized using network awareness as a criterion. The first category (”the box is black”) consists of a group of algorithms that consider the network as a black box, assuming no knowledge of its state, other than the binary feedback upon congestion. The second category (”the box is grey”) groups approaches that use measurements to estimate available bandwidth, level of contention or even the temporary characteristics of congestion. Due to the possibility of wrong estimations and measurements, the network is considered a grey box. The third category (”the box is green”) contains the bimodal congestion control, which calculates explicitly the fairshare, as well as the network-assisted control, where the network communicates its state to the transport layer; the box now is becoming green.

Black box

The Additive Increase Multiplicative Decrease (AIMD) algorithm is used to implement TCP window adjustments; based on the analysis of Chiu and Jain the algorithm achieves stability and converges to fairness in situations where the demand (of competing flows) exceeds the channel’s bandwidth. The congestion control in the traditional TCP, is based on the basic idea of AIMD. In TCP-Tahoe, TCP-NewReno and TCP-Sack, the additive increase phase is adopted exactly as in AIMD, when the protocols are in the congestion avoidance phase. In case of a packet drop, instead of the multiplicative decrease a more conservative tactic is used in TCP-Tahoe. The congestion window resets and the protocol enters again the slow-start phase. On the other hand, in TCP-NewReno and TCP-Sack, when the sender receives 3 DACKs, a multiplicative decrease is used in both window and slow-start threshold. In such case, the protocols remain at the Congestion Avoidance phase. When the retransmission timeout expires, they enter the slow-start phase as in TCP-Tahoe.

Grey box

Standard TCP relies on packet losses as an implicit congestion signal from overloaded links. However, packet loss is not a sufficient indication of congestion, in its own right, for a number of reasons: 1) Packet loss can be caused by random bit corruption when bandwidth is still available. 2) Acknowledgement-based loss detection at the sender side can be affected by the cross-traffic on the reverse path. 3) Packet loss, as a binary feedback, cannot indicate the level of contention before the occurrence of congestion. Therefore, an efficient window adjustment tactic should reflect various network conditions, which cannot all be captured simply by packet drops. Several measurement-based transport protocols gather information on current network conditions.

Green box

The following algorithms require custom fields to be added to the TCP packet structure.

Usage

See also

References

  1. Van Jacobson, Michael J. Karels. Congestion Avoidance and Control (1988). Proceedings of the Sigcomm '88 Symposium, vol.18(4): pp.314–329. Stanford, CA. August 1988. This paper originated many of the congestion avoidance algorithms used in TCP/IP.
  2. RFC 2001 – TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms
  3. RFC 5681 – TCP Congestion Control
  4. RFC 3390 – TCP Increasing TCP's Initial Window
  5. TCP Congestion Avoidance Explained via a Sequence Diagram
  6. Jacobson, Van; Karels, Michael (1988). "Congestion Avoidance and Control" (PDF). ACM SIGCOMM Computer Communication Review 25 (1): 157–187. doi:10.1145/205447.205462.
  7. Corbet, Jonathan. "Increasing the TCP initial congestion window". LWN. Retrieved 10 October 2012.
  8. "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms". January 1997.
  9. Jacobson, Van; Karels, MJ (1988). "Congestion avoidance and control" (PDF). ACM SIGCOMM Computer Communication Review 18 (4): 314–329. doi:10.1145/52325.52356.
  10. Nick O'Neill. "What's Making Your Site Go Slow? Could Be The Like Button". AllFacebook, November 10, 2010. Retrieved on September 12, 2012.
  11. Chiu, Dah-Ming; Raj Jain (1989). "Analysis of increase and decrease algorithms for congestion avoidance in computer networks". Computer Networks and ISDN systems 17: 1–14.
  12. Fall, Kevin; Sally Floyd (July 1996). "Simulation-based Comparisons of Tahoe, Reno and SACK TCP" (PostScript). Computer Communications Review.
  13. 1 2 Kurose & Ross 2008, p. 284.
  14. "Performance Analysis of TCP Congestion Control Algorithms" (PDF). Retrieved 26 March 2012.
  15. "DD-WRT changelog". Retrieved 2 January 2012.
  16. http://hybla.deis.unibo.it/
  17. "RFC 6937 - Proportional Rate Reduction for TCP". Retrieved 6 June 2014.
  18. Corbet, Jonathan. "LPC: Making the net go faster". Retrieved 6 June 2014.
  19. "Linux 3.2 - Linux Kernel Newbies". Retrieved 6 June 2014.
  20. Yuan, Cao; Tan, Liansheng; Andrew, Lachlan L. H.; Zhang, Wei; Zukerman, Moshe (2016 03 30). "A generalized FAST TCP scheme". Computer Communications 31 (14): 3242-3249. doi:10.1016/j.comcom.2008.05.028. Check date values in: |date=, |year= / |date= mismatch (help)
  21. 1 2 "Rice Networks Group".
  22. http://www.deneholme.net/tom/scalable/
  23. "XCP @ ISI".
  24. http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf
  25. http://media.cs.tsinghua.edu.cn/~multimedia/tcp-fit/
  26. "An analytical study of CANIT algorithm in TCP protocol". doi:10.1145/605521.605530.
  27. "Error:".
  28. "Summary of Five New TCP Congestion Control Algorithms Project".
  29. "iTCP - Interactive Transport Protocol - Medianet Lab, Kent State University".
  30. Highspeed-TCP Homepage
  31. AIMD-FC Homepage
  32. TCP-Westwood Homepage
  33. TFRC Homepage
  34. MaxNet Homepage

Sources

External links

This article is issued from Wikipedia - version of the Wednesday, May 04, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.