On the Significance of the PCIe Replay Buffer

PCIe brought a generational replacement for the PCI bus and also replaced the AGP 8x graphics interface, a transition that began with Intel's 9xx-series chipsets. Although PCIe x16 has successfully replaced AGP 8x in desktops and laptops, the transition is not yet complete for peripheral applications.

PCIe is a point-to-point interconnect. On the transmit side, data from the application passes through three layers before reaching the destination: the transaction layer, the data link layer, and the physical layer. The receiver has a corresponding layered structure and processes the data in the reverse order. This architecture also suits chip-to-chip interconnects, such as the link between the northbridge and the southbridge.

High data integrity in PCIe ensures complete data transmission

PCIe provides strong data-integrity mechanisms along with its QoS capabilities, which together ensure the integrity of data in transit.

To implement the ACK/NAK protocol, the specification requires the transmitter to provide a "Replay Buffer", also called a "Retry Buffer". As the name suggests, it holds a copy of every TLP (Transaction Layer Packet) the transmitter sends. Once the transmitter learns that a TLP has been received correctly, that TLP can be removed from the transmitter's replay buffer. If an error occurs in transit, such as a CRC error, the receiver responds with a NAK. The transmitter can then resend directly from the buffer without rebuilding the TLP, which ensures point-to-point data integrity.
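A minimal Python sketch of this transmitter-side behavior: every TLP is copied into the buffer before it leaves, a cumulative ACK releases entries, and a NAK causes the stored copies to be resent unchanged. Two simplifications are assumptions, not PCIe behavior: `zlib.crc32` stands in for the real LCRC, and sequence numbers never wrap around (PCIe uses 12-bit sequence numbers). The names `ReplayBuffer` and `lcrc_ok` are hypothetical.

```python
import zlib


class ReplayBuffer:
    """Transmitter-side copies of TLPs that were sent but not yet acknowledged.

    Assumptions for this sketch: sequence numbers never wrap, and
    zlib.crc32 stands in for the real PCIe LCRC.
    """

    def __init__(self) -> None:
        self._pending = {}  # sequence number -> framed TLP bytes, in send order

    def send(self, seq: int, tlp: bytes) -> bytes:
        """Frame a TLP with the stand-in LCRC and keep a copy before it leaves."""
        framed = tlp + zlib.crc32(tlp).to_bytes(4, "little")
        self._pending[seq] = framed
        return framed

    def on_ack(self, seq: int) -> None:
        """ACKs are cumulative: release every TLP up to and including seq."""
        for s in [s for s in self._pending if s <= seq]:
            del self._pending[s]

    def on_nak(self, seq: int) -> list:
        """A NAK acknowledges up to seq; every later TLP is replayed as stored."""
        self.on_ack(seq)
        return list(self._pending.values())

    def pending(self) -> list:
        return list(self._pending)


def lcrc_ok(framed: bytes) -> bool:
    """Receiver-side check using the same stand-in CRC."""
    body, crc = framed[:-4], framed[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == crc


tx = ReplayBuffer()
wire0 = tx.send(0, b"TLP 0")
wire1 = bytearray(tx.send(1, b"TLP 1"))
wire1[0] ^= 0x01  # a bit error on the link corrupts TLP 1

if lcrc_ok(wire0):
    tx.on_ack(0)  # TLP 0 passed the check
# TLP 1 failed, so the receiver NAKs with the last good sequence number (0)
replay = [] if lcrc_ok(bytes(wire1)) else tx.on_nak(0)
```

Because the buffer stores the already-framed copy, the replayed TLP is bit-for-bit what was originally sent; nothing is rebuilt in the transaction layer.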

Suppose transmitting device A has sent TLPs 3, 4, 5, 6, and 7; copies of these packets are kept in the replay buffer. If the receiver returns an ACK indicating that TLPs 3, 4, and 5 were received without error, the transmitter can purge TLPs 3, 4, and 5 from the buffer.
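The purge works because an ACK DLLP carries the sequence number of the last good TLP and acknowledges everything up to and including it. A minimal sketch of this scenario using a deque of sequence numbers; `purge_on_ack` is a hypothetical helper, and wraparound of the 12-bit sequence space is ignored.

```python
from collections import deque


def purge_on_ack(buffer: deque, acked_seq: int) -> list:
    """Release every buffered TLP whose sequence number is <= acked_seq.

    TLPs sit in the buffer in transmission order, so a cumulative ACK only
    ever removes entries from the front. Sequence-number wraparound is ignored.
    """
    released = []
    while buffer and buffer[0] <= acked_seq:
        released.append(buffer.popleft())
    return released


replay_buffer = deque([3, 4, 5, 6, 7])     # TLPs sent by device A, not yet acknowledged
released = purge_on_ack(replay_buffer, 5)  # ACK carrying sequence number 5
print(released, list(replay_buffer))       # [3, 4, 5] [6, 7]
```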

Rough sizing of the replay buffer

If the buffer is too small, performance in high-speed applications inevitably suffers; if it is too large, the gate count grows, which increases chip area and cost. Choosing an appropriate replay buffer size is therefore something upstream component designers must plan carefully and should not overlook.

Logically, the replay buffer size is closely tied to the ACK response policy. The PCIe specification sets criteria for when ACKs must be sent, that is, how long a receiver may take to return an ACK after receiving a TLP; this delay is the "ACK latency". Several factors affect it, including link width, maximum payload size, and link efficiency.

In practice, sizing the replay buffer at about 3 to 4 times the maximum payload size ("max_payload_size") is a very good design criterion. This rule of thumb is derived from the worst-case ACK response delay.

Assume node A is the transmitter and node B the receiver. Suppose that just as node B finishes receiving TLP A1 from node A, it has also started sending its own "max_payload_size" TLP B1. Node B's ACK for TLP A1 must then wait one more "max_payload_size" time slot, until B1 has been sent. Node A must therefore hold at least two "max_payload_size" packets. The "max_payload_size" field is located in the Device Control register.
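The two-packet argument can be checked with a toy timeline. Time is counted in units of one max_payload_size TLP slot, for instance 519 symbol times, which is how long 2048 payload bytes plus 28 overhead symbols take on an x4 link. The sketch assumes A sends back to back and ignores wire delay and ACK processing, so it gives a lower bound; `tlps_held_when_ack_leaves` is a hypothetical name.

```python
def tlps_held_when_ack_leaves(slot: int, b_busy_slots: int = 1) -> int:
    """Count node A's TLPs still buffered when node B finally sends the ACK for A1.

    slot: time (any unit) to send one max_payload_size TLP.
    b_busy_slots: how many of B's own max-size TLPs the ACK must wait behind.
    Assumes A streams max-size TLPs back to back from t=0; wire delay and
    ACK processing time are ignored.
    """
    ack_time = slot + b_busy_slots * slot  # A1 ends at `slot`, then B's TLP(s) drain
    held, t = 0, 0
    while t < ack_time:  # every TLP A starts before the ACK leaves stays buffered
        held += 1
        t += slot
    return held


print(tlps_held_when_ack_leaves(slot=519))                  # 2
print(tlps_held_when_ack_leaves(slot=519, b_busy_slots=0))  # 1
```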

Beyond this, the finer details, such as ACK processing, the time needed to release entries from the replay buffer, and when that processing happens, call for one more time slot, so the replay buffer needs 3 times the "max_payload_size". For a more conservative design, however, 4 times is recommended to provide extra margin.
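The rule of thumb amounts to a simple multiplication. This sketch tabulates 3× and 4× max_payload_size for each Max_Payload_Size value the Device Control register can encode (128 to 4096 bytes). It ignores per-TLP header and framing overhead, which the refined calculation accounts for; `rule_of_thumb_size` is a hypothetical helper.

```python
LEGAL_MPS = (128, 256, 512, 1024, 2048, 4096)  # bytes; encodable Max_Payload_Size values


def rule_of_thumb_size(max_payload_size: int, multiple: int = 4) -> int:
    """Replay buffer size in bytes as a multiple of max_payload_size.

    multiple=3: the TLP in flight, the peer's blocking TLP, and ACK processing
    plus buffer release; multiple=4 adds one slot of margin.
    Per-TLP header and framing overhead is ignored here.
    """
    if max_payload_size not in LEGAL_MPS:
        raise ValueError(f"{max_payload_size} is not an encodable Max_Payload_Size")
    return multiple * max_payload_size


for mps in LEGAL_MPS:
    print(f"MPS {mps:>4}: 3x = {rule_of_thumb_size(mps, 3):>5}  4x = {rule_of_thumb_size(mps):>5}")
```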

Refining the buffer size calculation

PCIe power management defines L0s, a low-power link state, and waking from L0s incurs an exit latency (L0s Wake Latency) of roughly 60 ns to several microseconds. If this latency is too long, ACKs cannot be returned and processed in time, replay buffer space is not released, and the flow of TLPs from the upper layer stalls, which greatly degrades transmission performance. A replay buffer that is too small is therefore as poor a design as one that is too large. The following sections refine the buffer-size calculation.
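As a rough illustration of why the exit latency matters, this sketch converts an L0s exit latency into symbol times and into the bytes a link could carry in that window. It assumes a 2.5 GT/s (Gen1) link, where one symbol time is 4 ns; the 1000 ns and 4000 ns values are hypothetical points within the "several microseconds" range. If the direction that carries ACKs back is waking from L0s while the forward direction keeps sending, roughly this much additional data would accumulate in the replay buffer.

```python
from fractions import Fraction

SYMBOL_TIME_NS = 4  # assumption: 2.5 GT/s (Gen1); 10 bits / 2.5 Gb/s = 4 ns per symbol


def l0s_exit_cost(latency_ns: int, link_width: int) -> tuple:
    """Return (symbol times, bytes the link could carry) during an L0s exit."""
    symbol_times = Fraction(latency_ns, SYMBOL_TIME_NS)
    # Each lane moves one byte (one symbol) per symbol time.
    return symbol_times, symbol_times * link_width


for latency in (60, 1000, 4000):
    st, nbytes = l0s_exit_cost(latency, link_width=4)
    print(f"{latency:>5} ns -> {float(st):7.1f} symbol times, {float(nbytes):7.1f} bytes on x4")
```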

The first step is to understand the round-trip time of a TLP. Its main components are the time required to place the TLP in the replay buffer, the acknowledgement time at the receiver, and the time to release the TLP from the replay buffer.

PCIe links are scalable in width and use 8b/10b encoding, in which each 10-bit encoded character is called a symbol. Each lane carries one symbol per symbol time, so an x1 link transmits one symbol per symbol time, an x2 link two, an x8 link eight, and so on.
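For a sense of scale, this sketch turns "one symbol per lane per symbol time" into throughput. It assumes a 2.5 GT/s (Gen1) lane rate; with 10 bits per symbol, that is 250 million symbols (250 MB of data) per second per lane and a symbol time of 4 ns. `symbols_per_second` is a hypothetical helper.

```python
LANE_RATE_BPS = 2_500_000_000  # assumption: PCIe Gen1, 2.5 GT/s per lane
BITS_PER_SYMBOL = 10           # 8b/10b: each data byte travels as a 10-bit symbol
SYMBOL_TIME_NS = BITS_PER_SYMBOL * 1e9 / LANE_RATE_BPS  # 4.0 ns


def symbols_per_second(link_width: int) -> int:
    """Every lane carries one symbol per symbol time, so throughput scales with width."""
    return LANE_RATE_BPS // BITS_PER_SYMBOL * link_width


for width in (1, 2, 4, 8, 16):
    # One symbol carries one data byte, so symbols/s equals bytes/s.
    print(f"x{width:<2}: {symbols_per_second(width) / 1e6:,.0f} MB/s per direction")
```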

The "max_payload_size" field specifies how much data a TLP can carry. Because PCIe is a layered architecture, however, each layer adds its own overhead to the packet, such as the TLP header and the physical-layer framing symbols. This overhead is referred to here as "TLP-Overhead", and the specification treats it as a fixed 28 symbols.
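The specification simply treats the overhead as a constant 28 symbols. One breakdown that adds up to 28, assumed here rather than taken from the specification, is a 4-DW header plus ECRC from the transaction layer, a 2-byte sequence number and 4-byte LCRC from the data link layer, and STP/END framing from the physical layer, with one symbol per byte. `TLP_OVERHEAD_FIELDS` and `tlp_symbols` are hypothetical names.

```python
# Hypothetical breakdown of the 28-symbol TLP overhead (one byte = one symbol),
# assuming a 4-DW header and an ECRC digest are both present.
TLP_OVERHEAD_FIELDS = {
    "STP framing (physical layer)": 1,
    "sequence number (data link layer)": 2,
    "header, 4 DW (transaction layer)": 16,
    "ECRC (transaction layer, optional)": 4,
    "LCRC (data link layer)": 4,
    "END framing (physical layer)": 1,
}


def tlp_symbols(max_payload_size: int) -> int:
    """Symbols one maximum-size TLP occupies on the link: payload plus overhead."""
    return max_payload_size + sum(TLP_OVERHEAD_FIELDS.values())


print(sum(TLP_OVERHEAD_FIELDS.values()))  # 28
print(tlp_symbols(2048))                  # 2076
```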

With this concept established, the time the transmitter needs to send the symbols of one maximum-size TLP can be expressed by a simple formula:

TLP transmit time = (max_payload_size + TLP_Overhead) / LinkWidth symbol times

The payload size must be increased by the overhead and then divided by the link width because the TLP's bytes are striped across all lanes (byte striping) during the actual transfer.
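Byte striping can be sketched in a few lines: byte i of a packet goes to lane i mod LinkWidth, so each lane carries about 1/LinkWidth of the bytes. The hypothetical helper `tlp_transmit_symbol_times` applies (max_payload_size + TLP_Overhead) / LinkWidth. Rounding up to whole symbol times is an assumption made here for a partly filled final symbol time; for 2048 + 28 bytes on x4 the division is exact.

```python
def stripe(packet: bytes, link_width: int) -> list:
    """Byte striping: byte i of the packet is sent on lane i % link_width."""
    return [packet[lane::link_width] for lane in range(link_width)]


def tlp_transmit_symbol_times(max_payload_size: int, link_width: int,
                              tlp_overhead: int = 28) -> int:
    """(max_payload_size + TLP_Overhead) / LinkWidth, rounded up to whole symbol times."""
    total = max_payload_size + tlp_overhead
    return (total + link_width - 1) // link_width


print(stripe(bytes(range(8)), 4))          # [b'\x00\x04', b'\x01\x05', b'\x02\x06', b'\x03\x07']
print(tlp_transmit_symbol_times(2048, 4))  # 519
```

The longest lane in the striped packet carries exactly as many symbols as the formula predicts, which is why dividing by the link width gives the transmit time.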

The ACK returned in the opposite direction by the receiver's data link layer is one type of DLLP. Broken down by its components, it is an 8-symbol packet: SDP(1) + DLLP(4) + CRC(2) + END(1). Its transmission time also depends on link width: 8 / LinkWidth symbol times.
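A small sketch of the ACK DLLP's size and transmission time. It assumes the 8-symbol framing SDP(1) + DLLP(4) + CRC(2) + END(1) and the same divide-by-link-width model used for TLPs; `Fraction` keeps results such as the x16 value of half a symbol time exact. `ack_transmit_time` is a hypothetical name.

```python
from fractions import Fraction

# ACK DLLP framing on an 8b/10b link, one symbol per byte.
ACK_DLLP_FIELDS = {"SDP": 1, "DLLP body": 4, "CRC-16": 2, "END": 1}
ACK_SYMBOLS = sum(ACK_DLLP_FIELDS.values())


def ack_transmit_time(link_width: int) -> Fraction:
    """Symbol times to send one ACK DLLP when its symbols are striped over the lanes."""
    return Fraction(ACK_SYMBOLS, link_width)


for width in (1, 2, 4, 8, 16):
    print(f"x{width:<2}: {ack_transmit_time(width)} symbol times")
```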

With these terms, "ACK Latency" is not difficult to quantify. It is the time from when the receiver receives the last symbol of a TLP until it transmits the first symbol of the corresponding ACK, and it can be expressed as:

ACK Latency = ((max_payload_size + TLP_Overhead) × AckFactor / LinkWidth) + InternalDelay

The specification also describes the AckFactor parameter in this formula. It is a fudge factor representing the number of maximum-size TLPs that can be received before an ACK DLLP is sent. To strike a good balance between link bandwidth efficiency and replay buffer size, its ideal range is 1 to 3. A larger value enlarges the replay buffer but improves link efficiency, because ACKs are sent less often. A smaller value saves replay buffer space but forces more frequent ACKs, sacrificing link efficiency. Choosing this trade-off carefully is therefore important.
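To make the trade-off concrete, this sketch uses a deliberately simplified model, an assumption rather than the specification's definition: the receiver sends one ACK per AckFactor maximum-size TLPs, each occupying max_payload_size + 28 symbols. It reports ACK traffic as a fraction of the forward data and the amount of data waiting on each ACK. The 256-byte payload and the name `ack_tradeoff` are hypothetical.

```python
from fractions import Fraction

TLP_OVERHEAD = 28  # symbols, the specification's constant
ACK_SYMBOLS = 8    # one ACK DLLP


def ack_tradeoff(max_payload_size: int, ack_factor: Fraction) -> tuple:
    """Simplified model (assumption): one ACK per `ack_factor` maximum-size TLPs.

    Returns (ACK symbols per forward symbol, symbols waiting on each ACK).
    """
    tlp_symbols = max_payload_size + TLP_OVERHEAD
    window = ack_factor * tlp_symbols  # data sent before the ACK goes out
    return Fraction(ACK_SYMBOLS) / window, window


for af in (Fraction(1), Fraction(2), Fraction(3)):
    overhead, window = ack_tradeoff(256, af)
    print(f"AckFactor {float(af):.1f}: ACK traffic {float(overhead):.2%}, "
          f"{float(window):.0f} symbols awaiting each ACK")
```

Raising AckFactor shrinks the ACK traffic in inverse proportion while the data awaiting each ACK, and hence the buffer demand, grows linearly.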

"Internal Delay" is the receiver's processing delay: after a TLP is received, it must be passed up to the data link layer, which checks whether the data is correct, updates internal buffers, and so on before building the ACK or NAK. The specification treats this delay as a constant 19 symbol times.
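Putting the pieces together, here is a minimal Python sketch of the ACK latency formula ((max_payload_size + TLP_Overhead) × AckFactor / LinkWidth) + InternalDelay, with TLP_Overhead = 28 symbols and InternalDelay = 19 symbol times. AckFactor is passed as a decimal string and converted with `Fraction` so values such as 1.4 stay exact. The AckFactor that applies to a given payload size and link width comes from the specification's table; the values here are only examples. With AckFactor 1.0, a 2048-byte payload, and an x4 link, the result is 538 symbol times, the ACK latency used in this article's worked example.

```python
from fractions import Fraction

TLP_OVERHEAD = 28    # symbols
INTERNAL_DELAY = 19  # symbol times


def ack_latency(max_payload_size: int, link_width: int, ack_factor: str) -> Fraction:
    """((max_payload_size + TLP_Overhead) * AckFactor / LinkWidth) + InternalDelay."""
    af = Fraction(ack_factor)  # decimal strings such as "1.4" convert exactly
    return (max_payload_size + TLP_OVERHEAD) * af / link_width + INTERNAL_DELAY


print(ack_latency(2048, 4, "1.0"))        # 538
print(float(ack_latency(128, 1, "1.4")))  # 237.4
```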

The PCIe specification tabulates the "ACK Latency" for each combination of "max_payload_size" and "Link Width" (Table 1). Every value in Table 1 is in symbol times, and "AF" in the table is an abbreviation for "AckFactor".

PCIe power management adds Active State Power Management (ASPM), which defines the L0s low-power state. The time the transmitter and receiver need to return to normal operation from L0s must be accounted for; these times are described by "TxL0sAdjustment" and "RxL0sAdjustment".

"TxL0sAdjustment", as the name suggests, is the total time from when the transmitter begins exiting the L0s state until it can start transmitting a TLP or DLLP. This sequence has two stages. First, the transmitter's PHY transitions from the P0s power state to P0; the PIPE specification defines this period as the P0s-to-P0 transition time. Second, the transmitter sends FTS (Fast Training Sequence) ordered sets so that the receiver can reacquire bit and symbol lock.

The empirical value of "TxL0sAdjustment" is about 16 to 32 symbol times, with 20 symbol times as the reference value. It is derived as follows: 4 symbol times for the P0s-to-P0 transition, plus 12 symbol times for three FTS ordered sets (4 symbols each), plus 4 symbol times for one SKP ordered set (also 4 symbols).
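The arithmetic behind the 20-symbol-time reference value can be written out directly. On an 8b/10b link, an FTS ordered set and an SKP ordered set are each four symbols (COM plus three FTS or SKP symbols), which is what makes the total come to 20. The count of three FTS ordered sets is the article's illustrative figure; a real receiver advertises how many it needs (N_FTS) during link training, so actual values vary by device. `tx_l0s_adjustment` is a hypothetical name.

```python
P0S_TO_P0 = 4       # PIPE P0s -> P0 transition, in symbol times (the article's figure)
FTS_OS_SYMBOLS = 4  # one FTS ordered set: COM + 3 FTS symbols
SKP_OS_SYMBOLS = 4  # one SKP ordered set: COM + 3 SKP symbols


def tx_l0s_adjustment(n_fts: int = 3) -> int:
    """Symbol times from leaving L0s until the first TLP or DLLP can be sent."""
    return P0S_TO_P0 + n_fts * FTS_OS_SYMBOLS + SKP_OS_SYMBOLS


print(tx_l0s_adjustment())   # 20
print(tx_l0s_adjustment(6))  # 32
```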

Likewise, the receiver must account for its own return from L0s; this is described by "RxL0sAdjustment", which also has a reference value of 20 symbol times.

In "Sizing of the Replay Buffer in PCI Express Devices", Joe Winkles presents a practical formula for designers' reference. This formula is not defined in the specification.

"SafetyFactor" provides a tolerance margin between 1 and 3. "LinkUtilizationFactor" expresses how fully the link bandwidth is used, which is normally 100%, that is, 1. The result of the formula is in bytes.

For example, assume that "max_payload_size" is 2048 bytes, the link width is x4, and both "SafetyFactor" and "LinkUtilizationFactor" are 1:

[20 + (2048+28)/4 + (538+20+2) × 1×4] = 4472 bytes
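The worked example can be checked with a short Python sketch. It reads the bracket as [TxL0sAdjustment + TLP transmit time + (AckLatency + RxL0sAdjustment + ACK transmit time) × SafetyFactor], then multiplies by link width and LinkUtilizationFactor to convert symbol times into bytes. That grouping is an assumption, since the formula itself is not reproduced in this article. Under this reading the terms total 4396 bytes, which is 76 bytes (19 symbol times × 4 lanes) less than the 4472 bytes quoted, so the original formula may include an additional term such as InternalDelay. `replay_buffer_bytes` is a hypothetical name.

```python
from fractions import Fraction


def replay_buffer_bytes(max_payload_size: int, link_width: int, ack_latency: int,
                        tx_l0s: int = 20, rx_l0s: int = 20, tlp_overhead: int = 28,
                        ack_symbols: int = 8, safety: Fraction = Fraction(1),
                        utilization: Fraction = Fraction(1)) -> Fraction:
    """Replay buffer estimate in bytes under one assumed grouping of the bracket.

    Time terms are in symbol times; multiplying by link width converts symbol
    times into bytes (one byte per lane per symbol time).
    """
    tlp_time = Fraction(max_payload_size + tlp_overhead, link_width)
    ack_time = Fraction(ack_symbols, link_width)
    symbol_times = tx_l0s + tlp_time + (ack_latency + rx_l0s + ack_time) * safety
    return symbol_times * link_width * utilization


print(replay_buffer_bytes(2048, 4, ack_latency=538))  # 4396
```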

Applying this formula to each combination of "max_payload_size" and "Link Width" gives the replay buffer sizes shown in Table 2.

In short, the replay buffer is an essential part of a PCIe interface. Even though system-on-chip (SoC) designs and licensed circuit IP are prevalent today, designers still need to understand it.

