Case Study 2 — Converged Versus Dedicated Storage Network
This case study is similar to Case Study 1. The difference is that the server no longer experiences high CPU utilization. Also, the average traffic in the no-drop class is 6 Gbps (60%), and the average traffic in the lossy classes is 2 Gbps (20%). When application performance degradation is reported, its duration coincides with 100% utilization on the converged link. An investigation of the per-class traffic utilization found that the traffic in the lossy class spiked from 2 Gbps to 5 Gbps. At the same time, the traffic in the no-drop class dipped from 6 Gbps to 5 Gbps. This is because, on this 10 GbE link, the no-drop class was allocated a bandwidth guarantee of 5 Gbps (50%), so under contention it is limited to that rate. However, the collective ingress rate of the lossless traffic on the other ports of this switch is still 6 Gbps. To reduce this traffic to 5 Gbps, the switch invokes PFC, which results in congestion spreading. This was confirmed by a spike in Tx Pause frames on the other ports of the switch that were receiving traffic to be sent out on the edge port.
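The arithmetic behind this behavior can be sketched with a small model of a work-conserving bandwidth scheduler. This is only an illustration, not a switch implementation: the function name `ets_share`, the class names, and the 50% guarantee for the lossy class are assumptions made for the example (the case study states only the 50% guarantee for the no-drop class), and the lossy class is assumed to offer at least 5 Gbps during the spike.

```python
def ets_share(link_gbps, demands, guarantees):
    """Hypothetical model: split a link between traffic classes.

    Each class first receives min(demand, guarantee). Any capacity left
    over is then handed to classes that still have unmet demand, in
    proportion to their guarantees (assumed to be positive).
    """
    alloc = {c: min(demands[c], guarantees[c]) for c in demands}
    spare = link_gbps - sum(alloc.values())
    while spare > 1e-9:
        hungry = [c for c in demands if demands[c] - alloc[c] > 1e-9]
        if not hungry:
            break
        weight = sum(guarantees[c] for c in hungry)
        given = 0.0
        for c in hungry:
            extra = min(spare * guarantees[c] / weight, demands[c] - alloc[c])
            alloc[c] += extra
            given += extra
        spare -= given
    return alloc


LINK_GBPS = 10
# Assumption: the lossy class is also configured with a 50% guarantee.
GUARANTEES = {"no_drop": 5, "lossy": 5}

# Average load: no-drop 6 Gbps, lossy 2 Gbps. The no-drop class borrows
# the bandwidth that the lossy class is not using.
normal = ets_share(LINK_GBPS, {"no_drop": 6, "lossy": 2}, GUARANTEES)

# Spike: lossy demand rises to 5 Gbps, so nothing is left to borrow.
spike = ets_share(LINK_GBPS, {"no_drop": 6, "lossy": 5}, GUARANTEES)

# Lossless traffic still arrives at 6 Gbps on the other ports; the
# difference is what PFC has to push back upstream.
backpressure_gbps = 6 - spike["no_drop"]

print(normal, spike, backpressure_gbps)
```

In this model the no-drop class gets its full 6 Gbps during normal operation but only 5 Gbps during the spike, leaving 1 Gbps of lossless ingress that the switch must slow down with PFC.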
In this case study, the clear problem is the lack of capacity on the converged link and the resulting contention on that link between the no-drop and lossy classes. This issue was resolved by adding another 10 GbE link.
In this case study, both links were used for both types of traffic (lossy and lossless), which is the approach of the shared storage network. A valid alternative would have been to dedicate one link to lossy (normal class) traffic and the other link to lossless (no-drop class) traffic, which is the approach of the dedicated storage network.
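A rough way to compare the two options is to look at the headroom each leaves at peak load. The sketch below is a simplification under stated assumptions: the helper names are hypothetical, the peak rates (6 Gbps no-drop, 5 Gbps lossy) are taken from the spike described in this case study, and the shared design is assumed to balance traffic perfectly across both links, which real link-aggregation hashing does not guarantee.

```python
def shared_headroom(links_gbps, peaks_gbps):
    """Pooled spare capacity when every link carries every class."""
    return sum(links_gbps) - sum(peaks_gbps.values())


def dedicated_headroom(link_for_class_gbps, peaks_gbps):
    """Spare capacity per class when each class has its own link."""
    return {c: link_for_class_gbps[c] - peaks_gbps[c] for c in peaks_gbps}


# Peak rates observed during the spike in this case study.
peaks = {"no_drop": 6, "lossy": 5}

# Shared storage network: two 10 GbE links, both classes on both links.
shared = shared_headroom([10, 10], peaks)

# Dedicated storage network: one 10 GbE link per class.
dedicated = dedicated_headroom({"no_drop": 10, "lossy": 10}, peaks)

print(shared)     # pooled headroom in Gbps
print(dedicated)  # per-class headroom in Gbps
```

With these numbers both designs carry the peak load. The shared design pools 9 Gbps of headroom that either class can use, while the dedicated design leaves 4 Gbps for no-drop and 5 Gbps for lossy traffic but keeps a further lossy spike from ever competing with lossless traffic.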
The right choice depends on the capacity of the links and the throughput that is expected on them. A dedicated storage network is a different architecture and must be operated differently. Its advantages are the independence of the fabrics, scalability, fault isolation, and easier troubleshooting. On the other hand, a dedicated storage network is more expensive to deploy and needs more resources to manage and operate.
Preventing Congestion in Lossless Ethernet Networks
The high-level approaches to eliminating or reducing congestion in lossless Ethernet networks are the same as those in Fibre Channel fabrics. Over the decades, different transport types have implemented similar approaches with minor variations. Chapter 6 already explains the details of Preventing Congestion in Fibre Channel Fabrics. Because hop-by-hop flow control leads to congestion spreading in both networks, the same concepts apply to lossless Ethernet networks as well, although there are implementation differences.
Eliminating or Reducing Congestion — An Overview
Recall that a culprit is any device that causes congestion in a storage network. A victim is any device that is adversely affected by network congestion.