Requirements for running RDMA in KVM

Published: January 24, 2024

Requirements for VF passthrough in KVM

The CPU must support Intel VT-d or AMD-Vi (IOMMU).

The dmesg output must contain the following two lines:

  • DMAR: Intel(R) Virtualization Technology for Directed I/O
  • DMAR: IOMMU enabled

Check whether the CPU supports VT-d or AMD-Vi:

# dmesg | grep -e "DMAR" -e "IOMMU" | grep -e "Virtualization" -e enabled
[    0.000000] DMAR: IOMMU enabled
[    0.001068] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    1.150702] DMAR: Intel(R) Virtualization Technology for Directed I/O
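The check above can be scripted so that it fails clearly instead of relying on reading the grep output by eye. This is a minimal sketch that assumes an Intel host: iommu_ready is a hypothetical helper name, and it only looks for the two DMAR lines listed above (AMD hosts log AMD-Vi messages instead, which this sketch does not handle). It reads the log on stdin, so it also works on a saved dmesg capture.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin and report whether both
# DMAR lines required for VF passthrough (Intel VT-d) are present.
iommu_ready() {
    local log
    log=$(cat)
    # -F: match the strings literally, -q: only the exit status matters.
    if grep -qF 'DMAR: IOMMU enabled' <<<"$log" &&
       grep -qF 'DMAR: Intel(R) Virtualization Technology for Directed I/O' <<<"$log"; then
        echo "ready"
        return 0
    fi
    echo "not ready"
    return 1
}

# Usage on the host (reading dmesg may require root):
#   dmesg | iommu_ready
```

If DMAR: IOMMU enabled is missing, VT-d usually has to be enabled in the firmware settings and intel_iommu=on added to the kernel command line.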

The kernel must provide the vfio, vfio_iommu_type1 and vfio_pci modules (among others).

Check that the IOMMU-related kernel modules are loaded:

[root@stgExt1 qemu]# lsmod|grep -e vfio -e iommu
vfio_pci               61440  0
vfio_virqfd            16384  1 vfio_pci
vfio_iommu_type1       36864  0
vfio                   36864  2 vfio_iommu_type1,vfio_pci
irqbypass              16384  422 vfio_pci,kvm
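The same check can be scripted. A minimal sketch: missing_vfio_modules is a hypothetical helper that reads lsmod output and names any of the three required modules that is absent. It only sees loadable modules, so a module compiled into the kernel would be reported as missing even though it is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `lsmod` output on stdin and print every required
# VFIO module that is not listed. No output means all of them are loaded.
missing_vfio_modules() {
    local loaded m
    # The first column of lsmod is the module name; skip the "Module Size Used by" header.
    loaded=$(awk '$1 != "Module" { print $1 }')
    for m in vfio vfio_iommu_type1 vfio_pci; do
        grep -qxF "$m" <<<"$loaded" || echo "$m"
    done
}

# Usage on the host:
#   lsmod | missing_vfio_modules
```

A missing vfio_pci can normally be loaded with modprobe vfio-pci, which also loads the modules it depends on.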

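QEMU must be version 2.0 or later.

CentOS 8.4 ships with QEMU 4.2.0. The BVT environment has been upgraded to QEMU 8.0.2, which has to be compiled from source.

Configure and build QEMU from its source directory:

./configure --prefix=/usr/local/qemu_rdma/ --enable-debug --enable-kvm --enable-vnc --target-list=x86_64-softmmu --enable-spice --enable-spice-protocol --enable-usb-redir --enable-rdma
make -j"$(nproc)" && make install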

Replace the system QEMU with the rebuilt binary, for example:

ln -sf /usr/local/qemu_rdma/bin/qemu-system-x86_64 /usr/libexec/qemu-kvm

Then put SELinux into permissive mode (this lasts until the next reboot):

setenforce 0

libvirt must be version 1.2.9 or later.

CentOS 8.4 ships with libvirt 6.0.0.
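Both version requirements (QEMU 2.0 or later, libvirt 1.2.9 or later) need a version-aware comparison, since a plain string comparison would rank 1.2.10 below 1.2.9. A minimal sketch relying on GNU sort -V; version_ge is a hypothetical name, and the usage lines assume that the first line of qemu-kvm --version has the form "QEMU emulator version X.Y.Z" and that virsh --version prints the bare version number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $2 is greater than or equal to
# version $1, using GNU `sort -V` (version sort) for the comparison.
version_ge() {
    local min=$1 have=$2
    [ -n "$have" ] || return 1
    # After a version sort the smaller string comes first; if that is the
    # minimum, then $have >= $min.
    [ "$(printf '%s\n' "$min" "$have" | sort -V | head -n 1)" = "$min" ]
}

# Usage on the host (the version extraction is an assumption about each
# tool's --version output):
#   qemu_ver=$(/usr/libexec/qemu-kvm --version | sed -n '1s/^QEMU emulator version \([0-9.]*\).*/\1/p')
#   libvirt_ver=$(virsh --version)
#   version_ge 2.0 "$qemu_ver"      && echo "QEMU $qemu_ver OK"
#   version_ge 1.2.9 "$libvirt_ver" && echo "libvirt $libvirt_ver OK"
```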

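SR-IOV support in KVM

The virtual NICs created through SR-IOV are called VFs. The following commands show that each physical port of the NIC, ens4f0 and ens4f1 (the PFs), supports at most 8 VFs: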

# cat /sys/class/net/ens4f0/device/sriov_totalvfs
8
# cat /sys/class/net/ens4f1/device/sriov_totalvfs
8

Create 6 VFs on the port ens4f0:

# echo 6 > /sys/class/net/ens4f0/device/sriov_numvfs

# lspci|grep Mellanox
b1:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
b1:00.1 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
b1:00.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
b1:00.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
b1:00.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
b1:00.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
b1:00.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
b1:00.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
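Writing to sriov_numvfs as above fails if the PF already has a different, non-zero number of VFs: the kernel rejects that change with EBUSY, so the count has to be reset to 0 first. A minimal sketch of a wrapper that also checks the request against sriov_totalvfs; set_num_vfs and the SYSFS_NET override are hypothetical names introduced so the logic can be dry-run against a scratch directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the sysfs SR-IOV knobs used above.
# SYSFS_NET can be pointed at a scratch directory for a dry run.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

set_num_vfs() {
    local ifname=$1 want=$2
    local dev="$SYSFS_NET/$ifname/device"
    local total current
    total=$(cat "$dev/sriov_totalvfs") || return 1
    current=$(cat "$dev/sriov_numvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "$ifname supports at most $total VFs, $want requested" >&2
        return 1
    fi
    # Nothing to do if the requested count is already active.
    [ "$current" -eq "$want" ] && return 0
    # Changing one non-zero count to another is rejected by the kernel (EBUSY),
    # so disable the existing VFs first.
    if [ "$current" -ne 0 ]; then
        echo 0 > "$dev/sriov_numvfs" || return 1
    fi
    echo "$want" > "$dev/sriov_numvfs"
}

# Usage on the host (as root):
#   set_num_vfs ens4f0 6
```

Resetting the count to 0 destroys the existing VFs, so this should not be run while any of them is assigned to a running guest.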

# ip link |grep ens4
261: ens4f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
262: ens4f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
263: ens4f0v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
264: ens4f0v3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
265: ens4f0v4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
266: ens4f0v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
18: ens4f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
19: ens4f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

# ip link show ens4f0v0
261: ens4f0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 56:ba:79:b5:fb:3a brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v1
262: ens4f0v1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 42:f9:c8:62:be:fd brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v2
263: ens4f0v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 2e:2b:21:22:a7:da brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v3
264: ens4f0v3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 22:cd:f8:8e:8b:39 brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v4
265: ens4f0v4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether b6:b1:22:d5:28:46 brd ff:ff:ff:ff:ff:ff

[root@stgExt1 qemu]# ip link show ens4f0v5
266: ens4f0v5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether be:64:4f:36:e0:f7 brd ff:ff:ff:ff:ff:ff
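ip link shows the VFs by netdev name, but passing a VF to a guest requires its PCI address. Each PF exposes virtfnN symlinks under /sys/class/net/<pf>/device/ that point at its VFs; a minimal sketch that lists them. list_vfs and the SYSFS_NET override are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "virtfnN PCI-address" for every VF of a PF,
# following the virtfn* symlinks under the PF's sysfs device directory.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

list_vfs() {
    local pf=$1 link
    for link in "$SYSFS_NET/$pf"/device/virtfn*; do
        # If the glob matched nothing, the literal pattern is returned; skip it.
        [ -L "$link" ] || continue
        # Each link points at the VF's PCI device, e.g. ../0000:b1:00.2
        printf '%s %s\n' "$(basename "$link")" "$(basename "$(readlink "$link")")"
    done | sort -V   # keep virtfn2 before virtfn10
}

# Usage on the host:
#   list_vfs ens4f0
```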

lspci output including the [vendor:device] IDs:

# lspci -nn |grep Mellanox
b1:00.0 Ethernet controller [0200]: Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]
b1:00.1 Ethernet controller [0200]: Mellanox Technologies MT2894 Family [ConnectX-6 Lx] [15b3:101f]
b1:00.2 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
b1:00.3 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
b1:00.4 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
b1:00.5 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
b1:00.6 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
b1:00.7 Ethernet controller [0200]: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function [15b3:101e]
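The [15b3:101e] tag identifies the VFs, which is enough to pick their addresses out of lspci -nn and describe one of them to libvirt as a passthrough device. A sketch under these assumptions: vf_addresses and hostdev_xml are hypothetical helper names, vm1 is a placeholder guest name, and the generated element uses libvirt's managed PCI hostdev form, with which libvirt typically detaches the VF from its host driver and binds it to vfio-pci when the guest starts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: pick VF addresses out of `lspci -nn` output by their
# [vendor:device] ID, then print a libvirt <hostdev> element for one of them.

# Read `lspci -nn` on stdin; print the bus address of every line tagged [$1].
vf_addresses() {
    local id=${1:-15b3:101e}   # ConnectX VF ID from the listing above
    awk -v tag="[$id]" 'index($0, tag) { print $1 }'
}

# Turn "b1:00.2" or "0000:b1:00.2" into a managed PCI <hostdev> element.
hostdev_xml() {
    local addr=$1 domain=0000
    # lspci omits the domain when it is 0000; split it off when present.
    if [[ $addr == *:*:* ]]; then
        domain=${addr%%:*}
        addr=${addr#*:}
    fi
    local bus=${addr%%:*} rest=${addr#*:}
    local slot=${rest%%.*} func=${rest#*.}
    printf "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
    printf "  <source>\n"
    printf "    <address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
        "$domain" "$bus" "$slot" "$func"
    printf "  </source>\n"
    printf "</hostdev>\n"
}

# Usage on the host ("vm1" is a placeholder domain name):
#   lspci -nn | vf_addresses 15b3:101e
#   hostdev_xml b1:00.2 > vf0.xml
#   virsh attach-device vm1 vf0.xml --config
```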

To make the VFs persistent across reboots:

Create the file /etc/modprobe.d/mlx5.conf with the following content:

cat /etc/modprobe.d/mlx5.conf
options mlx5_core num_vfs=2

Create the udev rule /etc/udev/rules.d/ens4f0.rules so that the VFs are created again whenever the device is added (for example at boot):

cat /etc/udev/rules.d/ens4f0.rules
ACTION=="add", SUBSYSTEM=="net", DRIVERS=="mlx5_core", ATTR{device/sriov_numvfs}="8"

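Reload the mlx5_core kernel module to apply the configuration. This briefly takes down every interface driven by mlx5_core. The VF counts in these examples (2 in mlx5.conf, 8 in the udev rule) do not match each other or the 6 VFs created earlier; set both to the number of VFs you actually need.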

$ modprobe -r mlx5_core && modprobe mlx5_core

Once the configuration is in effect, the VFs are visible, for example with:

$ ip link show

Check the RDMA link state:

$ rdma link show
0/1: mlx5_0/1: state ACTIVE physical_state LINK_UP netdev ens1f0np0
1/1: mlx5_1/1: state ACTIVE physical_state LINK_UP netdev ens1f1np1
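To check the link state from a script, the state field can be filtered. A minimal sketch (active_rdma_ports is a hypothetical helper name) that prints each ACTIVE port together with its netdev, based on the rdma link show line format above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `rdma link show` output on stdin and print
# "device/port netdev" for every port whose state is ACTIVE.
active_rdma_ports() {
    awk '$3 == "state" && $4 == "ACTIVE" {
        dev = $2; sub(/:$/, "", dev)          # "mlx5_0/1:" -> "mlx5_0/1"
        nd = "-"                               # used when no netdev is listed
        for (i = 5; i < NF; i++) if ($i == "netdev") nd = $(i + 1)
        print dev, nd
    }'
}

# Usage on the host:
#   rdma link show | active_rdma_ports
```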

In the ibstat output, Link layer: Ethernet under a port means the port runs RoCE:

# ibstat
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.30.1004
    Hardware version: 0
    Node GUID: 0xb83fd20300d3e4c6
    System image GUID: 0xb83fd20300d3e4c6
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0xba3fd2fffed3e4c6
        Link layer: Ethernet
CA 'mlx5_1'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.30.1004
    Hardware version: 0
    Node GUID: 0xb83fd20300d3e4c7
    System image GUID: 0xb83fd20300d3e4c6
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0xba3fd2fffed3e4c7
        Link layer: Ethernet
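The same information can be condensed to one line per port. A minimal sketch: ibstat_link_layers is a hypothetical helper that assumes the CA '...', Port N: and Link layer: lines shown above, and reports an Ethernet link layer as RoCE.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ibstat` output on stdin and print one line per
# port, "CA port protocol", where an Ethernet link layer is reported as RoCE.
ibstat_link_layers() {
    awk '
        /^CA / { ca = substr($2, 2, length($2) - 2) }        # drop the quotes around the name
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        /Link layer:/ {
            layer = $NF
            print ca, port, (layer == "Ethernet" ? "RoCE" : layer)
        }
    '
}

# Usage on the host:
#   ibstat | ibstat_link_layers
```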

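In the output of ibv_devinfo -v, each network port may have several GIDs (Global Identifiers). A GID is a globally unique identifier that addresses a node or port on the RDMA network, and each GID entry is tagged with its protocol version, such as RoCE v1 or RoCE v2.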

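In the ibv_devinfo -v output:

  • link_layer: Ethernet means the port runs over Ethernet (the transport field still reads InfiniBand (0) on RoCE devices, as in the output below, because it describes the verbs transport rather than the link);
  • if the GID entries are additionally tagged RoCE v1 or RoCE v2, the port uses the RoCE protocol.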
To list only the GID entries, run: ibv_devinfo -v | grep GID

Full output:

# ibv_devinfo -v

hca_id: mlx5_0
    transport:          InfiniBand (0)
    fw_ver:             20.30.1004
    node_guid:          b83f:d203:00d3:e4c6
    sys_image_guid:         b83f:d203:00d3:e4c6
    vendor_id:          0x02c9
    vendor_part_id:         4123
    hw_ver:             0x0
    board_id:           LNV0000000017
    phys_port_cnt:          1
    max_mr_size:            0xffffffffffffffff
    page_size_cap:          0xfffffffffffff000
    max_qp:             262144
    max_qp_wr:          32768
    device_cap_flags:       0x25321c36
                    BAD_PKEY_CNTR
                    BAD_QKEY_CNTR
                    AUTO_PATH_MIG
                    CHANGE_PHY_PORT
                    PORT_ACTIVE_EVENT
                    SYS_IMAGE_GUID
                    RC_RNR_NAK_GEN
                    MEM_WINDOW
                    XRC
                    MEM_MGT_EXTENSIONS
                    MEM_WINDOW_TYPE_2B
                    RAW_IP_CSUM
                    MANAGED_FLOW_STEERING
    max_sge:            30
    max_sge_rd:         30
    max_cq:             16777216
    max_cqe:            4194303
    max_mr:             16777216
    max_pd:             8388608
    max_qp_rd_atom:         16
    max_ee_rd_atom:         0
    max_res_rd_atom:        4194304
    max_qp_init_rd_atom:        16
    max_ee_init_rd_atom:        0
    atomic_cap:         ATOMIC_HCA (1)
    max_ee:             0
    max_rdd:            0
    max_mw:             16777216
    max_raw_ipv6_qp:        0
    max_raw_ethy_qp:        0
    max_mcast_grp:          2097152
    max_mcast_qp_attach:        240
    max_total_mcast_qp_attach:  503316480
    max_ah:             2147483647
    max_fmr:            0
    max_srq:            8388608
    max_srq_wr:         32767
    max_srq_sge:            31
    max_pkeys:          128
    local_ca_ack_delay:     16
    general_odp_caps:
                    ODP_SUPPORT
                    ODP_SUPPORT_IMPLICIT
    rc_odp_caps:
                    SUPPORT_SEND
                    SUPPORT_RECV
                    SUPPORT_WRITE
                    SUPPORT_READ
                    SUPPORT_SRQ
    uc_odp_caps:
                    NO SUPPORT
    ud_odp_caps:
                    SUPPORT_SEND
    xrc_odp_caps:
                    SUPPORT_SEND
                    SUPPORT_WRITE
                    SUPPORT_READ
                    SUPPORT_SRQ
    completion timestamp_mask:          0x7fffffffffffffff
    hca_core_clock:         156250kHZ
    raw packet caps:
                    C-VLAN stripping offload
                    Scatter FCS offload
                    IP csum offload
                    Delay drop
    device_cap_flags_ex:        0x3000005425321C36
                    RAW_SCATTER_FCS
                    PCI_WRITE_END_PADDING
                    Unknown flags: 0x3000004000000000
    tso_caps:
        max_tso:            262144
        supported_qp:
                    SUPPORT_RAW_PACKET
    rss_caps:
        max_rwq_indirection_tables:         1048576
        max_rwq_indirection_table_size:         2048
        rx_hash_function:               0x1
        rx_hash_fields_mask:                0x800000FF
        supported_qp:
                    SUPPORT_RAW_PACKET
    max_wq_type_rq:         8388608
    packet_pacing_caps:
        qp_rate_limit_min:  1kbps
        qp_rate_limit_max:  100000000kbps
        supported_qp:
                    SUPPORT_RAW_PACKET
    tag matching not supported
    cq moderation caps:
        max_cq_count:   65535
        max_cq_period:  4095 us
    maximum available device memory:    131072Bytes
    num_comp_vectors:       63
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     1024 (3)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     Ethernet
            max_msg_sz:     0x40000000
            port_cap_flags:     0x04010000
            port_cap_flags2:    0x0000
            max_vl_num:     invalid value (0)
            bad_pkey_cntr:      0x0
            qkey_viol_cntr:     0x0
            sm_sl:          0
            pkey_tbl_len:       1
            gid_tbl_len:        255
            subnet_timeout:     0
            init_type_reply:    0
            active_width:       4X (2)
            active_speed:       25.0 Gbps (32)
            phys_state:     LINK_UP (5)
            GID[  0]:       fe80:0000:0000:0000:ba3f:d2ff:fed3:e4c6, RoCE v1
            GID[  1]:       fe80::ba3f:d2ff:fed3:e4c6, RoCE v2

hca_id: mlx5_1
    transport:          InfiniBand (0)
    fw_ver:             20.30.1004
    node_guid:          b83f:d203:00d3:e4c7
    sys_image_guid:         b83f:d203:00d3:e4c6
    vendor_id:          0x02c9
    vendor_part_id:         4123
    hw_ver:             0x0
    board_id:           LNV0000000017
    phys_port_cnt:          1
    max_mr_size:            0xffffffffffffffff
    page_size_cap:          0xfffffffffffff000
    max_qp:             262144
    max_qp_wr:          32768
    device_cap_flags:       0x25321c36
                    BAD_PKEY_CNTR
                    BAD_QKEY_CNTR
                    AUTO_PATH_MIG
                    CHANGE_PHY_PORT
                    PORT_ACTIVE_EVENT
                    SYS_IMAGE_GUID
                    RC_RNR_NAK_GEN
                    MEM_WINDOW
                    XRC
                    MEM_MGT_EXTENSIONS
                    MEM_WINDOW_TYPE_2B
                    RAW_IP_CSUM
                    MANAGED_FLOW_STEERING
    max_sge:            30
    max_sge_rd:         30
    max_cq:             16777216
    max_cqe:            4194303
    max_mr:             16777216
    max_pd:             8388608
    max_qp_rd_atom:         16
    max_ee_rd_atom:         0
    max_res_rd_atom:        4194304
    max_qp_init_rd_atom:        16
    max_ee_init_rd_atom:        0
    atomic_cap:         ATOMIC_HCA (1)
    max_ee:             0
    max_rdd:            0
    max_mw:             16777216
    max_raw_ipv6_qp:        0
    max_raw_ethy_qp:        0
    max_mcast_grp:          2097152
    max_mcast_qp_attach:        240
    max_total_mcast_qp_attach:  503316480
    max_ah:             2147483647
    max_fmr:            0
    max_srq:            8388608
    max_srq_wr:         32767
    max_srq_sge:            31
    max_pkeys:          128
    local_ca_ack_delay:     16
    general_odp_caps:
                    ODP_SUPPORT
                    ODP_SUPPORT_IMPLICIT
    rc_odp_caps:
                    SUPPORT_SEND
                    SUPPORT_RECV
                    SUPPORT_WRITE
                    SUPPORT_READ
                    SUPPORT_SRQ
    uc_odp_caps:
                    NO SUPPORT
    ud_odp_caps:
                    SUPPORT_SEND
    xrc_odp_caps:
                    SUPPORT_SEND
                    SUPPORT_WRITE
                    SUPPORT_READ
                    SUPPORT_SRQ
    completion timestamp_mask:          0x7fffffffffffffff
    hca_core_clock:         156250kHZ
    raw packet caps:
                    C-VLAN stripping offload
                    Scatter FCS offload
                    IP csum offload
                    Delay drop
    device_cap_flags_ex:        0x3000005425321C36
                    RAW_SCATTER_FCS
                    PCI_WRITE_END_PADDING
                    Unknown flags: 0x3000004000000000
    tso_caps:
        max_tso:            262144
        supported_qp:
                    SUPPORT_RAW_PACKET
    rss_caps:
        max_rwq_indirection_tables:         1048576
        max_rwq_indirection_table_size:         2048
        rx_hash_function:               0x1
        rx_hash_fields_mask:                0x800000FF
        supported_qp:
                    SUPPORT_RAW_PACKET
    max_wq_type_rq:         8388608
    packet_pacing_caps:
        qp_rate_limit_min:  1kbps
        qp_rate_limit_max:  100000000kbps
        supported_qp:
                    SUPPORT_RAW_PACKET
    tag matching not supported
    cq moderation caps:
        max_cq_count:   65535
        max_cq_period:  4095 us
    maximum available device memory:    131072Bytes
    num_comp_vectors:       63
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     1024 (3)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     Ethernet
            max_msg_sz:     0x40000000
            port_cap_flags:     0x04010000
            port_cap_flags2:    0x0000
            max_vl_num:     invalid value (0)
            bad_pkey_cntr:      0x0
            qkey_viol_cntr:     0x0
            sm_sl:          0
            pkey_tbl_len:       1
            gid_tbl_len:        255
            subnet_timeout:     0
            init_type_reply:    0
            active_width:       4X (2)
            active_speed:       25.0 Gbps (32)
            phys_state:     LINK_UP (5)
            GID[  0]:       fe80:0000:0000:0000:ba3f:d2ff:fed3:e4c7, RoCE v1
            GID[  1]:       fe80::ba3f:d2ff:fed3:e4c7, RoCE v2
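When a tool needs an explicit GID index (for example the -x option of perftest benchmarks such as ib_write_bw), the RoCE v2 entry is often the one wanted. A minimal sketch that lists every RoCE v2 GID with its HCA and index; roce_v2_gids is a hypothetical helper name, and it depends on the "GID[  N]: address, RoCE v2" line format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ibv_devinfo -v` output on stdin and print
# "hca_id gid_index gid" for every GID entry tagged RoCE v2.
roce_v2_gids() {
    awk '
        /^hca_id:/ { hca = $2 }
        index($0, "GID[") > 0 && /RoCE v2[ \t]*$/ {
            s = substr($0, index($0, "GID[") + 4)   # "  1]:   fe80::..., RoCE v2"
            idx = substr(s, 1, index(s, "]") - 1)
            gsub(/ /, "", idx)                      # "  1" -> "1"
            gid = substr(s, index(s, ":") + 1)      # text after "]:"
            sub(/^[ \t]+/, "", gid)
            sub(/,[ \t]*RoCE v2[ \t]*$/, "", gid)
            print hca, idx, gid
        }
    '
}

# Usage on the host:
#   ibv_devinfo -v | roce_v2_gids
```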

Further reading:

QEMU official website: Download QEMU - QEMU

Source: https://blog.csdn.net/redhat7890/article/details/135829340