Re: [tsc-private] [csit-dev] TRex - replacing CVL with MLX for 100GbE
Maciek Konstantynowicz (mkonstan)
Hi Dave,
Per my action from last week's TSC meeting (item 4.c. in [1]), here is
the list of HW that the FD.io project needs and that we can order at any
time:
1. 28 NICs, 2p100 GbE from Nvidia / Mellanox - preferred:
MCX613106A-VDAT, less preferred: MCX556A-EDAT, to cover the following
testbeds:
a. Performance 3-Node-ICX, 2 testbeds, 4 SUTs, 2 TGs
b. Performance 2-Node-ICX, 4 testbeds, 4 SUTs, 4 TGs
c. ICX TGs for other systems, 3 TGs
d. 3-Node-Alt (Ampere Altra Arm N1), 1 testbed, 2 SUTs, 1 TG
e. (exact breakdown in my email from 28 Jan 2022 in the thread below)
2. If we also want to add MLX NICs for functional vpp_device tests, that
would be an additional 2 MLX 2p100GbE NICs.
Things that we originally planned, but can't place orders as the HW is
not available yet:
3. TBC number of 2-socket Xeon SapphireRapids servers
a. Intel Xeon processor SKUs are not yet available to us - expecting an
update any week now.
b. Related SuperMicro SKUs are not yet available to us - expecting an
update any week now.
Hope this helps. Happy to answer any questions.
Cheers,
-Maciek
On 5 Apr 2022, at 12:55, Maciek Konstantynowicz (mkonstan) <mkonstan@...> wrote:
Super, thanks!
On 4 Apr 2022, at 20:22, Dave Wallace <dwallacelf@...> wrote:
Hi Maciek,
I have added this information to the TSC Agenda [0].
Thanks,
-daw-
[0] https://wiki.fd.io/view/TSC#Agenda
On 4/4/2022 10:46 AM, Maciek Konstantynowicz (mkonstan) wrote:
Begin forwarded message:
From: mkonstan <mkonstan@...>
Subject: Re: [tsc-private] [csit-dev] TRex - replacing CVL with MLX for 100GbE
Date: 3 March 2022 at 16:23:08 GMT
To: Ed Warnicke <eaw@...>
Cc: "tsc-private@..." <tsc-private@...>, Lijian Zhang <Lijian.Zhang@...>
+Lijian

Hi,
Resending the email from January so it's refreshed in our collective memory, as discussed on the TSC call just now.
The number of 2p100GbE MLX NICs needed for performance testing of the Ampere Altra servers is listed under point 4 below.
Let me know if anything is unclear or if you have any questions.
Cheers,
Maciek
On 28 Jan 2022, at 17:35, Maciek Konstantynowicz (mkonstan) via lists.fd.io <mkonstan=cisco.com@...> wrote:
Hi Ed, Trishan,
One correction regarding my last email from 25-Jan:-
For Intel Xeon Icelake testbeds, apart from just replacing E810s on TRex servers, we should also consider adding MLX 100GbE NICs for SUTs, so that FD.io could benchmark MLX on the latest Intel Xeon CPUs. Exactly as discussed in our side conversation, Ed.
Here is an updated calc with a breakdown for the Icelake (ICX) builds (the Cascadelake part stays as per the previous email):
// Sorry for the TL;DR - if you just want the number of NICs, scroll to the bottom of this message :)
(SUT, system under test, server running VPP+NICs under test)
(TG, traffic generator, server running TRex, needs link speeds matching SUTs')
1. 3-Node-ICX, 2 testbeds, 4 SUTs, 2 TGs
- 4 SUT/VPP/dpdk servers
- 4 ConnectX NICs, 1 per SUT - test ConnectX on SUT
- 2 TG/TRex servers
- 2 ConnectX NICs, 1 per TG - replace E810s and test E810 on SUT
- 2 ConnectX NICs, 1 per TG - test ConnectX on SUT
- 1 ConnectX NIC, 1 per testbed type - for TRex calibration
- sub-total 9 NICs
2. 2-Node-ICX, 4 testbeds, 4 SUTs, 4 TGs
- 4 SUT/VPP/dpdk servers
- 4 ConnectX NICs, 1 per SUT - test ConnectX on SUT
- 4 TG/TRex servers
- 4 ConnectX NICs, 1 per TG - replace E810s and test E810 on SUT
- 4 ConnectX NICs, 1 per TG - test ConnectX on SUT
- 1 ConnectX NIC, 1 per testbed type - for TRex calibration
- sub-total 13 NICs
3. ICX TGs for other systems, 3 TGs
- 3 TG/TRex servers
- 3 ConnectX NICs, 1 per TG - replace E810s and test ConnectX and other 100GbE NICs on SUTs
- 1 ConnectX NIC, 1 per testbed type - for TRex calibration
- sub-total 4 NICs
4. 3-Node-Alt (Ampere Altra Arm N1), 1 testbed, 2 SUTs, 1 TG
- 2 SUT/VPP/dpdk servers
- 2 ConnectX NICs, 1 per SUT - test ConnectX on SUT
- 1 TG/TRex server
- will use one of the ICX TGs as listed in point 3.
- sub-total 2 NICs
Total 28 NICs.
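For anyone who wants to double-check the arithmetic, here is a minimal C sketch that simply re-adds the per-testbed counts from points 1-4 above; all numbers are copied from the breakdown, nothing new is introduced.

/* Sanity check of the NIC totals above; per-testbed counts copied from
   points 1-4, nothing new added here. */
#include <stdio.h>

int main(void)
{
    int icx_3n  = 4 + 2 + 2 + 1; /* 1. 3-Node-ICX: SUTs + TG (replace E810) + TG (ConnectX) + calibration =  9 */
    int icx_2n  = 4 + 4 + 4 + 1; /* 2. 2-Node-ICX: SUTs + TG (replace E810) + TG (ConnectX) + calibration = 13 */
    int icx_tgs = 3 + 1;         /* 3. ICX TGs for other systems: TGs + calibration                       =  4 */
    int alt_3n  = 2;             /* 4. 3-Node-Alt: SUTs only, TG reused from point 3                      =  2 */

    printf("Total ConnectX 2p100GbE NICs: %d\n",
           icx_3n + icx_2n + icx_tgs + alt_3n); /* prints 28 */
    return 0;
}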
Hope this makes sense ...
Cheers,
Maciek
P.S. I'm on PTO now until 7-Feb, so email responses delayed.
On 25 Jan 2022, at 16:38, mkonstan <mkonstan@...> wrote:
Hi Ed, Trishan,
Following from the last TSC call, here are the details about Nvidia Mellanox NICs that we are after for CSIT.
For existing Intel Xeon Cascadelake testbeds we have one option:
- MCX556A-EDAT NIC 2p100GbE - $1,195.00 - details in [2].
- need 4 NICs, plus 1 spare => 5 NICs
For the new Intel Xeon Icelake testbeds we have two options:
- MCX556A-EDAT NIC 2p100GbE - $1,195.00 - same as above, OR
- MCX613106A-VDAT 2p100GbE - $1,795.00 - details in [3] (limited availability)
- need 7 NICs, plus 1 spare => 8 NICs.
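Purely as an illustration for the donation-vs-purchase question below, the list prices above give roughly the following envelope for the 5 + 8 = 13 NICs; this is my own back-of-envelope math, not a quote.

/* Back-of-envelope list-price totals for the 13 NICs above (5 CLX + 8 ICX);
   illustrative only, based on the list prices quoted in this email. */
#include <stdio.h>

int main(void)
{
    const double cx5_edat = 1195.00; /* MCX556A-EDAT 2p100GbE    */
    const double cx6_vdat = 1795.00; /* MCX613106A-VDAT 2p100GbE */

    double all_cx5 = 5 * cx5_edat + 8 * cx5_edat; /* CLX + ICX both on ConnectX-5 Ex          */
    double cx5_cx6 = 5 * cx5_edat + 8 * cx6_vdat; /* CLX on ConnectX-5 Ex, ICX on ConnectX-6  */

    printf("All MCX556A-EDAT:             $%.2f\n", all_cx5); /* $15,535.00 */
    printf("MCX556A-EDAT + MCX613106A:    $%.2f\n", cx5_cx6); /* $20,335.00 */
    return 0;
}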
We need Nvidia Mellanox advice and assistance with two things:
1. What NIC model should we get for Icelake with PCIe Gen4 x16 slots?
2. How many of the listed NIC quantities can Nvidia Mellanox donate, vs the LFN FD.io project purchasing them through a retail channel?
Let me know if you're still good to help here, and what the next steps are.
Hope this makes sense; let me know if you have questions.
Cheers,
Maciek
Begin forwarded message:
From: "Maciek Konstantynowicz (mkonstan) via lists.fd.io" <mkonstan=cisco.com@...>
Subject: [csit-dev] TRex - replacing CVL with MLX for 100GbE
Date: 23 January 2022 at 20:03:59 GMT
To: csit-dev <csit-dev@...>
Reply-To: mkonstan@...
Hi,
Following discussion on CSIT call last Wednesday [1], we would like to move
forward with using only Mellanox NICs to drive 100 GbE links and
disconnecting (or removing) E810 CVL NICs from TG (TRex) servers.
This is due to a number of show-stopper issues preventing CSIT use of TRex
with DPDK ICE driver[ICE] and no line of sight to have them addressed.
This impacts our production 2n-clx testbeds, as well as the new icx testbeds
that are being built.
For 2n-clx, I believe we agreed on a call to use the same NIC model that is
already there, MCX556A-EDAT (ConnectX-5 Ex) 2p100GbE [2]. Just add more NICs
for 100GbE capacity.
For icx testbeds, with servers supporting PCIe Gen4, we could also use
MCX556A-EDAT (it supports PCIe Gen4), or take it up a notch and use
ConnectX-6 MCX613106A-VDAT[3] that is advertised with "Up to 215 million
messages/sec", which may mean "215 Mpps". If anybody has experience (or knows
someone who does) with ConnectX-6 with DPDK driver, it would be great to
hear.
Anyway, here is a quick calculation of how many NICs we would need:
1. 2n-clx, 3 testbeds, 3 TG servers => 4 NICs
- s34-t27-tg1, s36-t28-tg1, s38-t29-tg1, see [4]
- NIC model: MCX556A-EDAT NIC 2p100GbE
- 1 NIC per TG server => 3 NICs
- 1 NIC per TG server type for calibration => 1 NIC
2. 2n-icx, 4 testbeds, 4 TG servers => 5 NICs
- NIC model: MCX556A-EDAT 2p100GbE QSFP28
- Or MCX613106A-VDAT 2p100GbE (accepts QSFP28 NRZ per [5])
- 1 NIC per TG server => 4 NICs
- 1 NIC per TG server type for calibration => 1 NIC
3. 3n-icx, 2 testbeds, 2 TG servers => 2 NICs
- NIC model: MCX556A-EDAT NIC 2p100GbE
- 1 NIC per TG server => 2 NICs
Thoughts?
Cheers,
Maciek
[1] https://ircbot.wl.linuxfoundation.org/meetings/fdio-meeting/2022/fd_io_tsc/fdio-meeting-fd_io_tsc.2022-01-13-16.00.log.html#l-79
[2] https://store.nvidia.com/en-us/networking/store/product/MCX556A-EDAT/nvidiamcx556a-edatconnectx-5exvpiadaptercardedr100gbe/
[3] https://store.nvidia.com/en-us/networking/store/product/MCX613106A-VDAT/nvidiamcx613106a-vdatconnectx-6enadaptercard200gbe/
[4] https://git.fd.io/csit/tree/docs/lab/testbed_specifications.md#n1189
[5] https://community.mellanox.com/s/question/0D51T00008Cdv1g/qsfp56-ports-accept-qsfp28-devices
[ICE] Summary of issues with DPDK ICE driver support for TRex (an illustrative rte_flow sketch follows the list):
1. TO-DO. Drop All. CVL rte-flow doesn’t support match criteria of ANY.
- TRex: HW assist for rx packet counters; SW mode not fit for purpose.
- Status: POC attempted, incomplete.
2. TO-DO. Steer all to a queue. CVL rte-flow doesn’t support matching criteria of ANY.
- TRex: HW assist for TODO; needed for STL; SW mode not fit for purpose.
- Status: POC attempted, incomplete.
3. TO-VERIFY. CVL doesn’t support ipv4.id.
- TRex: HW assist for flow stats and latency streams redirect.
- Status: Completed in DPDK 21.08.
4. TO-VERIFY. CVL PF doesn’t support LL (Low Latency)/HP (High Priority) for PF queues.
- TRex: Needed for ASTF (stateful).
- Status: CVL (E810) NIC does not have this API but has the capability.
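To make issues 1 and 2 above concrete, here is an illustrative rte_flow sketch (not TRex source code; the function name is mine) of the kind of catch-all rule TRex wants to program in HW: match every ingress packet, count it, and steer it to a known rx queue. If the PMD rejects the match-ANY (empty) pattern, as reported above for the CVL rte-flow path, rte_flow_create() fails and TRex can only fall back to SW mode, which is not fit for purpose.

/* Illustrative only (not TRex source): a catch-all rte_flow rule of the kind
   TRex relies on for HW rx packet counters (issue 1) and steer-all-to-queue
   (issue 2). The rule fails to install on PMDs that reject a "match ANY"
   (empty) pattern. */
#include <stdint.h>
#include <rte_flow.h>

static int
install_catch_all_rule(uint16_t port_id)
{
    struct rte_flow_attr attr = { .ingress = 1 };

    /* Empty pattern, i.e. match ANY ingress packet. */
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    struct rte_flow_action_queue queue = { .index = 0 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_COUNT },                 /* HW rx packet counter */
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue }, /* steer all to rx queue 0 */
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    struct rte_flow_error error;
    struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
    if (flow == NULL)
        return -1; /* e.g. the PMD does not support a match-ANY pattern */
    return 0;
}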
--------------------------------------------------------------------------------