VPP ip4-input drops packets due to "ip4 length > l2 length" errors when using rdma with Mellanox mlx5 cards

Elias Rudberg

Hello VPP developers,

We have a problem with VPP used for NAT on Ubuntu 18.04 servers
equipped with Mellanox ConnectX-5 network cards (ConnectX-5 EN network
interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket;

VPP is dropping packets in the ip4-input node due to "ip4 length > l2
length" errors, when we use the RDMA plugin.

The interfaces are configured like this:

create int rdma host-if enp101s0f1 name Interface101 num-rx-queues 1
create int rdma host-if enp179s0f1 name Interface179 num-rx-queues 1

(we have set num-rx-queues 1 now to simplify while troubleshooting, in
production we use num-rx-queues 4)

We see some packets dropped due to "ip4 length > l2 length", for
example in TCP tests with around 100 Mbit/s -- running such a test for
a few seconds already gives some errors. More traffic gives more
errors. The drops seem to be unrelated to the contents of the packets:
they appear to happen quite randomly, and already at moderate amounts
of traffic, very far below what should be the capacity of the hardware.

Only a small fraction of packets are dropped: in tests at 100 Mbit/s
and packet size 500, for each million packets about 3 or 4 packets get
the "ip4 length > l2 length" drop problem. However, the effect appears
stronger for larger amounts of traffic and has impacted some of our end
users, who observe decreased TCP speed as a result of these drops.
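
To put those numbers in perspective, here is a small C sketch of the
arithmetic. It assumes that "packet size 500" means 500 bytes per
packet and ignores Ethernet framing overhead; the function names are
made up for this illustration.

```c
/* Illustrative arithmetic only; not part of VPP. */

/* Packets per second for a given bit rate and packet size in bytes.
   Assumption: framing overhead is ignored. */
static double packets_per_second(double bits_per_second, double packet_bytes)
{
    return bits_per_second / (packet_bytes * 8.0);
}

/* Expected drops per second given a drop ratio expressed per million packets. */
static double drops_per_second(double pps, double drops_per_million)
{
    return pps * drops_per_million / 1e6;
}
```

Under those assumptions, 100 Mbit/s corresponds to 25,000 packets per
second, so 3 to 4 drops per million packets is roughly one drop every
10 to 13 seconds.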

The "ip4 length > l2 length" errors can be seen using vppctl "show

142 ip4-input ip4 length > l2 length

To get more info about the "ip4 length > l2 length" error we printed
the involved sizes when the error happens (ip_len0 and cur_len0 in
src/vnet/ip/ip4_input.h). This shows that the actual packet size
(cur_len0) is often much smaller than ip_len0, which is what the IP
packet size should be according to the IP header. For example, when
ip_len0=500, as is the case for many of our packets in the test runs,
the cur_len0 value is sometimes much smaller. The smallest case we have
seen was cur_len0 = 59 with ip_len0 = 500 -- the IP header said the IP
packet size was 500 bytes, but the actual size was only 59 bytes. So it
seems that packets are being truncated and data is lost; sometimes
large parts of a packet are missing.
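
The comparison described above can be sketched in C as follows. This is
only an illustration of the check, not the code in ip4_input.h: the
helper name ip4_length_exceeds_l2 and its parameters are made up here,
and cur_len is assumed to be the number of bytes available starting at
the IP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not VPP code.
   Returns 1 when the IPv4 header claims more bytes than the buffer
   actually holds (the "ip4 length > l2 length" condition), else 0. */
static int ip4_length_exceeds_l2(const uint8_t *ip_hdr, size_t cur_len)
{
    /* A buffer shorter than the minimal 20-byte IPv4 header cannot be
       valid, so treat it as a mismatch as well. */
    if (cur_len < 20)
        return 1;

    /* Total Length is a 16-bit field at byte offset 2 of the IPv4
       header, stored in network byte order (big endian). */
    uint16_t ip_len = (uint16_t)((ip_hdr[2] << 8) | ip_hdr[3]);

    return ip_len > cur_len;
}
```

With a header whose total length field is 500, calling this with
cur_len = 59 reports a mismatch, while cur_len = 500 does not.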

The problems disappear if we skip the RDMA plugin and use the (old?)
dpdk way of handling the interfaces: then there are no "ip4 length >
l2 length" drops at all. That makes us think there is something wrong
with the rdma plugin, perhaps a bug or something wrong with how it is
configured.

We have tested this with both the current master branch and the
stable/1908 branch; we see the same problem on both.

We tried updating the Mellanox driver from v4.6 to v4.7 (latest
version) but that did not help.

After trying some different values of the rx-queue-size parameter to
the "create int rdma" command, it seems like the number of "ip4 length
> l2 length" errors becomes smaller as the rx-queue-size is increased,
perhaps indicating the problem has to do with what happens when the end
of that queue is reached.

Do you agree that the above points to a problem with the RDMA plugin in

Are there known bugs or other issues that could explain the "ip4 length
> l2 length" drops?

Does it seem like a good idea to set a very large value of the
rx-queue-size parameter if that alleviates the "ip4 length > l2 length"
problem, or are there big downsides of using a large rx-queue-size

What else could we do to troubleshoot this further, are there
configuration options to the RDMA plugin that could be used to solve
this and/or get more information about what is happening?

Best regards,
