
L2 Bridge question on forwarding

sunny cupertino
 

Hi All,
 
I would like your help with the L2 bridge in VPP. I want to know whether we can selectively forward L2 packets
to different interfaces on a bridge based on the Ethernet address.
For example, there is one interface and two GTPU tunnels on an L2 bridge.
I have added an entry to the L2 FIB table that tells VPP which interface to use for a
particular Ethernet address (ad:ef:ad:ef:de:ad).
 
vpp# show l2fib verbose         
    Mac-Address     BD-Idx If-Idx BSN-ISN Age(min) static filter bvi         Interface-Name        
 ad:ef:ad:ef:de:ad    1      4      0/0      no      -      -     -           gtpu_tunnel1         
 01:00:5e:00:00:fb    1      2      0/1      -       -      -     -           gtpu_tunnel0         
 de:ad:be:ef:de:ad    1      2      0/1      -       -      -     -           gtpu_tunnel0         
 28:f1:0e:4e:c2:be    1      3      0/5      -       -      -     -              tap210            
 
However, when packets arrive on tap210 they are sent to both gtpu_tunnel1 and gtpu_tunnel0.
I tried switching off flooding, but then the packets are dropped with a message saying the L2 forward
feature is not enabled. Please let me know whether this is possible to achieve at all.
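For reference, this is roughly what I tried on the CLI (a sketch from memory, using bridge domain 3 from the output below; I am also unsure whether the forward feature needs touching via "set bridge-domain forward"):

vpp# l2fib add ad:ef:ad:ef:de:ad 3 gtpu_tunnel1 static
vpp# set bridge-domain flood 3 disable
vpp# set bridge-domain uu-flood 3 disable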

Thanks a lot, 

 
Here is some information about the setup.
 
vpp# show bridge 3 detail       
  BD-ID   Index   BSN  Age(min)  Learning  U-Forwrd   UU-Flood   Flooding  ARP-Term   BVI-Intf 
    3       1      0     off        on        on       flood        on       off        N/A    
 
           Interface           If-idx ISN  SHG  BVI  TxFlood        VLAN-Tag-Rewrite       
            tap210               3     5    0    -      *                 none             
         gtpu_tunnel0            2     1    0    -      *                 none             
         gtpu_tunnel1            4     3    0    -      *                 none             
vpp# 
 
 
trace:
 
Packet 1
 
01:08:35:178099: virtio-input
  virtio: hw_if_index 3 next-index 4 vring 0 len 1516
    hdr: flags 0x00 gso_type 0x00 hdr_len 0 gso_size 0 csum_start 0 csum_offset 0 num_buffers 1
01:08:35:178102: ethernet-input
  IP4: 28:f1:0e:4e:c2:be -> ad:ef:ad:ef:de:ad 802.1q vlan 4 priority 4
01:08:35:178104: l2-input
  l2-input: sw_if_index 3 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be
01:08:35:178106: l2-learn
  l2-learn: sw_if_index 3 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be bd_index 1
01:08:35:178109: l2-flood
  l2-flood: sw_if_index 3 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be bd_index 1
  l2-flood: sw_if_index 3 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be bd_index 1
01:08:35:178111: l2-output
  l2-output: sw_if_index 4 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be data 81 00 80 04 08 00 45 00 05 da 19 bf
  l2-output: sw_if_index 2 dst ad:ef:ad:ef:de:ad src 28:f1:0e:4e:c2:be data 81 00 80 04 08 00 45 00 05 da 19 bf
01:08:35:178113: gtpu4-encap
  GTPU encap to gtpu_tunnel1 teid 7
  GTPU encap to gtpu_tunnel0 teid 3
01:08:35:178115: ip4-load-balance
  fib 4 dpo-idx 1 flow hash: 0x00000005
  UDP: 192.168.50.40 -> 192.168.50.10
    tos 0x00, ttl 254, length 1552, checksum 0xd159
    fragment id 0x0000
  UDP: 11189 -> 2152
    length 1532, checksum 0x0000
  fib 2 dpo-idx 1 flow hash: 0x00000005
  UDP: 192.168.50.40 -> 192.168.50.10
    tos 0x00, ttl 254, length 1552, checksum 0xd159
 
 


VPP ip4-input drops packets due to "ip4 length > l2 length" errors when using rdma with Mellanox mlx5 cards

Elias Rudberg
 

Hello VPP developers,

We have a problem with VPP used for NAT on Ubuntu 18.04 servers
equipped with Mellanox ConnectX-5 network cards (ConnectX-5 EN network
interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket;
ROHS R6).

VPP is dropping packets in the ip4-input node due to "ip4 length > l2
length" errors, when we use the RDMA plugin.

The interfaces are configured like this:

create int rdma host-if enp101s0f1 name Interface101 num-rx-queues 1
create int rdma host-if enp179s0f1 name Interface179 num-rx-queues 1

(we have set num-rx-queues 1 for now to simplify troubleshooting; in
production we use num-rx-queues 4)

We see packets dropped due to "ip4 length > l2 length" in, for example,
TCP tests at around 100 Mbit/s -- running such a test for a few seconds
already produces some errors. More traffic gives more errors. The drops
seem unrelated to the contents of the packets; they happen quite
randomly, and already at these moderate traffic levels, far below what
should be the capacity of the hardware.

Only a small fraction of packets is dropped: in tests at 100 Mbit/s
and packet size 500, about 3 or 4 packets per million hit the
"ip4 length > l2 length" drop. However, the effect appears stronger at
larger amounts of traffic and has impacted some of our end users, who
observe decreased TCP speed as a result of these drops.

The "ip4 length > l2 length" errors can be seen using vppctl "show
errors":

142 ip4-input ip4 length > l2 length

To get more info about the "ip4 length > l2 length" error we printed
the involved sizes when the error happens (ip_len0 and cur_len0 in
src/vnet/ip/ip4_input.h). This shows that the actual packet size is
often much smaller than the ip_len0 value, which is what the IP packet
size should be according to the IP header. For example, when
ip_len0=500, as is the case for many of our packets in the test runs,
the cur_len0 value is sometimes much smaller. The smallest case we have
seen was cur_len0 = 59 with ip_len0 = 500: the IP header said the IP
packet size was 500 bytes, but the actual size was only 59 bytes. So it
seems some data is lost; packets have been truncated, and sometimes
large parts of the packets are missing.

The problems disappear if we skip the RDMA plugin and use the (old?)
DPDK way of handling the interfaces; then there are no "ip4 length > l2
length" drops at all. That makes us think there is something wrong with
the rdma plugin, perhaps a bug or something wrong with how it is
configured.

We have tested this with both the current master branch and the
stable/1908 branch; we see the same problem with both.

We tried updating the Mellanox driver from v4.6 to v4.7 (latest
version) but that did not help.

After trying some different values of the rx-queue-size parameter of
the "create int rdma" command, it seems that the number of "ip4 length
> l2 length" drops becomes smaller as rx-queue-size is increased,
perhaps indicating that the problem has to do with what happens when
the end of that queue is reached.
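(For reference, this is the kind of variation we have been testing -- shown only as an illustration, the 4096 value being arbitrary:)

create int rdma host-if enp101s0f1 name Interface101 num-rx-queues 1 rx-queue-size 4096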

Do you agree that the above points to a problem with the RDMA plugin in
VPP?

Are there known bugs or other issues that could explain the "ip4 length
> l2 length" drops?
Does it seem like a good idea to set a very large rx-queue-size value
if that alleviates the "ip4 length > l2 length" problem, or are there
big downsides to using a large rx-queue-size value?

What else could we do to troubleshoot this further, are there
configuration options to the RDMA plugin that could be used to solve
this and/or get more information about what is happening?

Best regards,
Elias


Re: How to receive broadcast messages in VPP?

Neale Ranns
 

Hi Elias,

On 14/02/2020 13:35, "Elias Rudberg" <Elias.Rudberg@...> wrote:

Hi Neale and Dave,

Thanks for your answers!
I was able to make it work using multicast as Neale suggested.

Here is roughly what I did to make it work using multicast instead of
unicast:

On the sending side, to make it send multicast packets:

adj_index_t adj_index_for_multicast = adj_mcast_add_or_lock
(FIB_PROTOCOL_IP4, VNET_LINK_IP4, sw_if_index);

Note the 'or_lock' part of the name here. You need to call adj_unlock() too, but not before your packet has been sent...
In other words, it's best to do this once in the control plane and use the result in the DP.

and then when a message is to be sent, use the above created adj_index
before invoking ip4_rewrite_node (instead of ip4_lookup_node):

vnet_buffer (b)->ip.adj_index[VLIB_TX] = adj_index_for_multicast;
vlib_put_frame_to_node (vm, ip4_rewrite_node.index, f);

You need the mcast version of the rewrite so the correct mcast MAC is added, which is ip4_rewrite_mcast_node. Or, better still, use adj->ia_node_index.
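Something along these lines (an untested sketch, assuming the adjacency API in vnet/adj/; adj_get() here is only used to fetch the adjacency struct):

    /* control plane: allocate/lock the mcast adjacency once */
    adj_index_t ai = adj_mcast_add_or_lock (FIB_PROTOCOL_IP4, VNET_LINK_IP4,
                                            sw_if_index);

    /* data plane: send the frame to the adjacency's own rewrite node */
    ip_adjacency_t *adj = adj_get (ai);
    vnet_buffer (b)->ip.adj_index[VLIB_TX] = ai;
    vlib_put_frame_to_node (vm, adj->ia_node_index, f);

    /* teardown (not per packet): drop the reference taken above */
    adj_unlock (ai);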

/neale

On the receiving side the following config was needed:

ip mroute add 224.0.0.1 via MyInterface Accept
ip mroute add 224.0.0.1 via local Forward

After that it works using multicast. Thanks for your help!
(Please let me know if the above is not the right way to do it)

Best regards,
Elias

On Thu, 2020-02-06 at 13:45 +0000, Neale Ranns via Lists.Fd.Io wrote:
> Hi Elias,
>
> Please see inline.
>
>
> On 06/02/2020 12:41, "vpp-dev@... on behalf of Elias
> Rudberg" <vpp-dev@... on behalf of elias.rudberg@...>
> wrote:
>
> Hello everyone,
>
> I am trying to figure out how to receive broadcast messages in
> VPP (vpp
> version 19.08 in case that matters).
>
> This is in the context of some changes we are considering in the
> VPP
> NAT HA functionality. That code in e.g. plugins/nat/nat_ha.c uses
> UDP
> messages to communicate information about NAT sessions between
> different VPP servers. It is currently using unicast messages,
> but we
> are considering the possibility of using broadcast messages
> instead,
> hoping that could be more efficient in case there are more than
> two
> servers involved. For example, when a new NAT session has been
> created,
> we could send a broadcast message about the new session, that
> would
> reach several other VPP servers, without need to send a separate
> unicast message to each server.
>
> The code in plugins/nat/nat_ha.c calls udp_register_dst_port() to
> register that it wants to receive UDP traffic, like this:
>
> udp_register_dst_port (ha->vlib_main, port,
> nat_ha_handoff_node.index, 1);
>
> This works fine for unicast messages; when such packets arrive at
> the
> given port, they get handled by the nat_ha_handoff_node as
> desired.
>
> However, if broadcast packets arrive, those packets are dropped
> instead, they do not arrive at the nat_ha_handoff_node.
>
> For example, if the IP address of the relevant interface on the
> receiving side is 10.10.50.1/24 then unicast UDP messages with
> destination 10.10.50.1 are handled fine. However, if the
> destination is
> 10.10.50.255 (the broadcast address for that /24 subnet) then the
> packets are dropped. Here is an example of a packet trace when
> such a
> packet is received from 10.10.50.2:
>
> 02:41:19:250212: rdma-input
> rdma: Interface101 (3) next-node bond-input
> 02:41:19:250214: bond-input
> src 02:fe:ff:76:e4:5d, dst ff:ff:ff:ff:ff:ff, Interface101 ->
> BondEthernet0
> 02:41:19:250214: ethernet-input
> IP4: 02:fe:ff:76:e4:5d -> ff:ff:ff:ff:ff:ff 802.1q vlan 1015
> 02:41:19:250215: ip4-input
> UDP: 10.10.50.2 -> 10.10.50.255
> tos 0x80, ttl 254, length 92, checksum 0x02fa
> fragment id 0x0002, flags DONT_FRAGMENT
> UDP: 1234 -> 2345
> length 72, checksum 0x0000
> 02:41:19:250216: ip4-lookup
> fib 0 dpo-idx 0 flow hash: 0x00000000
> UDP: 10.10.50.2 -> 10.10.50.255
> tos 0x80, ttl 254, length 92, checksum 0x02fa
> fragment id 0x0002, flags DONT_FRAGMENT
> UDP: 1234 -> 2345
> length 72, checksum 0x0000
> 02:41:19:250217: ip4-drop
> UDP: 10.10.50.2 -> 10.10.50.255
> tos 0x80, ttl 254, length 92, checksum 0x02fa
> fragment id 0x0002, flags DONT_FRAGMENT
> UDP: 1234 -> 2345
> length 72, checksum 0x0000
>
> if you check:
> sh ip fib 10.10.50.255/32
> you'll see an explicit entry to drop. You can't override this.
>
>
> 02:41:19:250217: error-drop
> rx:BondEthernet0.1015
> 02:41:19:250217: drop
> ethernet-input: no error
>
> So the packet ends up at ip4-drop when I would have liked it to
> come to
> nat_ha_handoff_node.
>
> Does anyone have a suggestion about how to make this work?
> Is some special configuration of the receiving interface needed
> to tell
> VPP that we want it to receive broadcast packets?
>
> I'd suggest you multicast it instead. Pick an address and have the
> other servers listen.
> See:
> https://wiki.fd.io/view/VPP/MFIB
> for programming multicast.
> Multicast will also work with IPv6 interface addresses.
>
> /neale
>
>


Published: FD.io CSIT-2001 Release Report

Maciek Konstantynowicz (mkonstan)
 

Hi All,

FD.io CSIT-2001 report has been published on FD.io docs site:

https://docs.fd.io/csit/rls2001/report/

Many thanks to All in CSIT, VPP and wider FD.io community who
contributed and worked hard to make CSIT-2001 happen!

Below are three summaries:
- Intel Xeon 2n-skx, 3n-skx and 2n-clx Testbeds microcode issue.
- CSIT-2001 Release Summary, a high-level summary.
- Points of Note in CSIT-2001 Report, with specific links to report.

Welcome all comments, best by email to csit-dev@....

Cheers,
-Maciek

------------------------------------------------------------------------
NOTE: Intel Xeon 2n-skx, 3n-skx and 2n-clx Testbeds microcode issue.
------------------------------------------------------------------------
VPP and DPDK performance test data is not included in this report
version. This is due to the lower performance and behaviour
inconsistency of these systems following the upgrade of processor
microcode packages (skx ucode 0x2000064, clx ucode 0x500002c), done
as part of updating Ubuntu 18.04 LTS kernel version. Tested VPP and
DPDK applications (L3fwd) are affected. Skx and Clx test data will
be added in subsequent maintenance report version(s) once the issue
is resolved. See https://jira.fd.io/browse/CSIT-1675.

------------------------------------------------------------------------
CSIT-2001 Release Summary
------------------------------------------------------------------------
1. CSIT-2001 Report

- html link: https://docs.fd.io/csit/rls2001/report/
- pdf link: https://docs.fd.io/csit/rls2001/report/_static/archive/csit_rls2001.pdf

2. New Tests

- NFV density tests with IPsec encryption between DUTs.

- Full test coverage for VPP AVF driver for Fortville NICs.

- VPP Hoststack TCP/IP tests with wrk, iperf3 with LDPreload (tests
without and with packet loss via VPP NSIM plugin), and QUIC/UDP/IP
transport tests.

- Mellanox ConnectX5-2p100GE NICs in 2n-clx testbeds using VPP native
rdma driver.

- Load Balancer tests.

3. Benchmarking

- Fully onboarded new Intel Xeon Cascadelake Testbeds with x710,
xxv710 and mcx556a-edat NIC cards.

- Added new High Dynamic Range Histogram latency measurements.

4. Infrastructure

- Full migration of CSIT from Python2.7 to Python3.6.

------------------------------------------------------------------------
Points of Note in the CSIT-2001 Report
------------------------------------------------------------------------
Indexed specific links listed at the bottom.

1. VPP release notes
a. Changes in CSIT-2001: [1]
b. Known issues: [2]

2. VPP performance - 64B/IMIX throughput graphs (selected NIC models):
a. Graphs explained: [3]
b. L2 Ethernet Switching: [4]
c. IPv4 Routing: [5]
d. IPv6 Routing: [6]
e. SRv6 Routing: [7]
f. IPv4 Tunnels: [8]
g. KVM VMs vhost-user: [9]
h. LXC/DRC Container Memif: [10]
i. IPsec IPv4 Routing: [11]
j. Virtual Topology System: [12]

3. VPP performance - multi-core and latency graphs:
a. Speedup Multi-Core: [13]
b. Latency: [14]

4. VPP performance comparisons
a. VPP-20.01 vs. VPP-19.08: [15]

5. VPP performance test details - all NICs:
a. Detailed results 64B IMIX 1518B 9kB: [16]
b. Configuration: [17]

DPDK Testpmd and L3fwd performance sections follow similar structure.

6. DPDK applications:
a. Release notes: [18]
b. DPDK performance - 64B throughput graphs: [19]
c. DPDK performance - latency graphs: [20]
d. DPDK performance - DPDK-19.08 vs. DPDK-19.05: [21]

Functional tests, including VPP_Device (functional device tests),
VPP_VIRL and HoneyComb are all included in the report.

Specific links within the report:

[1] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/csit_release_notes.html#changes-in-csit-release
[2] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/csit_release_notes.html#known-issues
[3] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/index.html
[4] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/l2.html
[5] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/ip4.html
[6] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/ip6.html
[7] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/srv6.html
[8] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/ip4_tunnels.html
[9] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/vm_vhost.html
[10] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/container_memif.html
[11] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/ipsec.html
[12] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_throughput_graphs/vts.html
[13] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/throughput_speedup_multi_core/index.html
[14] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/packet_latency/index.html
[15] https://docs.fd.io/csit/rls2001/report/vpp_performance_tests/comparisons/current_vs_previous_release.html
[16] https://docs.fd.io/csit/rls2001/report/detailed_test_results/vpp_performance_results/index.html
[17] https://docs.fd.io/csit/rls2001/report/test_configuration/vpp_performance_configuration/index.html
[18] https://docs.fd.io/csit/rls2001/report/dpdk_performance_tests/csit_release_notes.html
[19] https://docs.fd.io/csit/rls2001/report/dpdk_performance_tests/packet_throughput_graphs/index.html
[20] https://docs.fd.io/csit/rls2001/report/dpdk_performance_tests/packet_latency/index.html
[21] https://docs.fd.io/csit/rls2001/report/dpdk_performance_tests/comparisons/current_vs_previous_release.html
------------------------------------------------------------------------


Coverity run FAILED as of 2020-02-14 14:00:24 UTC

Noreply Jenkins
 

Coverity run failed today.

Current number of outstanding issues is 1
Newly detected: 0
Eliminated: 1
More details can be found at https://scan.coverity.com/projects/fd-io-vpp/view_defects


Re: VPP crashes on CSIT Taishan server

Damjan Marion
 


Makes sense to me, please submit patch to gerrit…

— 
Damjan

On 14 Feb 2020, at 04:18, Lijian Zhang <Lijian.Zhang@...> wrote:

Hi,
VPP crashes on the CSIT Taishan server because the function vlib_get_thread_core_numa (unsigned cpu_id) does not get the NUMA node correctly from cpu_id on that server.
vlib_get_thread_core_numa () is using physical_package_id as the NUMA node.
 
However, the Taishan server has 2 physical sockets but 4 NUMA nodes, as shown in the lscpu output below. And sysfs shows that physical_package_id is not sequential on this server.
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu37/topology/physical_package_id
3002
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
36
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu15/topology/physical_package_id
36
 
How about using the sysfs entries below to get the NUMA node from cpu_id? The code change is also attached at the end.
 
$ cat /sys/devices/system/node/online
0-3
$ cat /sys/devices/system/node/node0/cpulist
0-15
$ cat /sys/devices/system/node/node1/cpulist
16-31
$ cat /sys/devices/system/node/node2/cpulist
32-47
$ cat /sys/devices/system/node/node3/cpulist
48-63
 
taishan-d05-08:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
 
 
diff --git a/src/vlib/threads.c b/src/vlib/threads.c
index 1ce4dc156..3f0905421 100644
--- a/src/vlib/threads.c
+++ b/src/vlib/threads.c
@@ -598,15 +598,30 @@ void
vlib_get_thread_core_numa (vlib_worker_thread_t * w, unsigned cpu_id)
{
   const char *sys_cpu_path = "/sys/devices/system/cpu/cpu";
+  const char *sys_node_path = "/sys/devices/system/node/node";
+  clib_bitmap_t *nbmp = 0, *cbmp = 0;
+  u32 node;
   u8 *p = 0;
   int core_id = -1, numa_id = -1;
 
   p = format (p, "%s%u/topology/core_id%c", sys_cpu_path, cpu_id, 0);
   clib_sysfs_read ((char *) p, "%d", &core_id);
   vec_reset_length (p);
-  p = format (p, "%s%u/topology/physical_package_id%c", sys_cpu_path,
-             cpu_id, 0);
-  clib_sysfs_read ((char *) p, "%d", &numa_id);
+
+  /* *INDENT-OFF* */
+  clib_sysfs_read ("/sys/devices/system/node/online", "%U",
+        unformat_bitmap_list, &nbmp);
+  clib_bitmap_foreach (node, nbmp, ({
+    p = format (p, "%s%u/cpulist%c", sys_node_path, node, 0);
+    clib_sysfs_read ((char *) p, "%U", unformat_bitmap_list, &cbmp);
+    if (clib_bitmap_get (cbmp, cpu_id))
+      numa_id = node;
+    vec_reset_length (cbmp);
+    vec_reset_length (p);
+  }));
+  /* *INDENT-ON* */
+  vec_free (nbmp);
+  vec_free (cbmp);
   vec_free (p);
 
   w->core_id = core_id;
 
Thanks.


Re: How to receive broadcast messages in VPP?

Elias Rudberg
 

Hi Neale and Dave,

Thanks for your answers!
I was able to make it work using multicast as Neale suggested.

Here is roughly what I did to make it work using multicast instead of
unicast:

On the sending side, to make it send multicast packets:

adj_index_t adj_index_for_multicast = adj_mcast_add_or_lock
(FIB_PROTOCOL_IP4, VNET_LINK_IP4, sw_if_index);

and then when a message is to be sent, use the above created adj_index
before invoking ip4_rewrite_node (instead of ip4_lookup_node):

vnet_buffer (b)->ip.adj_index[VLIB_TX] = adj_index_for_multicast;
vlib_put_frame_to_node (vm, ip4_rewrite_node.index, f);

On the receiving side the following config was needed:

ip mroute add 224.0.0.1 via MyInterface Accept
ip mroute add 224.0.0.1 via local Forward

After that it works using multicast. Thanks for your help!
(Please let me know if the above is not the right way to do it)

Best regards,
Elias

On Thu, 2020-02-06 at 13:45 +0000, Neale Ranns via Lists.Fd.Io wrote:
Hi Elias,

Please see inline.


On 06/02/2020 12:41, "vpp-dev@... on behalf of Elias
Rudberg" <vpp-dev@... on behalf of elias.rudberg@...>
wrote:

Hello everyone,

I am trying to figure out how to receive broadcast messages in
VPP (vpp
version 19.08 in case that matters).

This is in the context of some changes we are considering in the
VPP
NAT HA functionality. That code in e.g. plugins/nat/nat_ha.c uses
UDP
messages to communicate information about NAT sessions between
different VPP servers. It is currently using unicast messages,
but we
are considering the possibility of using broadcast messages
instead,
hoping that could be more efficient in case there are more than
two
servers involved. For example, when a new NAT session has been
created,
we could send a broadcast message about the new session, that
would
reach several other VPP servers, without need to send a separate
unicast message to each server.

The code in plugins/nat/nat_ha.c calls udp_register_dst_port() to
register that it wants to receive UDP traffic, like this:

udp_register_dst_port (ha->vlib_main, port,
nat_ha_handoff_node.index, 1);

This works fine for unicast messages; when such packets arrive at
the
given port, they get handled by the nat_ha_handoff_node as
desired.

However, if broadcast packets arrive, those packets are dropped
instead, they do not arrive at the nat_ha_handoff_node.

For example, if the IP address of the relevant interface on the
receiving side is 10.10.50.1/24 then unicast UDP messages with
destination 10.10.50.1 are handled fine. However, if the
destination is
10.10.50.255 (the broadcast address for that /24 subnet) then the
packets are dropped. Here is an example of a packet trace when
such a
packet is received from 10.10.50.2:

02:41:19:250212: rdma-input
rdma: Interface101 (3) next-node bond-input
02:41:19:250214: bond-input
src 02:fe:ff:76:e4:5d, dst ff:ff:ff:ff:ff:ff, Interface101 ->
BondEthernet0
02:41:19:250214: ethernet-input
IP4: 02:fe:ff:76:e4:5d -> ff:ff:ff:ff:ff:ff 802.1q vlan 1015
02:41:19:250215: ip4-input
UDP: 10.10.50.2 -> 10.10.50.255
tos 0x80, ttl 254, length 92, checksum 0x02fa
fragment id 0x0002, flags DONT_FRAGMENT
UDP: 1234 -> 2345
length 72, checksum 0x0000
02:41:19:250216: ip4-lookup
fib 0 dpo-idx 0 flow hash: 0x00000000
UDP: 10.10.50.2 -> 10.10.50.255
tos 0x80, ttl 254, length 92, checksum 0x02fa
fragment id 0x0002, flags DONT_FRAGMENT
UDP: 1234 -> 2345
length 72, checksum 0x0000
02:41:19:250217: ip4-drop
UDP: 10.10.50.2 -> 10.10.50.255
tos 0x80, ttl 254, length 92, checksum 0x02fa
fragment id 0x0002, flags DONT_FRAGMENT
UDP: 1234 -> 2345
length 72, checksum 0x0000

if you check:
sh ip fib 10.10.50.255/32
you'll see an explicit entry to drop. You can't override this.


02:41:19:250217: error-drop
rx:BondEthernet0.1015
02:41:19:250217: drop
ethernet-input: no error

So the packet ends up at ip4-drop when I would have liked it to
come to
nat_ha_handoff_node.

Does anyone have a suggestion about how to make this work?
Is some special configuration of the receiving interface needed
to tell
VPP that we want it to receive broadcast packets?

I'd suggest you multicast it instead. Pick an address and have the
other servers listen.
See:
https://wiki.fd.io/view/VPP/MFIB
for programming multicast.
Multicast will also work with IPv6 interface addresses.

/neale


Re: VLIB headroom buffer size modification

Damjan Marion
 

you need to set it on both sides:

For VPP:

$ ccmake build-root/build-vpp-native/vpp
and change PRE_DATA_SIZE to 256

or modify following line:

src/vlib/CMakeLists.txt:

set(PRE_DATA_SIZE 128 CACHE STRING "Buffer headroom size.")

For DPDK you should be able to build custom ext deps package:

$ sudo dpkg -r vpp-ext-deps
$ make install-ext-deps DPDK_PKTMBUF_HEADROOM=256
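Putting the two sides together, the procedure is roughly this (a sketch assuming a standard VPP source tree; the sed one-liner is just shorthand for editing PRE_DATA_SIZE in src/vlib/CMakeLists.txt):

$ sed -i 's/set(PRE_DATA_SIZE 128/set(PRE_DATA_SIZE 256/' src/vlib/CMakeLists.txt
$ sudo dpkg -r vpp-ext-deps
$ make install-ext-deps DPDK_PKTMBUF_HEADROOM=256
$ make build-release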

On 14 Feb 2020, at 11:44, Mohamed feroz Abdul majeeth <ferozvdm@...> wrote:

Hi folks,

In FD.io VPP the default VLIB_BUFFER_PRE_DATA_SIZE headroom is defined as 128 bytes,
and in DPDK it is also defined as 128. Since we have an encapsulation that goes beyond
128 bytes, the packet descriptor block (struct vlib_buffer_t, as defined in vlib_buffer.h)
is getting corrupted.

How can I increase the headroom buffer size to 256 bytes for both DPDK and VPP?

Thanks,
Feroz


VLIB headroom buffer size modification

Mohamed feroz Abdul majeeth
 

Hi folks,

In FD.io VPP the default VLIB_BUFFER_PRE_DATA_SIZE headroom is defined as 128 bytes,
and in DPDK it is also defined as 128. Since we have an encapsulation that goes beyond
128 bytes, the packet descriptor block (struct vlib_buffer_t, as defined in vlib_buffer.h)
is getting corrupted.

How can I increase the headroom buffer size to 256 bytes for both DPDK and VPP?
 
Thanks,
Feroz


VPP crashes on CSIT Taishan server

Lijian Zhang
 

Hi,

VPP crashes on the CSIT Taishan server because the function vlib_get_thread_core_numa (unsigned cpu_id) does not get the NUMA node correctly from cpu_id on that server.
vlib_get_thread_core_numa () is using physical_package_id as the NUMA node.
 
However, the Taishan server has 2 physical sockets but 4 NUMA nodes, as shown in the lscpu output below. And sysfs shows that physical_package_id is not sequential on this server.
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu37/topology/physical_package_id
3002
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
36
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu15/topology/physical_package_id
36
 
How about using the sysfs entries below to get the NUMA node from cpu_id? The code change is also attached at the end.
 
$ cat /sys/devices/system/node/online
0-3
$ cat /sys/devices/system/node/node0/cpulist
0-15
$ cat /sys/devices/system/node/node1/cpulist
16-31
$ cat /sys/devices/system/node/node2/cpulist
32-47
$ cat /sys/devices/system/node/node3/cpulist
48-63
 
taishan-d05-08:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
 
diff --git a/src/vlib/threads.c b/src/vlib/threads.c
index 1ce4dc156..3f0905421 100644
--- a/src/vlib/threads.c
+++ b/src/vlib/threads.c
@@ -598,15 +598,30 @@ void
vlib_get_thread_core_numa (vlib_worker_thread_t * w, unsigned cpu_id)
{
   const char *sys_cpu_path = "/sys/devices/system/cpu/cpu";
+  const char *sys_node_path = "/sys/devices/system/node/node";
+  clib_bitmap_t *nbmp = 0, *cbmp = 0;
+  u32 node;
   u8 *p = 0;
   int core_id = -1, numa_id = -1;
 
   p = format (p, "%s%u/topology/core_id%c", sys_cpu_path, cpu_id, 0);
   clib_sysfs_read ((char *) p, "%d", &core_id);
   vec_reset_length (p);
-  p = format (p, "%s%u/topology/physical_package_id%c", sys_cpu_path,
-             cpu_id, 0);
-  clib_sysfs_read ((char *) p, "%d", &numa_id);
+
+  /* *INDENT-OFF* */
+  clib_sysfs_read ("/sys/devices/system/node/online", "%U",
+        unformat_bitmap_list, &nbmp);
+  clib_bitmap_foreach (node, nbmp, ({
+    p = format (p, "%s%u/cpulist%c", sys_node_path, node, 0);
+    clib_sysfs_read ((char *) p, "%U", unformat_bitmap_list, &cbmp);
+    if (clib_bitmap_get (cbmp, cpu_id))
+      numa_id = node;
+    vec_reset_length (cbmp);
+    vec_reset_length (p);
+  }));
+  /* *INDENT-ON* */
+  vec_free (nbmp);
+  vec_free (cbmp);
   vec_free (p);
 
   w->core_id = core_id;
 
Thanks.


Re: Based on the VPP to Nginx testing #ngnix #vpp

Florin Coras
 

Hi Amit, 

Here’s a minimal example [1] that’s based on some of the scripts I’m using. Note that I haven’t tested this in isolation, so do let me know if you hit any issues. 

Regards, 
Florin
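(For the archives, the LD_PRELOAD invocation itself boils down to something like this -- an untested sketch; the library path and vcl.conf location are placeholders for wherever your build installs them:)

$ export VCL_CONFIG=/etc/vpp/vcl.conf
$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libvcl_ldpreload.so nginx -g 'daemon off;'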

On Feb 12, 2020, at 11:43 PM, Amit Mehra <amito2in@...> wrote:

Hi,

I am also curious to know how to run nginx with VPP using the LD_PRELOAD option. I have installed nginx and am able to run it successfully without VPP. Now I want to try nginx with VPP using the LD_PRELOAD option; can someone provide me the steps for the same?

Regards,
Amit

On Fri, Dec 27, 2019 at 11:57 AM Florin Coras <fcoras.lists@...> wrote:
Hi Yang.L,

I suspect you may need to do a “git pull” and rebuild because the lines don’t match, i.e., vcl_session_accepted_handler:377 is now just an assignment. Let me know if that solves the issue.

Regards,
Florin

> On Dec 26, 2019, at 10:11 PM, lin.yang13@... wrote:
>
> Hi Florin,
> I have tried the latest master. The problem is not resolved.
> Here's the nginx error information:
>
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 61 vlsh 29, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vppcom_session_accept:1521: vcl<269924:1>: listener 16777216 [0x0] accepted 30 [0x1e] peer: 192.168.3.66:47672 local: 192.168.3.65:8080
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 62 vlsh 30, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vppcom_session_accept:1521: vcl<269924:1>: listener 16777216 [0x0] accepted 31 [0x1f] peer: 192.168.3.66:47674 local: 192.168.3.65:8080
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 63 vlsh 31, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
>
> Can you help me analyze it?
> Thanks,
> Yang.L

>


VPP packet capture via DPDK #vpp #dpdk #span #pcap

Chris King
 

Hello,

I currently have VPP 19.08.1 running on Azure.

I would like to be able to specify a virtual device - in this case, net_pcap0, when starting VPP. I would then like to be able to configure a VPP SPAN port to mirror the traffic from one or more interfaces to my PCAP virtual device. Under the hood, my hope is that this will allow DPDK to be able to capture the traffic coming in on the PCAP virtual device and write the captured traffic to a PCAP file on disk. I am aware there is native PCAP functionality in VPP, but I'm wondering if it's possible to rely on DPDK for this under the hood. Please let me know if this is not recommended and/or not currently possible for any reason.

I have 2 VPP-owned interfaces both using the vdev_netvsc PMD (aka failsafe) drivers; namely, FailsafeEthernet2 and FailsafeEthernet4.

Here are the 2 main problems I've encountered so far:
1) VPP 19.08.1 uses DPDK 19.05 and it seems that DPDK 19.05 has issues starting devices using the net_pcap driver (even when compiled with the CONFIG_RTE_LIBRTE_PMD_PCAP option set to 'y'). When I try to start DPDK's testpmd application using a --vdev=net_pcap0,tx_pcap=/tmp/cap.pcap parameter I always get the following error:
Fail: nb_rxq(1) is greater than max_rx_queues(0)
EAL: Error - exiting with code: 1
  Cause: FAIL from init_fwd_streams()

This issue goes away when running DPDK 19.11 so I think it has been resolved at the DPDK level. Does a more recent version of VPP rely on either DPDK 19.08 or DPDK 19.11?

2) The other problem I am experiencing is the inability to specify non-PCI (virtual, I guess) devices in VPP's DPDK config section. VPP seems to expect that each device refers to a PCI device, for example "dev 0002:00:02.0". I don't know how I would create a device using the PCAP PMD (net_pcap0) driver that VPP could refer to and use in the VPP/DPDK config.
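Something like the following is what I had in mind, if it is even valid (an untested sketch; I am assuming the dpdk section of startup.conf accepts a vdev stanza and that the resulting interface can be used as a SPAN destination):

# startup.conf
dpdk {
  vdev net_pcap0,tx_pcap=/tmp/cap.pcap
}

# then, at the CLI
vpp# set interface span FailsafeEthernet2 destination <pcap-interface>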

Does anyone have ideas on how I could use VPP, but capture packets at the DPDK layer (on Azure)?

Thanks,

Chris


Coverity run FAILED as of 2020-02-13 14:00:23 UTC

Noreply Jenkins
 

Coverity run failed today.

Current number of outstanding issues are 1
Newly detected: 0
Eliminated: 1
More details can be found at https://scan.coverity.com/projects/fd-io-vpp/view_defects


Re: Based on the VPP to Nginx testing #ngnix #vpp

Amit Mehra
 

Hi,

I am also curious to know how to run nginx with VPP using the LD_PRELOAD option. I have installed nginx and am able to run it successfully without VPP. Now I want to try nginx with VPP using the LD_PRELOAD option; can someone provide me the steps for the same?

Regards,
Amit

On Fri, Dec 27, 2019 at 11:57 AM Florin Coras <fcoras.lists@...> wrote:
Hi Yang.L,

I suspect you may need to do a “git pull” and rebuild because the lines don’t match, i.e., vcl_session_accepted_handler:377 is now just an assignment. Let me know if that solves the issue.

Regards,
Florin

> On Dec 26, 2019, at 10:11 PM, lin.yang13@... wrote:
>
> Hi Florin,
> I have tried the latest master. The problem is not resolved.
> Here's the nginx error information:
>
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 61 vlsh 29, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vppcom_session_accept:1521: vcl<269924:1>: listener 16777216 [0x0] accepted 30 [0x1e] peer: 192.168.3.66:47672 local: 192.168.3.65:8080
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 62 vlsh 30, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vppcom_session_accept:1521: vcl<269924:1>: listener 16777216 [0x0] accepted 31 [0x1f] peer: 192.168.3.66:47674 local: 192.168.3.65:8080
> epoll_ctl:2203: ldp<269924>: epfd 33 ep_vlsh 1, fd 63 vlsh 31, op 1
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
> ldp_accept4:2043: ldp<269924>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0xffffdc50
> vcl_session_accepted_handler:377: vcl<269924:1>: ERROR: segment for session 32 couldn't be mounted!
> 2019/12/28 11:06:44 [error] 269924#0: accept4() failed (103: Software caused connection abort)
>
> Can you help me analyze it?
> Thanks,
> Yang.L

>


Re: #vpp-hoststack - Issue with UDP receiver application using VCL library #vpp-hoststack

Florin Coras
 

Hi Raj, 

Inline.

On Feb 12, 2020, at 12:58 PM, Raj Kumar <raj.gautam25@...> wrote:

Hi Florin,
Thanks for your explanation of the Tx buffer. Now I have a better understanding of how it works.
In my application, I have to encapsulate a type of Layer 2 frame inside an IPv6/UDP packet. I don't want the VPP session layer to slice my frame into smaller packets, so I changed the UDP MSS/MTU to 9K in the VPP code.
That's why I am using a 9K tx buffer.


FC: As long as that works, that’s fine. But make sure your app’s writes are also about the same size.

As you suggested, I made the code changes to move the half-open connection to a different thread. Now the half-open UDP connections are distributed across the threads in a round-robin manner.
I added the following block of code at the end of the function session_stream_connect_notify_inline():
  
if(s->session_state  == SESSION_STATE_OPENED && tc->proto == TRANSPORT_PROTO_UDP)
{
    uc = udp_get_connection_from_transport (tc);
    if(uc->flags & UDP_CONN_F_CONNECTED)
    {
      if(s->thread_index != new_thread_index)
      {
        new_uc = udp_connection_clone_safe2(s->connection_index, s->thread_index, new_thread_index);
        session_dgram_connect_notify2(&new_uc->connection, s->thread_index, &s, new_thread_index);
      }
      if(vlib_num_workers())
        new_thread_index = new_thread_index % vlib_num_workers () + 1;
    }
}
With these changes, I am able to achieve ~50 Gbps tx-only throughput with my test application. In the test setup, VPP is using 4 worker threads for transmission and the test application is using an 8K buffer. There is one UDP session running on each worker, and there are no rx/tx drops on the interface while running this test.
Performance is definitely improved after moving the connections to different threads. Earlier I was getting ~22 Gbps tx-only throughput with the 8K buffer.

FC: Great!!


vpp# sh  session verbose
Thread 0: no sessions
Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2002::213:27025->fd0d:edc4:-              0         3999999
Thread 1: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#2][U] fd0d:edc4:ffff:2002::213:5564->fd0d:edc4:f-              0         3999999
Thread 2: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2002::213:62955->fd0d:edc4:-              0         3999999
Thread 3: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#4][U] fd0d:edc4:ffff:2002::213:4686->fd0d:edc4:f-              0         3999999
Thread 4: active sessions 1


There is one more point for discussion. In my test application I am using vppcom_select() when writing the buffer. It works fine if the application is single-threaded.
But when using vppcom_select() inside a thread (in a multi-threaded application), the application crashes at the FD_SET macro.

FC: VCL does not ensure thread safety within one worker. That is, there are no locks on a worker’s resources. If you need multi-threading either 1) register multiple workers (as you do lower) or 2) use vls (vcl locked sessions), this is used by ldp to interface with vcl. 


The reason is that vppcom returns a large value (16777216) as the session handle (the return value of vppcom_session_create()).
It uses the following logic to build the session handle:
session handle = vcl_get_worker_index () << 24 | session_index

After I changed this logic to
session handle = vcl_get_worker_index () << 4 | session_index
it returns session handle 16 and there is no crash at FD_SET.

But vppcom_select() still times out and returns 0.
Btw, in my application I am calling vppcom_app_create() from main and then vppcom_worker_register() from inside the threads.

FC: Instead of changing the session handle, check how select is implemented in ldp. In particular, check ldp_select_init_maps where the handles are converted to indices and then the indices are used to init the fdset. 


Here is the piece of code that works fine in a single threaded application but have issue with multi threaded application.
FD_ZERO(&writefds);
FD_SET(sockfd, &writefds);
 rv = vppcom_select (sockfd+1, NULL,  (unsigned long *) &writefds, NULL, 10);

FC: I guess sockfd here is in fact a vcl session handle. Use vppcom_session_index to convert the handle into an index and use that with FD_SET. 
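Roughly along these lines (an untested sketch; it assumes vppcom_session_index() from vcl/vppcom.h and reuses the variables from your snippet):

/* convert the vcl session handle into a plain session index for the fd set */
uint32_t session_index = vppcom_session_index (sockfd);
FD_ZERO (&writefds);
FD_SET (session_index, &writefds);
rv = vppcom_select (session_index + 1, NULL, (unsigned long *) &writefds,
                    NULL, 10);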

Regards,
Florin

                     
Thanks,
-Raj


 





On Thu, Feb 6, 2020 at 6:16 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj,

Inline.

On Feb 6, 2020, at 2:07 PM, Raj Kumar <raj.gautam25@...> wrote:

 Hi Florin,
For the UDP tx connection (half duplex, originating from the host application), I understand that until the receiver sends some "app level ack" it cannot be assigned to a different worker thread.
Here the issue is that I cannot change the receiver application (it is not owned by me).
 

FC: Okay. 

In the VPP code I found that the UDP tx connections are being assigned to the first worker (worker_0) in a hard-coded manner:
 thread_index = vlib_num_workers () ? 1 : 0;

FC: The session queue node does not run on main thread (unless programmed by some other infra). So, we ensure sessions are assigned to a valid vpp worker that reads evt msg from the app/vcl. 


Just curious to know if a UDP half-duplex connection can be assigned to another worker thread (other than worker_0). The best case would be if we could assign UDP connections equally to all the worker threads.
Please let me know whether this approach can work or not.

FC: The half-open api doesn’t allow the passing of a thread index, so session_open_vc (for udpc) would fail when calling transport_get_half_open if you change the thread. Having said that, to check if performance could be improved, you could maybe try to move the half-open udp connection to a different thread [1]. As a POC, since the opens are done with the worker barrier held, on the transport_half_open_has_fifos () branch, you could try to replace the half-open with a new udp connection, allocated on a different worker. 

[1] That’s what we do in udp46_input_inline for SESSION_STATE_OPENED and UDP_CONN_F_CONNECTED. But note that the apis assume the current thread (vlib_get_thread_index()) is the destination thread. 

  
Btw, now I am using the VCL library in my real application. I am able to achieve 22 Gbps rx and tx UDP traffic on each NUMA node, for a total of 44 Gbps rx and tx throughput on the server. I am running 4 applications on each NUMA node. The tx buffer size is 9K (jumbo frame). I did not understand how we can use a larger tx buffer (> 9K); can we add a vector of packets to the send buffer? But then how would the packet boundaries be maintained? Actually, I am not clear on this part.

FC: I suspect there’s some misunderstanding here. There’s two things to consider:
- First, there’s a buffer (we call it a fifo) that vcl write to and the session layer reads from when it builds packets that are pushed over the network. VCL's writes to the fifo can be as big as the fifo (and tx fifo is configurable in vcl.conf).
- Second, the session layer builds packets that are mss/mtu sized. For udp, that mtu is currently not configurable and fixed at 1460B. So vcl's 9kB (or larger) writes to the fifo are cut into smaller packets. For tcp, mtu is configurable via a startup.conf stanza and if it’s larger than what our vlib buffers (default 2kB) then the session layer generates a chain of buffers. That is, a jumbo frame with only 1 udp and 1 ip header.  
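(To make the two knobs above concrete -- a hedged example only, with illustrative values rather than recommendations:)

# vcl.conf: size of the per-session fifos the app reads from / writes into
vcl {
  rx-fifo-size 4000000
  tx-fifo-size 4000000
}

# startup.conf: MTU used when the session layer builds TCP packets
tcp {
  mtu 9000
}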
 

 In VPP, I am using 3 worker threads and 2 rx queues. The rx queues are assigned to worker_1 and worker_2 so that worker_0 can be fully utilized for only transmission.

FC: It looks pretty good, but I guess you’d like to reach the 30Gbps rx/tx per numa you set out to achieve. Maybe try the hack above and see if it helps. 

By the way, how bad are the interfaces' rx/tx drops?

Regards,
Florin

 
vpp# sh int rx-placement
Thread 2 (vpp_wk_1):
  node dpdk-input:
    eth12 queue 0 (polling)
Thread 3 (vpp_wk_2):
  node dpdk-input:
    eth12 queue 1 (polling)

As expected, Tx connections are going on the same thread (worker_0).

vpp# sh session verbose 1
Connection                                        State          Rx-f      Tx-f
[#0][U] fd0d:edc4:ffff:2001::203:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::223:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::233:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::243:9915->:::0       -              0         0
Thread 0: active sessions 4

Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2002::213:20145->fd0d:edc4:-              0         30565
[#1][U] fd0d:edc4:ffff:2002::213:30812->fd0d:edc4:-              0         30565
[#1][U] fd0d:edc4:ffff:2002::213:47115->fd0d:edc4:-              0         67243
[#1][U] fd0d:edc4:ffff:2002::213:44526->fd0d:edc4:-              0         73356
Thread 1: active sessions 4

Connection                                        State          Rx-f      Tx-f
[#2][U] fd0d:edc4:ffff:2001::233:9915->fd0d:edc4:f-              0         0
[#2][U] fd0d:edc4:ffff:2001::243:9915->fd0d:edc4:f-              0         0
Thread 2: active sessions 2

Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2001::203:9915->fd0d:edc4:f-              0         0
[#3][U] fd0d:edc4:ffff:2001::223:9915->fd0d:edc4:f-              0         0
Thread 3: active sessions 2
vpp#

thanks,
-Raj

   

On Tue, Jan 28, 2020 at 10:22 AM Raj Kumar via Lists.Fd.Io <raj.gautam25=gmail.com@...> wrote:
Hi Florin,
Sorry for the confusion. I was accidentally using a blocking UDP socket. With a non-blocking socket, I do not see this issue.

thanks,
-Raj

On Mon, Jan 27, 2020 at 11:57 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj, 

If that’s the case, read should return VPPCOM_EWOULDBLOCK if there’s nothing left to read. Where exactly does it block?

Regards,
Florin

On Jan 27, 2020, at 7:21 PM, Raj Kumar <raj.gautam25@...> wrote:

Hi Florin,
OK, I will try by adding some "app level ack" before starting  the traffic.
For the closing issue; Yes, I am using non blocking session in combination of epoll.  

thanks, 
-Raj

On Mon, Jan 27, 2020 at 9:35 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj, 

Ow, the sessions are not getting any return traffic so they cannot be migrated to their rss selected thread. So unless you send some type of “app level ack” from receiver to sender, before starting the transfer, results will probably not change wrt to vanilla udp. 

As for the closing issue, are you using non-blocking sessions in combination with epoll?

Regards,
Florin

On Jan 27, 2020, at 4:23 PM, Raj Kumar <raj.gautam25@...> wrote:

Hi Florin,
I tried udpc and now the rx connections (half duplex) are pinned to the VPP workers. However, all tx connections (half duplex) still go on the same thread.

vpp# sh session verbose 1
Connection                                        State          Rx-f      Tx-f
[#0][U] fd0d:edc4:ffff:2001::203:5555->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::203:6666->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::203:7777->:::0       -              0         0
Thread 0: active sessions 3

Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2001::203:3261->fd0d:edc4:f-              0         3999999
Thread 1: active sessions 1
Thread 2: no sessions

Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2001::203:5555->fd0d:edc4:f-              0         0
[#3][U] fd0d:edc4:ffff:2001::203:7777->fd0d:edc4:f-              0         0
Thread 3: active sessions 2

Connection                                        State          Rx-f      Tx-f
[#4][U] fd0d:edc4:ffff:2001::203:6666->fd0d:edc4:f-              0         0
Thread 4: active sessions 1

One issue I observed is that on stopping the traffic (from sender), the UDP rx application is blocking on the vppcom_session_read() API.

thanks,
-Raj

 

On Mon, Jan 27, 2020 at 4:02 PM Balaji Venkatraman (balajiv) <balajiv@...> wrote:

No intent to distract from the context of the thread, but just an observation:

 

Vector rate is not high during running these tests.

 

 

Is interesting.

 

I always thought the drops at Rx/Tx had a direct correlation to the vector rate; I see that it need not be true.

 

Thanks.

--

Regards,

Balaji. 

 

 

From: <vpp-dev@...> on behalf of Raj Kumar <raj.gautam25@...>
Date: Sunday, January 26, 2020 at 12:19 PM
To: Florin Coras <fcoras.lists@...>
Cc: vpp-dev <vpp-dev@...>
Subject: Re: [vpp-dev] #vpp-hoststack - Issue with UDP receiver application using VCL library

 

Hi Florin,

I was making a basic mistake. The UDP application was running on a different NUMA node. After I pinned my application to the same NUMA node, the UDP tx throughput increased to 39 Gbps. With 2 applications (2 UDP tx), I am able to achieve 50 Gbps tx throughput.

However, there are tx errors. The rx/tx descriptors are set to 2048, 2048.

After changing the tx descriptors to 16,384 the tx errors stopped, but tx throughput still stays at 50-51 Gbps. The vector rate is not high while running these tests.

 

But if I start rx and tx simultaneously, the total throughput is ~55 Gbps (25 rx and 30 tx) and there are a lot of rx and tx drops on the interface.

 

Now about udpc: my application has to receive/send UDP traffic from/to other applications which are running on other servers on top of the Linux kernel (not using VPP). So I cannot use udpc.

 

All the tests I have executed so far are between 2 servers. On one server VPP is installed and the applications use VCL, whereas the peer applications are hosted on the Linux kernel on the other server.

 

 

With 1 connection (39 Gbps ) :- 

 

Thread 1 vpp_wk_0 (lcore 2)
Time 448.7, 10 sec internal node vector rate 3.97
  vector rates in 7.9647e5, out 7.9644e5, drop 2.4843e1, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling         298760628             251               0          1.49e8            0.00
drop                             active               2963           11147               0          1.12e2            3.76
error-drop                       active               2963           11147               0          6.92e1            3.76
eth12-output                     active           90055395       357362649               0          5.04e1            3.97
eth12-tx                         active           90055395       357362649               0          3.16e2            3.97
ethernet-input                   active                251             251               0          1.68e3            1.00
icmp6-neighbor-solicitation      active                 15              15               0          1.28e4            1.00
interface-output                 active                 15              15               0          3.16e3            1.00
ip6-drop                         active               2727           10911               0          3.46e1            4.00
ip6-glean                        active               2727           10911               0          6.87e1            4.00
ip6-icmp-input                   active                 15              15               0          1.43e3            1.00
ip6-input                        active                 15              15               0          2.03e3            1.00
ip6-local                        active                 15              15               0          2.17e3            1.00
ip6-lookup                       active           90058112       357373550               0          8.77e1            3.97
ip6-rewrite                      active           90055370       357362624               0          5.84e1            3.97
ip6-rewrite-mcast                active                 10              10               0          6.18e2            1.00
llc-input                        active                221             221               0          1.26e3            1.00
lldp-input                       active                 15              15               0          4.25e3            1.00
session-queue                    polling          90185400       357373535               0          2.21e3            3.96
unix-epoll-input                 polling            291483               0               0          1.45e3            0.00

 

With 2 connections ( 50 Gbps)

 

Thread 1 vpp_wk_0 (lcore 2)
Time 601.7, 10 sec internal node vector rate 7.94
  vector rates in 7.8457e5, out 7.8455e5, drop 2.1560e1, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling         250185974             338               0          1.04e8            0.00
drop                             active               3481           12973               0          1.16e2            3.73
error-drop                       active               3481           12973               0          6.96e1            3.73
eth12-output                     active           76700880       472068455               0          4.13e1            6.15
eth12-tx                         active           76700880       472068455               0          3.25e2            6.15
ethernet-input                   active                338             338               0          1.51e3            1.00
icmp6-neighbor-solicitation      active                 20              20               0          1.47e4            1.00
interface-output                 active                 20              20               0          2.58e3            1.00
ip6-drop                         active               3163           12655               0          5.35e1            4.00
ip6-glean                        active               3163           12655               0          8.08e1            4.00
ip6-icmp-input                   active                 20              20               0          1.32e3            1.00
ip6-input                        active                 20              20               0          1.45e3            1.00
ip6-local                        active                 20              20               0          2.08e3            1.00
ip6-lookup                       active           76704032       472081099               0          7.37e1            6.15
ip6-rewrite                      active           76700849       472068424               0          4.46e1            6.15
ip6-rewrite-mcast                active                 11              11               0          6.56e2            1.00
llc-input                        active                298             298               0          1.15e3            1.00
lldp-input                       active                 20              20               0          3.71e3            1.00
session-queue                    polling          76960608       472081079               0          2.33e3            6.13
unix-epoll-input                 polling            244093               0               0          1.45e3            0.00

 

 

thanks,

-Raj

 

On Fri, Jan 24, 2020 at 9:05 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Inline.



On Jan 24, 2020, at 4:41 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

After fixing the UDP checksum offload issue and using the 64K tx buffer, I am able to send 35Gbps (half duplex).

In the DPDK plugin code (./plugins/dpdk/device/init.c), the DEV_TX_OFFLOAD_TCP_CKSUM and DEV_TX_OFFLOAD_UDP_CKSUM offload bits were not being set for the MLNX5 PMD.

In the udp tx application I am using vppcom_session_write to write to the session, and the write length is the same as the buffer size (64K).

 

 

FC: Okay. 



Btw, I ran all the tests with the patch https://gerrit.fd.io/r/c/vpp/+/24462 that you provided.

 

If I run a single UDP tx connection then the throughput is 35 Gbps.

 

FC: Since your nics are 100Gbps, I’m curious why you’re not seeing more. What sort of vector rates do you see with “show run”; to be specific, are they higher than 40? If you increase writes to 128kB, do you see throughput increase? I know that’s not ideal; more on that lower down. 



But on starting other UDP rx connections (20 Gbps), the tx throughput goes down to 12Gbps.

Even if I run 2 UDP tx connections, I am still not able to scale up the throughput. The overall throughput stays the same.

First I tried this test with 4 worker threads and then with 1 worker thread. 

 

FC: Any rx/tx errors on the interfaces? Because udp is not flow controlled you may receive/send bursts of packets that are larger than your rx/tx descriptor rings. How busy are your nodes in show run?

 

If you’re seeing a lot of rx/tx errors, you might try to increase the number of rx/tx descriptors per interface queue. 
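
For reference, the per-interface descriptor counts can be bumped in the dpdk section of startup.conf, roughly like this (a sketch; the PCI address below is a placeholder, so use your NIC's address and whatever ring sizes make sense for your setup):

dpdk {
  dev 0000:12:00.0 {
    num-rx-desc 4096
    num-tx-desc 4096
  }
}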



 

 I have following 2 points -

1) With my udp tx test application, I am getting this throughput after using a 64K tx buffer. But in the actual product I have to send variable-size UDP packets (max len 9000 bytes). That means the maximum tx buffer size would be 9K.

 

FC: This dgram limitation in the session layer can be optimized. That is, we could read beyond the first dgram to see how much data is really available. Previously for this sort of scenario we used udp in stream mode, but we don’t support that now. We might in the future, but that’s a longer story ...



and with that buffer size I am getting 15Gbps, which is fine if I can somehow scale it up by running multiple applications. But that does not seem to work with UDP (I am not using udpc). 

 

FC: With udp you might not be able to scale throughput because of the first worker pinning of connections. So you might want to try udpc. 

 

Also, whenever the session layer schedules a connection for sending, it only allows that connection to burst a maximum of 42 packets. As you add more connections, the number of packets/dispatch grows to a max of 256 (vnet frame size). As the burst/dispatch grows, you’re putting more stress on the nic’s tx queue, so the chance of tx drops increases. 



 

2) My target is to achieve at least 30 Gbps rx and 30 Gbps tx UDP throughput on one NUMA node. I tried running multiple VPP instances on VFs (SR-IOV) and I can scale up the throughput (rx and tx) with the number of VPP instances. 

Here is the throughput test with VF - 

1 VPP instance  ( 15Gbps  rx and 15Gbps  tx)

2 VPP instances  ( 30Gbps rx and 30 Ghps tx)  

3 VPP instances  ( 45 Gbps rx and 35Gbps tx) 

 

FC: This starts to be complicated because of what I mentioned above and, if you use udpc, because of how rss distributes those flows over workers. 

 

Still, with udpc, if “you’re lucky” you might get the rx and tx flows on different workers, so you might be able to get 30Gbps on each per numa. You could then start 2 vpp instances, each with workers pinned to a numa. 

 

Did you make sure that your apps run on the right numa with something like taskset? It matters because data is copied into and out of the fifos shared by vcl and vpp workers. Also, currently the segment manager (part of session layer) is not numa aware, so you may want to avoid running vpp with workers spread over multiple numas. 



 

I have 2 NUMA nodes on the server, so I am expecting to get 60 Gbps rx and 60 Gbps tx total throughput.

 

Btw, I also tested TCP without VF. It seems to scale up properly as the connections are going on different threads.

 

FC: You should get the same type of connection distribution with udpc. Note that it’s not necessarily ideal, for instance, in the last scenario the distribution is 3, 4, 2, 3. 

 

The main differences with tcp are 1) lack of flow control and 2) dgram mode (that leads to the short write throughput limitation). If you can, give udpc a try. 

 

By the way, 18Gbps for one connection seems rather low, if this is measured between 2 iperfs running through vpp’s host stack. Even with 1.5k mtu you should get much more. Also, if you’re looking at maximizing tcp throughput, you might want to check tso as well.

 

Regards,

Florin


 

vpp# sh thread

ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State

0      vpp_main                        22181   other (0)                1      0      0

1      vpp_wk_0            workers     22183   other (0)                2      2      0

2      vpp_wk_1            workers     22184   other (0)                3      3      0

3      vpp_wk_2            workers     22185   other (0)                4      4      0

4      vpp_wk_3            workers     22186   other (0)                5      8      0

 

4 worker threads

Iperf3 TCP tests  - 8000 bytes packets

1 Connection:

Rx only

18 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

Thread 0: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

Thread 3: no sessions

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 1

 

2 connections:

 

Rx only

32Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6679->:::0      LISTEN         0         0

Thread 0: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

Thread 3: no sessions

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:2][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 3

3 connection

Rx only

43Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6679->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:2][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:3][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 4

2 connection

1 -Rx

1 -Tx

Rx – 18Gbps

Tx – 12 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

Thread 0: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:10376->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:12871->fd0d:edc4ESTABLISHED    0         3999999

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 1

4 connections

2 – Rx

2-Tx

Rx – 27 Gbps

Tx – 24 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

Thread 0: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

5 connections

2 – Rx

3 -Tx

Rx – 27 Gbps

Tx – 28 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:7729->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[1:2][T] fd0d:edc4:ffff:2001::203:39216->fd0d:edc4ESTABLISHED    0         3999999

Thread 1: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:29141->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

6 connection

3 – Rx

3 – Tx

Rx – 41 Gbps

Tx – 13 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:7729->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[1:1][T] fd0d:edc4:ffff:2001::203:7729->fd0d:edc4:ESTABLISHED    0         0

[1:2][T] fd0d:edc4:ffff:2001::203:39216->fd0d:edc4ESTABLISHED    0         3999999

Thread 1: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

[2:3][T] fd0d:edc4:ffff:2001::203:7729->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 4

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:29141->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

 

 

thanks,

-Raj

 

On Tue, Jan 21, 2020 at 9:43 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Inline.



On Jan 21, 2020, at 3:41 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

There is no drop on the interfaces. It is 100G card. 

In the UDP tx application, I am using a 1460-byte buffer to send on select(). I am getting 5 Gbps throughput, but if I start one more application then the total throughput goes down to 4 Gbps as both sessions are on the same thread.

I increased the tx buffer to 8192 bytes and then I can get 11 Gbps throughput, but again if I start one more application the throughput goes down to 10 Gbps.

 

FC: I assume you’re using vppcom_session_write to write to the session. How large is “len” typically? See lower on why that matters.

 

 

I found one issue in the code (you must be aware of that): the UDP send MSS is hard-coded to 1460 (in /vpp/src/vnet/udp/udp.c). So the large packets are getting fragmented. 

udp_send_mss (transport_connection_t * t)
{
  /* TODO figure out MTU of output interface */
  return 1460;
}

 

FC: That’s a typical mss and actually what tcp uses as well. Given the nics, they should be fine sending a decent number of mpps without the need to do jumbo ip datagrams. 



If I change the MSS to 8192 then I am getting 17 Gbps throughput. But if I start one more application then the throughput goes down to 13 Gbps. 
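
For reference, the experiment described above amounts to changing the hard-coded value in the function quoted earlier to something like this (a sketch of the local hack only, not a proposed fix):

udp_send_mss (transport_connection_t * t)
{
  /* local experiment: return a jumbo-sized MSS so 8-9kB writes are not
     cut into 1460B datagrams; a real fix would derive this from the
     output interface MTU */
  return 8192;
}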

 

It looks like the 17 Gbps is a per-core limit, and since all the sessions are pinned to the same thread we cannot get more throughput. Here, the per-core throughput looks good to me. Please let me know if there is any way to use multiple threads for UDP tx applications. 

 

In your previous email you mentioned that we can use a connected udp socket in the UDP receiver. Can we do something similar for UDP tx?

 

FC: I think it may work fine if vpp has main + 1 worker. I have a draft patch here [1] that seems to work with multiple workers but it’s not heavily tested. 

 

Out of curiosity, I ran a vcl_test_client/server test with 1 worker and with XL710s, I’m seeing this:

 

CLIENT RESULTS: Streamed 65536017791 bytes
  in 14.392678 seconds (36.427420 Gbps half-duplex)!

 

Should be noted that because of how datagrams are handled in the session layer, throughput is sensitive to write sizes. I ran the client like:

~/vcl_client -p udpc 6.0.1.2 1234 -U -N 1000000 -T 65536

 

Or in English: a unidirectional test, a tx buffer of 64kB, and 1M writes of that buffer. My vcl config was such that tx fifos were 4MB and rx fifos 2MB. The sender had a few tx packet drops (1657) and the receiver a few rx packet drops (801). If you plan to use it, make sure arp entries are resolved first (e.g., use ping), otherwise the first packet is lost. 
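
For reference, that fifo sizing corresponds to a vcl.conf roughly like the one below (a sketch; same stanza format as the config pasted further down this thread, sizes in bytes):

vcl {
  rx-fifo-size 2000000
  tx-fifo-size 4000000
}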

 

Throughput drops to ~15Gbps with 8kB writes. You should probably also test with bigger writes with udp. 

 



 

From the hardware stats, it seems that UDP tx checksum offload is not enabled/active, which could impact performance. I think udp tx checksum offload should be enabled by default if it is not disabled using the "no-tx-checksum-offload" parameter.

 

FC: Performance might be affected by the limited number of offloads available. Here’s what I see on my XL710s:

 

rx offload active: ipv4-cksum jumbo-frame scatter

tx offload active: udp-cksum tcp-cksum multi-segs

 

 

Ethernet address b8:83:03:79:af:8c
  Mellanox ConnectX-4 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg subif rx-ip4-cksum
    rx: queues 5 (max 65535), desc 1024 (min 0 max 65535 align 1)

 

FC: Are you running with 5 vpp workers? 

 

Regards,

Florin



    tx: queues 6 (max 65535), desc 1024 (min 0 max 65535 align 1)
    pci: device 15b3:1017 subsystem 1590:0246 address 0000:12:00.00 numa 0
    max rx packet len: 65536
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum vlan-filter
                       jumbo-frame scatter timestamp keep-crc
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum tcp-tso
                       outer-ipv4-cksum vxlan-tnl-tso gre-tnl-tso multi-segs
                       udp-tnl-tso ip-tnl-tso
    tx offload active: multi-segs
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-other ipv4 ipv6-tcp-ex
                       ipv6-udp-ex ipv6-frag ipv6-tcp ipv6-udp ipv6-other
                       ipv6-ex ipv6
    rss active:        ipv4-frag ipv4-tcp ipv4-udp ipv4-other ipv4 ipv6-tcp-ex
                       ipv6-udp-ex ipv6-frag ipv6-tcp ipv6-udp ipv6-other
                       ipv6-ex ipv6
    tx burst function: (nil)
    rx burst function: mlx5_rx_burst

 

thanks,

-Raj

 

On Mon, Jan 20, 2020 at 7:55 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Good to see progress. Check with “show int” the tx counters on the sender and rx counters on the receiver as the interfaces might be dropping traffic. One sender should be able to do more than 5Gbps. 

 

How big are the writes to the tx fifo? Make sure the tx buffer is some tens of kB. 

 

As for the issue with the number of workers, you’ll have to switch to udpc (connected udp), to ensure you have a separate connection for each ‘flow’, and to use accept in combination with epoll to accept the sessions udpc creates. 
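
In vcl terms, that pattern looks roughly like the sketch below. This is not code from this thread: it assumes vppcom_app_create() was already called, the function name udp_rx_loop() is made up, error handling is omitted, and VPPCOM_PROTO_UDP is used only as a placeholder (the proto constant for connected udp / udpc depends on the VPP release), so treat it as an outline rather than a reference implementation.

#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <vcl/vppcom.h>   /* header path may differ per install */

static void
udp_rx_loop (vppcom_endpt_t * local_ep)
{
  struct epoll_event ev, events[16];
  uint8_t peer_ip[16];
  vppcom_endpt_t peer = { 0 };
  char buf[9000];
  int vep, lsh, sh, n, i, rv;

  peer.ip = peer_ip;

  vep = vppcom_epoll_create ();
  lsh = vppcom_session_create (VPPCOM_PROTO_UDP /* or the udpc proto */,
                               1 /* non-blocking */);
  vppcom_session_bind (lsh, local_ep);
  vppcom_session_listen (lsh, 10);

  memset (&ev, 0, sizeof (ev));
  ev.events = EPOLLIN;
  ev.data.u32 = lsh;
  vppcom_epoll_ctl (vep, EPOLL_CTL_ADD, lsh, &ev);

  while (1)
    {
      /* short wait so the loop keeps polling; unit of wait_for_time per vppcom.h */
      n = vppcom_epoll_wait (vep, events, 16, 100);
      for (i = 0; i < n; i++)
        {
          if ((int) events[i].data.u32 == lsh)
            {
              /* event on the listener: accept the new flow and poll it too */
              sh = vppcom_session_accept (lsh, &peer, 0);
              ev.events = EPOLLIN;
              ev.data.u32 = sh;
              vppcom_epoll_ctl (vep, EPOLL_CTL_ADD, sh, &ev);
            }
          else
            {
              /* drain the rx fifo until it would block */
              while ((rv = vppcom_session_read ((int) events[i].data.u32,
                                                buf, sizeof (buf))) > 0)
                ;   /* consume rv bytes here */
            }
        }
    }
}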

 

Note that udpc currently does not work correctly with vcl and multiple vpp workers if vcl is the sender (not the receiver) and traffic is bidirectional. The sessions are all created on the first thread and once return traffic is received, they’re migrated to the thread selected by RSS hashing. VCL is not notified when that happens and it runs out of sync. You might not be affected by this, as you’re not receiving any return traffic, but because of that all sessions may end up stuck on the first thread. 

 

For udp transport, the listener is connection-less and bound to the main thread. As a result, all incoming packets, even if they pertain to multiple flows, are written to the listener’s buffer/fifo.  

 

Regards,

Florin



On Jan 20, 2020, at 3:50 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

I changed my application as you suggested. Now I am able to achieve 5 Gbps with a single UDP stream. Overall, I can get ~20Gbps with multiple host applications. Also, the TCP throughput improved to ~28Gbps after tuning as mentioned in [1]. 

On a similar topic: the UDP tx throughput is throttled to 5Gbps. Even if I run multiple host applications the overall throughput stays at 5Gbps. I also tried configuring multiple worker threads, but the problem is that all the application sessions are assigned to the same worker thread. Is there any way to assign each session to a different worker thread?  

 

vpp# sh session verbose 2
Thread 0: no sessions
[#1][U] fd0d:edc4:ffff:2001::203:58926->fd0d:edc4:
 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 1
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 3999999 nitems 3999999 has_event 1
          head 1460553 tail 1460552 segment manager 1
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: opened opaque: 0x0 flags:
[#1][U] fd0d:edc4:ffff:2001::203:63413->fd0d:edc4:
 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 3999999 nitems 3999999 has_event 1
          head 3965434 tail 3965433 segment manager 2
          vpp session 1 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: opened opaque: 0x0 flags:
Thread 1: active sessions 2
Thread 2: no sessions
Thread 3: no sessions
Thread 4: no sessions
Thread 5: no sessions
Thread 6: no sessions
Thread 7: no sessions
vpp# sh app client
Connection                              App
[#1][U] fd0d:edc4:ffff:2001::203:58926->udp6_tx_8092[shm]
[#1][U] fd0d:edc4:ffff:2001::203:63413->udp6_tx_8093[shm]
vpp#

 

 

 

thanks,

-Raj

 

On Sun, Jan 19, 2020 at 8:50 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj,

 

The function used for receiving datagrams is limited to reading at most the length of one datagram from the rx fifo. UDP datagrams are mtu sized, so your reads are probably limited to ~1.5kB. On each epoll rx event, try reading from the session handle in a while loop until you get a VPPCOM_EWOULDBLOCK. That might improve performance. 
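
As a minimal sketch of that read loop (sh and buf being whatever the application already uses; vppcom_session_read() returns a negative value such as VPPCOM_EWOULDBLOCK once the fifo is empty on a non-blocking session):

int rv;
do
  {
    rv = vppcom_session_read (sh, buf, sizeof (buf));
    /* consume rv bytes here when rv > 0 */
  }
while (rv > 0);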

 

Having said that, udp is lossy so unless you implement your own congestion/flow control algorithms, the data you’ll receive might be full of “holes”. What are the rx/tx error counters on your interfaces (check with “sh int”). 

 

Also, with simple tuning like this [1], you should be able to achieve much more than 15Gbps with tcp. 

 

Regards,

Florin

 



On Jan 19, 2020, at 3:25 PM, Raj Kumar <raj.gautam25@...> wrote:

 

  Hi Florin,

 By using the VCL library in a UDP receiver application, I am able to receive only 2 Mbps of traffic. On increasing the traffic, I see an Rx FIFO full error and the application stops receiving traffic from the session layer. Whereas with TCP I can easily achieve 15Gbps throughput without tuning any DPDK parameters. UDP tx also looks fine; from a host application I can send ~5Gbps without any issue. 

 

I am running VPP( stable/2001 code) on RHEL8 server using Mellanox 100G (MLNX5) adapters.

Please advise if I can use VCL library to receive high throughput UDP traffic ( in Gbps). I would be running multiple instances of host application to receive data ( ~50-60 Gbps).

 

I also tried increasing the Rx FIFO size to 16MB but it did not help much. The host application just throws away the received packets; it is not doing any packet processing.

 

[root@orc01 vcl_test]# VCL_DEBUG=2 ./udp6_server_vcl
VCL<20201>: configured VCL debug level (2) from VCL_DEBUG!
VCL<20201>: allocated VCL heap = 0x7f39a17ab010, size 268435456 (0x10000000)
VCL<20201>: configured rx_fifo_size 4000000 (0x3d0900)
VCL<20201>: configured tx_fifo_size 4000000 (0x3d0900)
VCL<20201>: configured app_scope_local (1)
VCL<20201>: configured app_scope_global (1)
VCL<20201>: configured api-socket-name (/tmp/vpp-api.sock)
VCL<20201>: completed parsing vppcom config!
vppcom_connect_to_vpp:480: vcl<20201:0>: app (udp6_server) is connected to VPP!
vppcom_app_create:1104: vcl<20201:0>: sending session enable
vppcom_app_create:1112: vcl<20201:0>: sending app attach
vppcom_app_create:1121: vcl<20201:0>: app_name 'udp6_server', my_client_index 256 (0x100)
vppcom_epoll_create:2439: vcl<20201:0>: Created vep_idx 0
vppcom_session_create:1179: vcl<20201:0>: created session 1
vppcom_session_bind:1317: vcl<20201:0>: session 1 handle 1: binding to local IPv6 address fd0d:edc4:ffff:2001::203 port 8092, proto UDP
vppcom_session_listen:1349: vcl<20201:0>: session 1: sending vpp listen request...
vcl_session_bound_handler:604: vcl<20201:0>: session 1 [0x1]: listen succeeded!
vppcom_epoll_ctl:2541: vcl<20201:0>: EPOLL_CTL_ADD: vep_sh 0, sh 1, events 0x1, data 0x1!
vppcom_session_create:1179: vcl<20201:0>: created session 2
vppcom_session_bind:1317: vcl<20201:0>: session 2 handle 2: binding to local IPv6 address fd0d:edc4:ffff:2001::203 port 8093, proto UDP
vppcom_session_listen:1349: vcl<20201:0>: session 2: sending vpp listen request...
vcl_session_app_add_segment_handler:765: vcl<20201:0>: mapped new segment '20190-2' size 134217728
vcl_session_bound_handler:604: vcl<20201:0>: session 2 [0x2]: listen succeeded!
vppcom_epoll_ctl:2541: vcl<20201:0>: EPOLL_CTL_ADD: vep_sh 0, sh 2, events 0x1, data 0x2!

 

 

vpp# sh session verbose 2
[#0][U] fd0d:edc4:ffff:2001::203:8092->:::0

 Rx fifo: cursize 3999125 nitems 3999999 has_event 1
          head 2554045 tail 2553170 segment manager 1
          vpp session 0 thread 0 app session 1 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 1
          vpp session 0 thread 0 app session 1 thread 0
          ooo pool 0 active elts newest 0
[#0][U] fd0d:edc4:ffff:2001::203:8093->:::0

 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 0 app session 2 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 0 app session 2 thread 0
          ooo pool 0 active elts newest 0
Thread 0: active sessions 2

 

[root@orc01 vcl_test]# cat /etc/vpp/vcl.conf
vcl {
  rx-fifo-size 4000000
  tx-fifo-size 4000000
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}
[root@orc01 vcl_test]#

 

------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:09:53:445025: dpdk-input
  HundredGigabitEthernet12/0/0 rx queue 0
  buffer 0x88078: current data 0, length 1516, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x0
                  ext-hdr-valid
                  l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 1516
    buf_len 2176, data_len 1516, ol_flags 0x180, data_off 128, phys_addr 0x75601e80
    packet_type 0x2e1 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV6_EXT_UNKNOWN (0x00e0) IPv6 packet with or without extension headers
      RTE_PTYPE_L4_UDP (0x0200) UDP packet
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445028: ethernet-input
  frame: flags 0x3, hw-if-index 2, sw-if-index 2
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
00:09:53:445029: ip6-input
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445031: ip6-lookup
  fib 0 dpo-idx 6 flow hash: 0x00000000
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445032: ip6-local
    UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
      tos 0x00, flow label 0x0, hop limit 64, payload length 1458
    UDP: 56944 -> 8092
      length 1458, checksum 0xb22d
00:09:53:445032: ip6-udp-lookup
  UDP: src-port 56944 dst-port 8092
00:09:53:445033: udp6-input
  UDP_INPUT: connection 0, disposition 5, thread 0

 

 

thanks,

-Raj

 

 

On Wed, Jan 15, 2020 at 4:09 PM Raj Kumar via Lists.Fd.Io <raj.gautam25=gmail.com@...> wrote:

Hi Florin,

Yes,  [2] patch resolved the  IPv6/UDP receiver issue. 

Thanks! for your help.

 

thanks,

-Raj

 

On Tue, Jan 14, 2020 at 9:35 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

First of all, with this [1], the vcl test app/client can establish a udpc connection. Note that udp will most probably lose packets, so large exchanges with those apps may not work. 

 

As for the second issue, does [2] solve it?

 

Regards, 

Florin

 

[2] https://gerrit.fd.io/r/c/vpp/+/24334



On Jan 14, 2020, at 12:59 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

Thanks! for the reply. 

 

I realized the issue with the non-connected case. For receiving datagrams, I was using recvfrom() with the MSG_DONTWAIT flag, and because of that the vppcom_session_recvfrom() api was failing. It expects either 0 or the MSG_PEEK flag.

  if (flags == 0)
    rv = vppcom_session_read (session_handle, buffer, buflen);
  else if (flags & MSG_PEEK)
    rv = vppcom_session_peek (session_handle, buffer, buflen);
  else
    {
      VDBG (0, "Unsupport flags for recvfrom %d", flags);
      return VPPCOM_EAFNOSUPPORT;
    }

 

 I changed the flag to 0 in recvfrom(); after that, UDP rx is working fine, but only for IPv4.

 

I am facing a different issue with the IPv6/UDP receiver. I am getting a "no listener for dst port" error.

 

Please let me know if I am doing something wrong. 

Here are the traces : -

 

[root@orc01 testcode]# VCL_DEBUG=2 LDP_DEBUG=2 LD_PRELOAD=/opt/vpp/build-root/install-vpp-native/vpp/lib/libvcl_ldpreload.so  VCL_CONFIG=/etc/vpp/vcl.cfg ./udp6_rx
VCL<1164>: configured VCL debug level (2) from VCL_DEBUG!
VCL<1164>: allocated VCL heap = 0x7ff877439010, size 268435456 (0x10000000)
VCL<1164>: configured rx_fifo_size 4000000 (0x3d0900)
VCL<1164>: configured tx_fifo_size 4000000 (0x3d0900)
VCL<1164>: configured app_scope_local (1)
VCL<1164>: configured app_scope_global (1)
VCL<1164>: configured api-socket-name (/tmp/vpp-api.sock)
VCL<1164>: completed parsing vppcom config!
vppcom_connect_to_vpp:549: vcl<1164:0>: app (ldp-1164-app) is connected to VPP!
vppcom_app_create:1067: vcl<1164:0>: sending session enable
vppcom_app_create:1075: vcl<1164:0>: sending app attach
vppcom_app_create:1084: vcl<1164:0>: app_name 'ldp-1164-app', my_client_index 0 (0x0)
ldp_init:209: ldp<1164>: configured LDP debug level (2) from env var LDP_DEBUG!
ldp_init:282: ldp<1164>: LDP initialization: done!
ldp_constructor:2490: LDP<1164>: LDP constructor: done!
socket:974: ldp<1164>: calling vls_create: proto 1 (UDP), is_nonblocking 0
vppcom_session_create:1142: vcl<1164:0>: created session 0
bind:1086: ldp<1164>: fd 32: calling vls_bind: vlsh 0, addr 0x7fff9a93efe0, len 28
vppcom_session_bind:1280: vcl<1164:0>: session 0 handle 0: binding to local IPv6 address :: port 8092, proto UDP
vppcom_session_listen:1312: vcl<1164:0>: session 0: sending vpp listen request...
vcl_session_bound_handler:610: vcl<1164:0>: session 0 [0x1]: listen succeeded!
bind:1102: ldp<1164>: fd 32: returning 0

 

vpp# sh app server
Connection                              App                          Wrk
[0:0][CT:U] :::8092->:::0               ldp-1164-app[shm]             0
[#0][U] :::8092->:::0                   ldp-1164-app[shm]             0

 

vpp# sh err
   Count                    Node                  Reason
         7               dpdk-input               no error
      2606             ip6-udp-lookup             no listener for dst port
         8                arp-reply               ARP replies sent
         1              arp-disabled              ARP Disabled on this interface
        13                ip6-glean               neighbor solicitations sent
      2606                ip6-input               valid ip6 packets
         4          ip6-local-hop-by-hop          Unknown protocol ip6 local h-b-h packets dropped
      2606             ip6-icmp-error             destination unreachable response sent
        40             ip6-icmp-input             valid packets
         1             ip6-icmp-input             neighbor solicitations from source not on link
        12             ip6-icmp-input             neighbor solicitations for unknown targets
         1             ip6-icmp-input             neighbor advertisements sent
         1             ip6-icmp-input             neighbor advertisements received
        40             ip6-icmp-input             router advertisements sent
        40             ip6-icmp-input             router advertisements received
         1             ip4-icmp-input             echo replies sent
        89               lldp-input               lldp packets received on disabled interfaces
      1328                llc-input               unknown llc ssap/dsap
vpp#

 

vpp# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:23:39:401354: dpdk-input
  HundredGigabitEthernet12/0/0 rx queue 0
  buffer 0x8894e: current data 0, length 1516, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x0
                  ext-hdr-valid
                  l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 1516
    buf_len 2176, data_len 1516, ol_flags 0x180, data_off 128, phys_addr 0x75025400
    packet_type 0x2e1 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV6_EXT_UNKNOWN (0x00e0) IPv6 packet with or without extension headers
      RTE_PTYPE_L4_UDP (0x0200) UDP packet
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401355: ethernet-input
  frame: flags 0x3, hw-if-index 2, sw-if-index 2
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
00:23:39:401356: ip6-input
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401357: ip6-lookup
  fib 0 dpo-idx 5 flow hash: 0x00000000
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401361: ip6-local
    UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
      tos 0x00, flow label 0x0, hop limit 64, payload length 1458
    UDP: 60593 -> 8092
      length 1458, checksum 0x0964
00:23:39:401362: ip6-udp-lookup
  UDP: src-port 60593 dst-port 8092 (no listener)
00:23:39:401362: ip6-icmp-error
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401363: error-drop
  rx:HundredGigabitEthernet12/0/0.2001
00:23:39:401364: drop
  ip6-input: valid ip6 packets

vpp#

 

 

Thanks,

-Raj

 

 

On Tue, Jan 14, 2020 at 1:44 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj,

Session layer does support connection-less transports but udp does not raise accept notifications to vcl. UDPC might, but we haven’t tested udpc with vcl in a long time so it might not work properly. 

What was the problem you were hitting in the non-connected case?

Regards,
Florin

> On Jan 14, 2020, at 7:13 AM, raj.gautam25@... wrote:

> Hi ,
> I am trying some host application tests ( using LD_PRELOAD) .  TCP rx and tx both work fine. UDP tx also works fine. 
> The issue is only with UDP rx .  In some discussion it was mentioned that session layer does not support connection-less transports so protocols like udp still need to accept connections and only afterwards read from the fifos.
> So, I changed the UDP receiver application to use listen() and accept() before read(). But I am still having issues getting it to run. 
> After I start udp traffic from the other server, it seems to accept the connection but never returns from the vppcom_session_accept() function.
> VPP release is 19.08.

> vpp# sh app server
> Connection                              App                          Wrk
> [0:0][CT:U] 0.0.0.0:8090->0.0.0.0:0     ldp-36646-app[shm]            0
> [#0][U] 0.0.0.0:8090->0.0.0.0:0         ldp-36646-app[shm]            0
> vpp#
>  
>  
> [root@orc01 testcode]#  VCL_DEBUG=2 LDP_DEBUG=2 LD_PRELOAD=/opt/vpp/build-root/install-vpp-native/vpp/lib/libvcl_ldpreload.so  VCL_CONFIG=/etc/vpp/vcl.cfg ./udp_rx
> VCL<36646>: configured VCL debug level (2) from VCL_DEBUG!
> VCL<36646>: allocated VCL heap = 0x7f77e5309010, size 268435456 (0x10000000)
> VCL<36646>: configured rx_fifo_size 4000000 (0x3d0900)
> VCL<36646>: configured tx_fifo_size 4000000 (0x3d0900)
> VCL<36646>: configured app_scope_local (1)
> VCL<36646>: configured app_scope_global (1)
> VCL<36646>: configured api-socket-name (/tmp/vpp-api.sock)
> VCL<36646>: completed parsing vppcom config!
> vppcom_connect_to_vpp:549: vcl<36646:0>: app (ldp-36646-app) is connected to VPP!
> vppcom_app_create:1067: vcl<36646:0>: sending session enable
> vppcom_app_create:1075: vcl<36646:0>: sending app attach
> vppcom_app_create:1084: vcl<36646:0>: app_name 'ldp-36646-app', my_client_index 0 (0x0)
> ldp_init:209: ldp<36646>: configured LDP debug level (2) from env var LDP_DEBUG!
> ldp_init:282: ldp<36646>: LDP initialization: done!
> ldp_constructor:2490: LDP<36646>: LDP constructor: done!
> socket:974: ldp<36646>: calling vls_create: proto 1 (UDP), is_nonblocking 0
> vppcom_session_create:1142: vcl<36646:0>: created session 0
> Socket successfully created..
> bind:1086: ldp<36646>: fd 32: calling vls_bind: vlsh 0, addr 0x7fff3f3c1040, len 16
> vppcom_session_bind:1280: vcl<36646:0>: session 0 handle 0: binding to local IPv4 address 0.0.0.0 port 8090, proto UDP
> vppcom_session_listen:1312: vcl<36646:0>: session 0: sending vpp listen request...
> vcl_session_bound_handler:610: vcl<36646:0>: session 0 [0x1]: listen succeeded!
> bind:1102: ldp<36646>: fd 32: returning 0
> Socket successfully binded..
> listen:2005: ldp<36646>: fd 32: calling vls_listen: vlsh 0, n 5
> vppcom_session_listen:1308: vcl<36646:0>: session 0 [0x1]: already in listen state!
> listen:2020: ldp<36646>: fd 32: returning 0
> Server listening..
> ldp_accept4:2043: ldp<36646>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0x3f3c0fc0
> vppcom_session_accept:1478: vcl<36646:0>: discarded event: 0
>  

 

Re: #vpp-hoststack - Issue with UDP receiver application using VCL library #vpp-hoststack

Raj Kumar
 

Hi Florin,
Thanks for your explanation on the Tx buffer. Now I have a better understanding of how it works.
In my application, I have to encapsulate a type of Layer 2 frame inside an IPv6/UDP packet. I don't want the VPP session layer to slice my frame into smaller packets, so I changed the UDP MSS/MTU to 9K in the VPP code. 
That's why I am using a 9K tx buffer.

As you suggested, I made the code changes to move the half-open connection to a different thread. Now the half-open udp connections are distributed across the worker threads in a round-robin manner. 
I added the following block of code at the end of the function session_stream_connect_notify_inline(). 
  
if(s->session_state  == SESSION_STATE_OPENED && tc->proto == TRANSPORT_PROTO_UDP)
{
    uc = udp_get_connection_from_transport (tc);
    if(uc->flags & UDP_CONN_F_CONNECTED)
    {
      if(s->thread_index != new_thread_index)
      {
        new_uc = udp_connection_clone_safe2(s->connection_index, s->thread_index, new_thread_index);
        session_dgram_connect_notify2(&new_uc->connection, s->thread_index, &s, new_thread_index);
      }
      if(vlib_num_workers())
        new_thread_index = new_thread_index % vlib_num_workers () + 1;
    }
}
With these changes, I am able to achieve ~50Gbps tx-only throughput with my test application. In the test setup, VPP is using 4 worker threads for transmission and the test application is using an 8K buffer. There is one UDP session running on each worker. There are no rx/tx drops on the interface while running this test.
Performance is definitely improved after moving connections to different threads. Earlier I was getting ~22Gbps tx-only throughput with an 8K buffer. 

vpp# sh  session verbose
Thread 0: no sessions
Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2002::213:27025->fd0d:edc4:-              0         3999999
Thread 1: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#2][U] fd0d:edc4:ffff:2002::213:5564->fd0d:edc4:f-              0         3999999
Thread 2: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2002::213:62955->fd0d:edc4:-              0         3999999
Thread 3: active sessions 1
Connection                                        State          Rx-f      Tx-f
[#4][U] fd0d:edc4:ffff:2002::213:4686->fd0d:edc4:f-              0         3999999
Thread 4: active sessions 1


There is one more point for discussion. In my test application I am using vppcom_select() to write the buffer. It works fine if the application is single-threaded.
But when using vppcom_select() inside a thread (in a multi-threaded application), the application crashes at the FD_SET macro.  

The reason is that vppcom returns a large value (16777216) as a session handle (the return value of vppcom_session_create()), which overflows a fixed-size fd_set since FD_SETSIZE is typically 1024.
It uses the following logic to get the session handle - 
session handle = vcl_get_worker_index () << 24 | session_index

After I changed this logic to 
session handle = vcl_get_worker_index () << 4 | session_index
it returns session handle 16 and there is no crash at FD_SET. 

But vppcom_select() is still timing out and returning 0.
Btw, in my application I am calling vppcom_app_create() from main and then vppcom_worker_register() from inside the threads.

Here is the piece of code that works fine in a single-threaded application but has issues in a multi-threaded application.
FD_ZERO(&writefds);
FD_SET(sockfd, &writefds);
 rv = vppcom_select (sockfd+1, NULL,  (unsigned long *) &writefds, NULL, 10);
                     
Thanks,
-Raj


 
On Thu, Feb 6, 2020 at 6:16 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj,

Inline.

On Feb 6, 2020, at 2:07 PM, Raj Kumar <raj.gautam25@...> wrote:

 Hi Florin,
 For the UDP tx connection (half duplex, originating from the host application), I understood that until the receiver sends some "app level ack" it cannot be assigned to a different worker thread. 
 Here the issue is that I cannot change the receiver application (it is not owned by me).
 

FC: Okay. 

 In the VPP code I found that the UDP tx connections are assigned to the first worker (worker_0) in a hard-coded manner. 
 thread_index = vlib_num_workers ()? 1 : 0;

FC: The session queue node does not run on main thread (unless programmed by some other infra). So, we ensure sessions are assigned to a valid vpp worker that reads evt msg from the app/vcl. 


Just curious to know if a UDP half-duplex connection can be assigned to another worker thread (other than worker_0). The best case would be if we could assign UDP connections equally across all the worker threads.
Please let me know whether this approach can work. 

FC: The half-open api doesn’t allow the passing of a thread index, so session_open_vc (for udpc) would fail when calling transport_get_half_open if you change the thread. Having said that, to check if performance could be improved, you could maybe try to move the half-open udp connection to a different thread [1]. As a POC, since the opens are done with the worker barrier held, on the transport_half_open_has_fifos () branch, you could try to replace the half-open with a new udp connection, allocated on a different worker. 

[1] That’s what we do in udp46_input_inline for SESSION_STATE_OPENED and UDP_CONN_F_CONNECTED. But note that the apis assume the current thread (vlib_get_thread_index()) is the destination thread. 

  
Btw, now I am using the VCL library in my real application. I am able to achieve 22 Gbps rx and tx UDP traffic on each NUMA node, a total of 44 Gbps rx and tx throughput on the server. I am running 4 applications on each NUMA node. The tx buffer size is 9K (jumbo frame). I did not understand how we can use a larger tx buffer (> 9K); can we put a vector of packets in the send buffer? But then how would the packet boundary be maintained? Actually, I am not clear on this part. 

FC: I suspect there’s some misunderstanding here. There’s two things to consider:
- First, there’s a buffer (we call it a fifo) that vcl write to and the session layer reads from when it builds packets that are pushed over the network. VCL's writes to the fifo can be as big as the fifo (and tx fifo is configurable in vcl.conf).
- Second, the session layer builds packets that are mss/mtu sized. For udp, that mtu is currently not configurable and is fixed at 1460B. So vcl's 9kB (or larger) writes to the fifo are cut into smaller packets. For tcp, the mtu is configurable via a startup.conf stanza (see the sketch below) and if it's larger than what our vlib buffers can hold (default 2kB), the session layer generates a chain of buffers. That is, a jumbo frame with only 1 udp and 1 ip header.  
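
The tcp stanza mentioned above looks roughly like this in startup.conf (a sketch; the exact parameter names depend on the VPP release, so check the tcp section of your version's configuration reference):

tcp {
  mtu 9000
}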
 

 In VPP, I am using 3 worker threads and 2 rx queues. The rx queues are assigned to worker_1 and worker_2 so that worker_0 can be fully utilized for transmission only.

FC: It looks pretty good, but I guess you’d like to reach the 30Gbps rx/tx per numa you set out to achieve. Maybe try the hack above and see if it helps. 

By the way, how bad are the interfaces' rx/tx drops?

Regards,
Florin

 
vpp# sh int rx-placement
Thread 2 (vpp_wk_1):
  node dpdk-input:
    eth12 queue 0 (polling)
Thread 3 (vpp_wk_2):
  node dpdk-input:
    eth12 queue 1 (polling)

As expected, Tx connections are going on the same thread (worker_0).

vpp# sh session verbose 1
Connection                                        State          Rx-f      Tx-f
[#0][U] fd0d:edc4:ffff:2001::203:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::223:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::233:9915->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::243:9915->:::0       -              0         0
Thread 0: active sessions 4

Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2002::213:20145->fd0d:edc4:-              0         30565
[#1][U] fd0d:edc4:ffff:2002::213:30812->fd0d:edc4:-              0         30565
[#1][U] fd0d:edc4:ffff:2002::213:47115->fd0d:edc4:-              0         67243
[#1][U] fd0d:edc4:ffff:2002::213:44526->fd0d:edc4:-              0         73356
Thread 1: active sessions 4

Connection                                        State          Rx-f      Tx-f
[#2][U] fd0d:edc4:ffff:2001::233:9915->fd0d:edc4:f-              0         0
[#2][U] fd0d:edc4:ffff:2001::243:9915->fd0d:edc4:f-              0         0
Thread 2: active sessions 2

Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2001::203:9915->fd0d:edc4:f-              0         0
[#3][U] fd0d:edc4:ffff:2001::223:9915->fd0d:edc4:f-              0         0
Thread 3: active sessions 2
vpp#

thanks,
-Raj

   

On Tue, Jan 28, 2020 at 10:22 AM Raj Kumar via Lists.Fd.Io <raj.gautam25=gmail.com@...> wrote:
Hi Florin,
Sorry for the confusion. Accidentally, I was using a blocking udp socket. With a non-blocking socket, I do not see this issue. 

thanks,
-Raj

On Mon, Jan 27, 2020 at 11:57 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj, 

If that’s the case, read should return VPPCOM_EWOULDBLOCK if there’s nothing left to read. Where exactly does it block?

Regards,
Florin

On Jan 27, 2020, at 7:21 PM, Raj Kumar <raj.gautam25@...> wrote:

Hi Florin,
OK, I will try adding some "app level ack" before starting the traffic.
For the closing issue: yes, I am using non-blocking sessions in combination with epoll.  

thanks, 
-Raj

On Mon, Jan 27, 2020 at 9:35 PM Florin Coras <fcoras.lists@...> wrote:
Hi Raj, 

Ow, the sessions are not getting any return traffic so they cannot be migrated to their rss selected thread. So unless you send some type of “app level ack” from receiver to sender, before starting the transfer, results will probably not change wrt vanilla udp. 

As for the closing issue, are you using non-blocking sessions in combination with epoll?

Regards,
Florin

On Jan 27, 2020, at 4:23 PM, Raj Kumar <raj.gautam25@...> wrote:

Hi Florin,
I tried udpc and now the rx connections (half duplex) are pinned to the vpp workers. However, all tx connections (half duplex) still go on the same thread.

vpp# sh session verbose 1
Connection                                        State          Rx-f      Tx-f
[#0][U] fd0d:edc4:ffff:2001::203:5555->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::203:6666->:::0       -              0         0
[#0][U] fd0d:edc4:ffff:2001::203:7777->:::0       -              0         0
Thread 0: active sessions 3

Connection                                        State          Rx-f      Tx-f
[#1][U] fd0d:edc4:ffff:2001::203:3261->fd0d:edc4:f-              0         3999999
Thread 1: active sessions 1
Thread 2: no sessions

Connection                                        State          Rx-f      Tx-f
[#3][U] fd0d:edc4:ffff:2001::203:5555->fd0d:edc4:f-              0         0
[#3][U] fd0d:edc4:ffff:2001::203:7777->fd0d:edc4:f-              0         0
Thread 3: active sessions 2

Connection                                        State          Rx-f      Tx-f
[#4][U] fd0d:edc4:ffff:2001::203:6666->fd0d:edc4:f-              0         0
Thread 4: active sessions 1

One issue I observed is that on stopping the traffic (from sender), the UDP rx application is blocking on the vppcom_session_read() API.

thanks,
-Raj

 

On Mon, Jan 27, 2020 at 4:02 PM Balaji Venkatraman (balajiv) <balajiv@...> wrote:

No intent to distract from the context of the thread, but just an observation:

 

The vector rate is not high while running these tests.

 

 

Is interesting.

 

I always thought the drops at Rx/Tx had a direct correlation to the vector rate; I see that it need not be true.

 

Thanks.

--

Regards,

Balaji. 

 

 

From: <vpp-dev@...> on behalf of Raj Kumar <raj.gautam25@...>
Date: Sunday, January 26, 2020 at 12:19 PM
To: Florin Coras <fcoras.lists@...>
Cc: vpp-dev <vpp-dev@...>
Subject: Re: [vpp-dev] #vpp-hoststack - Issue with UDP receiver application using VCL library

 

Hi Florin,

I was making a basic mistake. The UDP application was running on a different NUMA node. After I pinned my application to the same NUMA node, the UDP tx throughput increased to 39 Gbps. With 2 applications (2 UDP tx), I am able to achieve 50 Gbps tx throughput. 

However, there are tx errors. The rx/tx descriptors are set to 2048/2048.

After changing the tx descriptors to 16,384 the tx errors stopped, but the tx throughput still stays at 50-51 Gbps. The vector rate is not high while running these tests.

 

But if I start rx and tx simultaneously then the total throughput is ~55 Gbps (25 rx and 30 tx) and there are a lot of rx and tx drops on the interface. 

 

Now about udpc: my application has to receive/send UDP traffic from/to other applications which run on other servers on top of the Linux kernel (not using VPP). So I cannot use udpc.

 

All the tests I have executed so far are between 2 servers. On one server VPP is installed and the applications use VCL, whereas the peer applications are hosted on the Linux kernel on the other server.

 

 

With 1 connection (39 Gbps ) :- 

 

Thread 1 vpp_wk_0 (lcore 2)
Time 448.7, 10 sec internal node vector rate 3.97
  vector rates in 7.9647e5, out 7.9644e5, drop 2.4843e1, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling         298760628             251               0          1.49e8            0.00
drop                             active               2963           11147               0          1.12e2            3.76
error-drop                       active               2963           11147               0          6.92e1            3.76
eth12-output                     active           90055395       357362649               0          5.04e1            3.97
eth12-tx                         active           90055395       357362649               0          3.16e2            3.97
ethernet-input                   active                251             251               0          1.68e3            1.00
icmp6-neighbor-solicitation      active                 15              15               0          1.28e4            1.00
interface-output                 active                 15              15               0          3.16e3            1.00
ip6-drop                         active               2727           10911               0          3.46e1            4.00
ip6-glean                        active               2727           10911               0          6.87e1            4.00
ip6-icmp-input                   active                 15              15               0          1.43e3            1.00
ip6-input                        active                 15              15               0          2.03e3            1.00
ip6-local                        active                 15              15               0          2.17e3            1.00
ip6-lookup                       active           90058112       357373550               0          8.77e1            3.97
ip6-rewrite                      active           90055370       357362624               0          5.84e1            3.97
ip6-rewrite-mcast                active                 10              10               0          6.18e2            1.00
llc-input                        active                221             221               0          1.26e3            1.00
lldp-input                       active                 15              15               0          4.25e3            1.00
session-queue                    polling          90185400       357373535               0          2.21e3            3.96
unix-epoll-input                 polling            291483               0               0          1.45e3            0.00

 

With 2 connections (50 Gbps):

 

Thread 1 vpp_wk_0 (lcore 2)
Time 601.7, 10 sec internal node vector rate 7.94
  vector rates in 7.8457e5, out 7.8455e5, drop 2.1560e1, punt 0.0000e0
             Name                 State         Calls          Vectors        Suspends         Clocks       Vectors/Call
dpdk-input                       polling         250185974             338               0          1.04e8            0.00
drop                             active               3481           12973               0          1.16e2            3.73
error-drop                       active               3481           12973               0          6.96e1            3.73
eth12-output                     active           76700880       472068455               0          4.13e1            6.15
eth12-tx                         active           76700880       472068455               0          3.25e2            6.15
ethernet-input                   active                338             338               0          1.51e3            1.00
icmp6-neighbor-solicitation      active                 20              20               0          1.47e4            1.00
interface-output                 active                 20              20               0          2.58e3            1.00
ip6-drop                         active               3163           12655               0          5.35e1            4.00
ip6-glean                        active               3163           12655               0          8.08e1            4.00
ip6-icmp-input                   active                 20              20               0          1.32e3            1.00
ip6-input                        active                 20              20               0          1.45e3            1.00
ip6-local                        active                 20              20               0          2.08e3            1.00
ip6-lookup                       active           76704032       472081099               0          7.37e1            6.15
ip6-rewrite                      active           76700849       472068424               0          4.46e1            6.15
ip6-rewrite-mcast                active                 11              11               0          6.56e2            1.00
llc-input                        active                298             298               0          1.15e3            1.00
lldp-input                       active                 20              20               0          3.71e3            1.00
session-queue                    polling          76960608       472081079               0          2.33e3            6.13
unix-epoll-input                 polling            244093               0               0          1.45e3            0.00

 

 

thanks,

-Raj

 

On Fri, Jan 24, 2020 at 9:05 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Inline.



On Jan 24, 2020, at 4:41 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

After fixing the UDP checksum offload issue and using the 64K tx buffer, I am able to send 35 Gbps (half duplex).

In the DPDK plugin code (./plugins/dpdk/device/init.c), the DEV_TX_OFFLOAD_TCP_CKSUM and DEV_TX_OFFLOAD_UDP_CKSUM offload bits were not being set for the mlx5 PMD.

In the UDP tx application I am using vppcom_session_write() to write to the session, and the write length is the same as the buffer size (64K).

 

 

FC: Okay. 



Btw, I ran all the tests with the patch https://gerrit.fd.io/r/c/vpp/+/24462 you provided.

 

If I run a single UDP tx connection then the throughput is 35 Gbps.

 

FC: Since your nics are 100Gbps, I’m curious why you’re not seeing more. What sort of vector rates do you see with “show run”; to be specific, are they higher than 40? If you increase writes to 128kB, do you see a throughput increase? I know that’s not ideal; more on that below.



But on starting other UDP rx connections (20 Gbps), the tx throughput goes down to 12 Gbps.

Even if I run 2 UDP tx connections, I am still not able to scale up the throughput; the overall throughput stays the same.

First I tried this test with 4 worker threads and then with 1 worker thread. 

 

FC: Any rx/tx errors on the interfaces? Because udp is not flow controlled you may receive/send bursts of packets that are larger than your rx/tx descriptor rings. How busy are your nodes in show run?

 

If you’re seeing a lot of rx/tx errors, you might try to increase the number of rx/tx descriptors per interface queue. 
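(For reference, per-interface descriptor ring sizes can be set in the dpdk section of startup.conf via the num-rx-desc/num-tx-desc per-device options; the PCI address and sizes below are placeholders:)

dpdk {
  dev 0000:12:00.0 {
    num-rx-desc 4096
    num-tx-desc 4096
  }
}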



 

I have the following 2 points:

1) With my UDP tx test application, I am getting this throughput after using a 64K tx buffer. But in the actual product I have to send variable-size UDP packets (max len 9000 bytes). That means the maximum tx buffer size would be 9K.

 

FC: This dgram limitation in the session layer can be optimized. That is, we could read beyond the first dgram to see how much data is really available. Previously, for this sort of scenario, we used udp in stream mode, but we don’t support that now. We might in the future, but that’s a longer story ...



and with that buffer size I am getting 15 Gbps, which is fine if I can somehow scale it up by running multiple applications. But that does not seem to work with UDP (I am not using udpc).

 

FC: With udp you might not be able to scale throughput because connections are pinned to the first worker. So you might want to try udpc.

 

Also, whenever the session layer schedules a connection for sending, it only allows that connection to burst a maximum of 42 packets. As you add more connections, the number of packets/dispatch grows to a max of 256 (vnet frame size). As the burst/dispatch grows, you’re putting more stress on the nic’s tx queue, so the chance of tx drops increases. 



 

2) My target is to achieve at least 30 Gbps rx and 30 Gbps tx UDP throughput on one NUMA node. I tried running multiple VPP instances on VFs (SR-IOV), and I can scale up the throughput (rx and tx) with the number of VPP instances.

Here are the throughput tests with VFs:

1 VPP instance (15 Gbps rx and 15 Gbps tx)

2 VPP instances (30 Gbps rx and 30 Gbps tx)

3 VPP instances (45 Gbps rx and 35 Gbps tx)

 

FC: This starts to be complicated because of what I mentioned above and, if you use udpc, because of how rss distributes those flows over workers. 

 

Still, with udpc, if “you’re lucky” you might get the rx and tx flows on different workers, so you might be able to get 30Gbps on each per numa. You could then start 2 vpp instances, each with workers pinned to a numa. 

 

Did you make sure that your apps run on the right numa with something like taskset? It matters because data is copied into and out of the fifos shared by vcl and vpp workers. Also, currently the segment manager (part of session layer) is not numa aware, so you may want to avoid running vpp with workers spread over multiple numas.
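
(For illustration, pinning the VCL application to the cores and memory of the NUMA node that owns the NIC and the vpp workers could look roughly like this; the core/node numbers are placeholders and udp_tx_app stands in for your application binary:)

taskset -c 2-9 ./udp_tx_app                       # pin to cores on the NIC's numa node
numactl --cpunodebind=0 --membind=0 ./udp_tx_app  # also keep memory allocations local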



 

I have 2 NUMA nodes on the server, so I am expecting to get 60 Gbps rx and 60 Gbps tx total throughput.

 

Btw, I also tested TCP without VFs. It seems to scale up properly, as the connections are going to different threads.

 

FC: You should get the same type of connection distribution with udpc. Note that it’s not necessarily ideal, for instance, in the last scenario the distribution is 3, 4, 2, 3. 

 

The main differences with tcp are 1) lack of flow control and 2) dgram mode (that leads to the short write throughput limitation). If you can, give udpc a try. 

 

By the way, 18Gbps for one connection seems rather low, if this is measured between 2 iperfs running through vpp’s host stack. Even with 1.5k mtu you should get much more. Also, if you’re looking at maximizing tcp throughput, you might want to check tso as well.

 

Regards,

Florin


 

vpp# sh thread

ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State

0      vpp_main                        22181   other (0)                1      0      0

1      vpp_wk_0            workers     22183   other (0)                2      2      0

2      vpp_wk_1            workers     22184   other (0)                3      3      0

3      vpp_wk_2            workers     22185   other (0)                4      4      0

4      vpp_wk_3            workers     22186   other (0)                5      8      0

 

4 worker threads

Iperf3 TCP tests - 8000-byte packets

1 Connection:

Rx only

18 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

Thread 0: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

Thread 3: no sessions

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 1

 

2 connections:

 

Rx only

32Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6679->:::0      LISTEN         0         0

Thread 0: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

Thread 3: no sessions

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:2][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 3

3 connections

Rx only

43Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6679->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:2][T] fd0d:edc4:ffff:2001::203:6679->fd0d:edc4:ESTABLISHED    0         0

[4:3][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 4

2 connections

1 -Rx

1 -Tx

Rx – 18Gbps

Tx – 12 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

Thread 0: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

Thread 2: no sessions

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:10376->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:12871->fd0d:edc4ESTABLISHED    0         3999999

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 4: active sessions 1

4 connections

2 – Rx

2-Tx

Rx – 27 Gbps

Tx – 24 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

Thread 0: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

Thread 1: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 1

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

5 connections

2 – Rx

3 -Tx

Rx – 27 Gbps

Tx – 28 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:7729->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[1:2][T] fd0d:edc4:ffff:2001::203:39216->fd0d:edc4ESTABLISHED    0         3999999

Thread 1: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:29141->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

6 connections

3 – Rx

3 – Tx

Rx – 41 Gbps

Tx – 13 Gbps

vpp# sh session verbose 1

Connection                                        State          Rx-f      Tx-f

[0:0][T] fd0d:edc4:ffff:2001::203:6669->:::0      LISTEN         0         0

[0:1][T] fd0d:edc4:ffff:2001::203:6689->:::0      LISTEN         0         0

[0:2][T] fd0d:edc4:ffff:2001::203:7729->:::0      LISTEN         0         0

Thread 0: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[1:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[1:1][T] fd0d:edc4:ffff:2001::203:7729->fd0d:edc4:ESTABLISHED    0         0

[1:2][T] fd0d:edc4:ffff:2001::203:39216->fd0d:edc4ESTABLISHED    0         3999999

Thread 1: active sessions 3

 

Connection                                        State          Rx-f      Tx-f

[2:0][T] fd0d:edc4:ffff:2001::203:51962->fd0d:edc4ESTABLISHED    0         0

[2:1][T] fd0d:edc4:ffff:2001::203:56849->fd0d:edc4ESTABLISHED    0         3999999

[2:2][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

[2:3][T] fd0d:edc4:ffff:2001::203:7729->fd0d:edc4:ESTABLISHED    0         0

Thread 2: active sessions 4

 

Connection                                        State          Rx-f      Tx-f

[3:0][T] fd0d:edc4:ffff:2001::203:29141->fd0d:edc4ESTABLISHED    0         0

[3:1][T] fd0d:edc4:ffff:2001::203:6689->fd0d:edc4:ESTABLISHED    0         0

Thread 3: active sessions 2

 

Connection                                        State          Rx-f      Tx-f

[4:0][T] fd0d:edc4:ffff:2001::203:6669->fd0d:edc4:ESTABLISHED    0         0

[4:1][T] fd0d:edc4:ffff:2001::203:57550->fd0d:edc4ESTABLISHED    0         3999999

[4:2][T] fd0d:edc4:ffff:2001::203:56939->fd0d:edc4ESTABLISHED    0         0

Thread 4: active sessions 3

 

 

thanks,

-Raj

 

On Tue, Jan 21, 2020 at 9:43 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Inline.



On Jan 21, 2020, at 3:41 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

There are no drops on the interfaces. It is a 100G card.

In the UDP tx application, I am using a 1460-byte buffer to send on select(). I am getting 5 Gbps throughput, but if I start one more application then the total throughput goes down to 4 Gbps, as both sessions are on the same thread.

I increased the tx buffer to 8192 bytes and then I can get 11 Gbps throughput, but again, if I start one more application the throughput goes down to 10 Gbps.

 

FC: I assume you’re using vppcom_session_write to write to the session. How large is “len” typically? See below for why that matters.

 

 

I found one issue in the code (you must be aware of it): the UDP send MSS is hard-coded to 1460 (in /vpp/src/vnet/udp/udp.c). So large packets are getting fragmented.

udp_send_mss (transport_connection_t * t)
{
  /* TODO figure out MTU of output interface */
  return 1460;
}

 

FC: That’s a typical mss and actually what tcp uses as well. Given the nics, they should be fine sending a decent number of mpps without the need to do jumbo ip datagrams. 



If I change the MSS to 8192 then I am getting 17 Gbps throughput. But if I start one more application, the throughput goes down to 13 Gbps.

 

It looks like the 17 Gbps is a per-core limit, and since all the sessions are pinned to the same thread, we cannot get more throughput. Here, the per-core throughput looks good to me. Please let me know if there is any way to use multiple threads for UDP tx applications.

 

In your previous email you mentioned that we can use a connected udp socket in the UDP receiver. Can we do something similar for UDP tx?

 

FC: I think it may work fine if vpp has main + 1 worker. I have a draft patch here [1] that seems to work with multiple workers but it’s not heavily tested. 

 

Out of curiosity, I ran a vcl_test_client/server test with 1 worker and with XL710s; I’m seeing this:

 

CLIENT RESULTS: Streamed 65536017791 bytes
  in 14.392678 seconds (36.427420 Gbps half-duplex)!

 

It should be noted that, because of how datagrams are handled in the session layer, throughput is sensitive to write sizes. I ran the client like:

~/vcl_client -p udpc 6.0.1.2 1234 -U -N 1000000 -T 65536

 

Or in English: a unidirectional test, a tx buffer of 64kB and 1M writes of that buffer. My vcl config was such that tx fifos were 4MB and rx fifos 2MB. The sender had a few tx packet drops (1657) and the receiver a few rx packet drops (801). If you plan to use it, make sure arp entries are resolved first (e.g., use ping), otherwise the first packet is lost.

 

Throughput drops to ~15Gbps with 8kB writes. You should probably also test with bigger writes with udp. 

 



 

From the hardware stats, it seems that UDP tx checksum offload is not enabled/active, which could impact performance. I think udp tx checksum should be enabled by default if it is not disabled using the "no-tx-checksum-offload" parameter.

 

FC: Performance might be affected by the limited number of offloads available. Here’s what I see on my XL710s:

 

rx offload active: ipv4-cksum jumbo-frame scatter

tx offload active: udp-cksum tcp-cksum multi-segs

 

 

Ethernet address b8:83:03:79:af:8c
  Mellanox ConnectX-4 Family
    carrier up full duplex mtu 9206
    flags: admin-up pmd maybe-multiseg subif rx-ip4-cksum
    rx: queues 5 (max 65535), desc 1024 (min 0 max 65535 align 1)

 

FC: Are you running with 5 vpp workers? 

 

Regards,

Florin



    tx: queues 6 (max 65535), desc 1024 (min 0 max 65535 align 1)
    pci: device 15b3:1017 subsystem 1590:0246 address 0000:12:00.00 numa 0
    max rx packet len: 65536
    promiscuous: unicast off all-multicast on
    vlan offload: strip off filter off qinq off
    rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum vlan-filter
                       jumbo-frame scatter timestamp keep-crc
    rx offload active: ipv4-cksum jumbo-frame scatter
    tx offload avail:  vlan-insert ipv4-cksum udp-cksum tcp-cksum tcp-tso
                       outer-ipv4-cksum vxlan-tnl-tso gre-tnl-tso multi-segs
                       udp-tnl-tso ip-tnl-tso
    tx offload active: multi-segs
    rss avail:         ipv4-frag ipv4-tcp ipv4-udp ipv4-other ipv4 ipv6-tcp-ex
                       ipv6-udp-ex ipv6-frag ipv6-tcp ipv6-udp ipv6-other
                       ipv6-ex ipv6
    rss active:        ipv4-frag ipv4-tcp ipv4-udp ipv4-other ipv4 ipv6-tcp-ex
                       ipv6-udp-ex ipv6-frag ipv6-tcp ipv6-udp ipv6-other
                       ipv6-ex ipv6
    tx burst function: (nil)
    rx burst function: mlx5_rx_burst

 

thanks,

-Raj

 

On Mon, Jan 20, 2020 at 7:55 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

Good to see progress. Check the tx counters on the sender and the rx counters on the receiver with “show int”, as the interfaces might be dropping traffic. One sender should be able to do more than 5 Gbps.

 

How big are the writes to the tx fifo? Make sure the tx buffer is some tens of kB. 

 

As for the issue with the number of workers, you’ll have to switch to udpc (connected udp), to ensure you have a separate connection for each ‘flow’, and to use accept in combination with epoll to accept the sessions udpc creates. 

 

Note that udpc currently does not work correctly with vcl and multiple vpp workers if vcl is the sender (not the receiver) and traffic is bidirectional. The sessions are all created on the first thread and once return traffic is received, they’re migrated to the thread selected by RSS hashing. VCL is not notified when that happens and it runs out of sync. You might not be affected by this, as you’re not receiving any return traffic, but because of that all sessions may end up stuck on the first thread. 

 

For udp transport, the listener is connection-less and bound to the main thread. As a result, all incoming packets, even if they pertain to multiple flows, are written to the listener’s buffer/fifo.  
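
(For reference, a rough sketch of the listen + epoll + accept pattern described above, using the vppcom calls that appear in the debug logs elsewhere in this thread. VPPCOM_PROTO_UDP is used as a placeholder; the proto constant for connected udp, and all error handling, depend on the release and are omitted here. udp_server_loop is a made-up name.)

#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <vppcom.h>

/* Sketch only: accept per-flow sessions raised by a connected-udp listener
 * and add each one to the same epoll set. Port and address are placeholders. */
static void
udp_server_loop (void)
{
  struct in6_addr any = IN6ADDR_ANY_INIT;
  vppcom_endpt_t ep;
  struct epoll_event ev, events[8];
  int lsh, vep, i, n;

  memset (&ep, 0, sizeof (ep));
  ep.is_ip4 = 0;                        /* IPv6, as in the tests above */
  ep.ip = (uint8_t *) &any;
  ep.port = htons (8092);

  lsh = vppcom_session_create (VPPCOM_PROTO_UDP, 1 /* non-blocking */);
  vppcom_session_bind (lsh, &ep);
  vppcom_session_listen (lsh, 10);

  vep = vppcom_epoll_create ();
  memset (&ev, 0, sizeof (ev));
  ev.events = EPOLLIN;
  ev.data.u32 = lsh;
  vppcom_epoll_ctl (vep, EPOLL_CTL_ADD, lsh, &ev);

  while ((n = vppcom_epoll_wait (vep, events, 8, 60.0 /* seconds */)) >= 0)
    for (i = 0; i < n; i++)
      {
        if (events[i].data.u32 == (uint32_t) lsh)
          {
            /* new flow: accept the child session and watch it too */
            int csh = vppcom_session_accept (lsh, &ep, 0);
            ev.events = EPOLLIN;
            ev.data.u32 = csh;
            vppcom_epoll_ctl (vep, EPOLL_CTL_ADD, csh, &ev);
          }
        else
          {
            /* existing flow: drain it with vppcom_session_read() */
          }
      }
}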

 

Regards,

Florin



On Jan 20, 2020, at 3:50 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

I changed my application as you suggested. Now I am able to achieve 5 Gbps with a single UDP stream. Overall, I can get ~20 Gbps with multiple host applications. Also, the TCP throughput has improved to ~28 Gbps after tuning as mentioned in [1].

On a similar topic: the UDP tx throughput is throttled to 5 Gbps. Even if I run multiple host applications, the overall throughput is 5 Gbps. I also tried configuring multiple worker threads, but the problem is that all the application sessions are assigned to the same worker thread. Is there any way to assign each session to a different worker thread?

 

vpp# sh session verbose 2
Thread 0: no sessions
[#1][U] fd0d:edc4:ffff:2001::203:58926->fd0d:edc4:
 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 1
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 3999999 nitems 3999999 has_event 1
          head 1460553 tail 1460552 segment manager 1
          vpp session 0 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: opened opaque: 0x0 flags:
[#1][U] fd0d:edc4:ffff:2001::203:63413->fd0d:edc4:
 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 3999999 nitems 3999999 has_event 1
          head 3965434 tail 3965433 segment manager 2
          vpp session 1 thread 1 app session 0 thread 0
          ooo pool 0 active elts newest 4294967295
 session: state: opened opaque: 0x0 flags:
Thread 1: active sessions 2
Thread 2: no sessions
Thread 3: no sessions
Thread 4: no sessions
Thread 5: no sessions
Thread 6: no sessions
Thread 7: no sessions
vpp# sh app client
Connection                              App
[#1][U] fd0d:edc4:ffff:2001::203:58926->udp6_tx_8092[shm]
[#1][U] fd0d:edc4:ffff:2001::203:63413->udp6_tx_8093[shm]
vpp#

 

 

 

thanks,

-Raj

 

On Sun, Jan 19, 2020 at 8:50 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj,

 

The function used for receiving datagrams is limited to reading at most the length of a datagram from the rx fifo. UDP datagrams are mtu sized, so your reads are probably limited to ~1.5kB. On each epoll rx event, try reading from the session handle in a while loop until you get VPPCOM_EWOULDBLOCK. That might improve performance.
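
(A minimal sketch of that drain loop, assuming the session was created non-blocking so that vppcom_session_read() returns VPPCOM_EWOULDBLOCK when the rx fifo is empty; the buffer size and the drain_rx name are arbitrary choices:)

#include <stdint.h>
#include <vppcom.h>

/* Sketch only: on an epoll rx event, keep reading until the rx fifo is empty. */
static void
drain_rx (uint32_t session_handle)
{
  char buf[65536];
  int rv;

  while ((rv = vppcom_session_read (session_handle, buf, sizeof (buf))) > 0)
    {
      /* consume rv bytes of datagram data here */
    }

  if (rv < 0 && rv != VPPCOM_EWOULDBLOCK)
    {
      /* a real error; handle it or close the session */
    }
}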

 

Having said that, udp is lossy so unless you implement your own congestion/flow control algorithms, the data you’ll receive might be full of “holes”. What are the rx/tx error counters on your interfaces (check with “sh int”). 

 

Also, with simple tuning like this [1], you should be able to achieve much more than 15Gbps with tcp. 

 

Regards,

Florin

 



On Jan 19, 2020, at 3:25 PM, Raj Kumar <raj.gautam25@...> wrote:

 

  Hi Florin,

By using the VCL library in a UDP receiver application, I am able to receive only 2 Mbps of traffic. On increasing the traffic, I see an Rx FIFO full error, and the application stops receiving traffic from the session layer. Whereas with TCP I can easily achieve 15 Gbps throughput without tuning any DPDK parameters. UDP tx also looks fine; from a host application I can send ~5 Gbps without any issue.

 

I am running VPP (stable/2001 code) on a RHEL8 server using Mellanox 100G (MLNX5) adapters.

Please advise if I can use the VCL library to receive high-throughput UDP traffic (in Gbps). I would be running multiple instances of the host application to receive data (~50-60 Gbps).

 

I also tried increasing the Rx FIFO size to 16MB, but it did not help much. The host application is just throwing away the received packets; it is not doing any packet processing.
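
(For reference, that experiment presumably amounts to bumping rx-fifo-size in vcl.conf, along the lines of the config shown further down this thread; the exact values here are illustrative:)

vcl {
  rx-fifo-size 16000000
  tx-fifo-size 4000000
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}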

 

[root@orc01 vcl_test]# VCL_DEBUG=2 ./udp6_server_vcl
VCL<20201>: configured VCL debug level (2) from VCL_DEBUG!
VCL<20201>: allocated VCL heap = 0x7f39a17ab010, size 268435456 (0x10000000)
VCL<20201>: configured rx_fifo_size 4000000 (0x3d0900)
VCL<20201>: configured tx_fifo_size 4000000 (0x3d0900)
VCL<20201>: configured app_scope_local (1)
VCL<20201>: configured app_scope_global (1)
VCL<20201>: configured api-socket-name (/tmp/vpp-api.sock)
VCL<20201>: completed parsing vppcom config!
vppcom_connect_to_vpp:480: vcl<20201:0>: app (udp6_server) is connected to VPP!
vppcom_app_create:1104: vcl<20201:0>: sending session enable
vppcom_app_create:1112: vcl<20201:0>: sending app attach
vppcom_app_create:1121: vcl<20201:0>: app_name 'udp6_server', my_client_index 256 (0x100)
vppcom_epoll_create:2439: vcl<20201:0>: Created vep_idx 0
vppcom_session_create:1179: vcl<20201:0>: created session 1
vppcom_session_bind:1317: vcl<20201:0>: session 1 handle 1: binding to local IPv6 address fd0d:edc4:ffff:2001::203 port 8092, proto UDP
vppcom_session_listen:1349: vcl<20201:0>: session 1: sending vpp listen request...
vcl_session_bound_handler:604: vcl<20201:0>: session 1 [0x1]: listen succeeded!
vppcom_epoll_ctl:2541: vcl<20201:0>: EPOLL_CTL_ADD: vep_sh 0, sh 1, events 0x1, data 0x1!
vppcom_session_create:1179: vcl<20201:0>: created session 2
vppcom_session_bind:1317: vcl<20201:0>: session 2 handle 2: binding to local IPv6 address fd0d:edc4:ffff:2001::203 port 8093, proto UDP
vppcom_session_listen:1349: vcl<20201:0>: session 2: sending vpp listen request...
vcl_session_app_add_segment_handler:765: vcl<20201:0>: mapped new segment '20190-2' size 134217728
vcl_session_bound_handler:604: vcl<20201:0>: session 2 [0x2]: listen succeeded!
vppcom_epoll_ctl:2541: vcl<20201:0>: EPOLL_CTL_ADD: vep_sh 0, sh 2, events 0x1, data 0x2!

 

 

vpp# sh session verbose 2
[#0][U] fd0d:edc4:ffff:2001::203:8092->:::0

 Rx fifo: cursize 3999125 nitems 3999999 has_event 1
          head 2554045 tail 2553170 segment manager 1
          vpp session 0 thread 0 app session 1 thread 0
          ooo pool 0 active elts newest 4294967295
 Tx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 1
          vpp session 0 thread 0 app session 1 thread 0
          ooo pool 0 active elts newest 0
[#0][U] fd0d:edc4:ffff:2001::203:8093->:::0

 Rx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 0 app session 2 thread 0
          ooo pool 0 active elts newest 0
 Tx fifo: cursize 0 nitems 3999999 has_event 0
          head 0 tail 0 segment manager 2
          vpp session 1 thread 0 app session 2 thread 0
          ooo pool 0 active elts newest 0
Thread 0: active sessions 2

 

[root@orc01 vcl_test]# cat /etc/vpp/vcl.conf
vcl {
  rx-fifo-size 4000000
  tx-fifo-size 4000000
  app-scope-local
  app-scope-global
  api-socket-name /tmp/vpp-api.sock
}
[root@orc01 vcl_test]#

 

------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:09:53:445025: dpdk-input
  HundredGigabitEthernet12/0/0 rx queue 0
  buffer 0x88078: current data 0, length 1516, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x0
                  ext-hdr-valid
                  l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 1516
    buf_len 2176, data_len 1516, ol_flags 0x180, data_off 128, phys_addr 0x75601e80
    packet_type 0x2e1 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV6_EXT_UNKNOWN (0x00e0) IPv6 packet with or without extension headers
      RTE_PTYPE_L4_UDP (0x0200) UDP packet
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445028: ethernet-input
  frame: flags 0x3, hw-if-index 2, sw-if-index 2
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
00:09:53:445029: ip6-input
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445031: ip6-lookup
  fib 0 dpo-idx 6 flow hash: 0x00000000
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 56944 -> 8092
    length 1458, checksum 0xb22d
00:09:53:445032: ip6-local
    UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
      tos 0x00, flow label 0x0, hop limit 64, payload length 1458
    UDP: 56944 -> 8092
      length 1458, checksum 0xb22d
00:09:53:445032: ip6-udp-lookup
  UDP: src-port 56944 dst-port 8092
00:09:53:445033: udp6-input
  UDP_INPUT: connection 0, disposition 5, thread 0

 

 

thanks,

-Raj

 

 

On Wed, Jan 15, 2020 at 4:09 PM Raj Kumar via Lists.Fd.Io <raj.gautam25=gmail.com@...> wrote:

Hi Florin,

Yes, patch [2] resolved the IPv6/UDP receiver issue.

Thanks! for your help.

 

thanks,

-Raj

 

On Tue, Jan 14, 2020 at 9:35 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj, 

 

First of all, with this [1], the vcl test app/client can establish a udpc connection. Note that udp will most probably lose packets, so large exchanges with those apps may not work. 

 

As for the second issue, does [2] solve it?

 

Regards, 

Florin

 

[2] https://gerrit.fd.io/r/c/vpp/+/24334



On Jan 14, 2020, at 12:59 PM, Raj Kumar <raj.gautam25@...> wrote:

 

Hi Florin,

Thanks! for the reply. 

 

I realized the issue with the non-connected case. For receiving datagrams, I was using recvfrom() with the MSG_DONTWAIT flag; because of that, the vppcom_session_recvfrom() API was failing. It expects either 0 or the MSG_PEEK flag.

  if (flags == 0)
    rv = vppcom_session_read (session_handle, buffer, buflen);
  else if (flags & MSG_PEEK)
    rv = vppcom_session_peek (session_handle, buffer, buflen);
  else
    {
      VDBG (0, "Unsupport flags for recvfrom %d", flags);
      return VPPCOM_EAFNOSUPPORT;
    }

 

I changed the flag to 0 in recvfrom(); after that UDP rx is working fine, but only for IPv4.

 

I am facing a different issue with the IPv6/UDP receiver. I am getting a "no listener for dst port" error.

 

Please let me know if I am doing something wrong. 

Here are the traces:

 

[root@orc01 testcode]# VCL_DEBUG=2 LDP_DEBUG=2 LD_PRELOAD=/opt/vpp/build-root/install-vpp-native/vpp/lib/libvcl_ldpreload.so  VCL_CONFIG=/etc/vpp/vcl.cfg ./udp6_rx
VCL<1164>: configured VCL debug level (2) from VCL_DEBUG!
VCL<1164>: allocated VCL heap = 0x7ff877439010, size 268435456 (0x10000000)
VCL<1164>: configured rx_fifo_size 4000000 (0x3d0900)
VCL<1164>: configured tx_fifo_size 4000000 (0x3d0900)
VCL<1164>: configured app_scope_local (1)
VCL<1164>: configured app_scope_global (1)
VCL<1164>: configured api-socket-name (/tmp/vpp-api.sock)
VCL<1164>: completed parsing vppcom config!
vppcom_connect_to_vpp:549: vcl<1164:0>: app (ldp-1164-app) is connected to VPP!
vppcom_app_create:1067: vcl<1164:0>: sending session enable
vppcom_app_create:1075: vcl<1164:0>: sending app attach
vppcom_app_create:1084: vcl<1164:0>: app_name 'ldp-1164-app', my_client_index 0 (0x0)
ldp_init:209: ldp<1164>: configured LDP debug level (2) from env var LDP_DEBUG!
ldp_init:282: ldp<1164>: LDP initialization: done!
ldp_constructor:2490: LDP<1164>: LDP constructor: done!
socket:974: ldp<1164>: calling vls_create: proto 1 (UDP), is_nonblocking 0
vppcom_session_create:1142: vcl<1164:0>: created session 0
bind:1086: ldp<1164>: fd 32: calling vls_bind: vlsh 0, addr 0x7fff9a93efe0, len 28
vppcom_session_bind:1280: vcl<1164:0>: session 0 handle 0: binding to local IPv6 address :: port 8092, proto UDP
vppcom_session_listen:1312: vcl<1164:0>: session 0: sending vpp listen request...
vcl_session_bound_handler:610: vcl<1164:0>: session 0 [0x1]: listen succeeded!
bind:1102: ldp<1164>: fd 32: returning 0

 

vpp# sh app server
Connection                              App                          Wrk
[0:0][CT:U] :::8092->:::0               ldp-1164-app[shm]             0
[#0][U] :::8092->:::0                   ldp-1164-app[shm]             0

 

vpp# sh err
   Count                    Node                  Reason
         7               dpdk-input               no error
      2606             ip6-udp-lookup             no listener for dst port
         8                arp-reply               ARP replies sent
         1              arp-disabled              ARP Disabled on this interface
        13                ip6-glean               neighbor solicitations sent
      2606                ip6-input               valid ip6 packets
         4          ip6-local-hop-by-hop          Unknown protocol ip6 local h-b-h packets dropped
      2606             ip6-icmp-error             destination unreachable response sent
        40             ip6-icmp-input             valid packets
         1             ip6-icmp-input             neighbor solicitations from source not on link
        12             ip6-icmp-input             neighbor solicitations for unknown targets
         1             ip6-icmp-input             neighbor advertisements sent
         1             ip6-icmp-input             neighbor advertisements received
        40             ip6-icmp-input             router advertisements sent
        40             ip6-icmp-input             router advertisements received
         1             ip4-icmp-input             echo replies sent
        89               lldp-input               lldp packets received on disabled interfaces
      1328                llc-input               unknown llc ssap/dsap
vpp#

 

vpp# show trace
------------------- Start of thread 0 vpp_main -------------------
Packet 1

00:23:39:401354: dpdk-input
  HundredGigabitEthernet12/0/0 rx queue 0
  buffer 0x8894e: current data 0, length 1516, buffer-pool 0, ref-count 1, totlen-nifb 0, trace handle 0x0
                  ext-hdr-valid
                  l4-cksum-computed l4-cksum-correct
  PKT MBUF: port 0, nb_segs 1, pkt_len 1516
    buf_len 2176, data_len 1516, ol_flags 0x180, data_off 128, phys_addr 0x75025400
    packet_type 0x2e1 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    rss 0x0 fdir.hi 0x0 fdir.lo 0x0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV6_EXT_UNKNOWN (0x00e0) IPv6 packet with or without extension headers
      RTE_PTYPE_L4_UDP (0x0200) UDP packet
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401355: ethernet-input
  frame: flags 0x3, hw-if-index 2, sw-if-index 2
  IP6: b8:83:03:79:9f:e4 -> b8:83:03:79:af:8c 802.1q vlan 2001
00:23:39:401356: ip6-input
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401357: ip6-lookup
  fib 0 dpo-idx 5 flow hash: 0x00000000
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401361: ip6-local
    UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
      tos 0x00, flow label 0x0, hop limit 64, payload length 1458
    UDP: 60593 -> 8092
      length 1458, checksum 0x0964
00:23:39:401362: ip6-udp-lookup
  UDP: src-port 60593 dst-port 8092 (no listener)
00:23:39:401362: ip6-icmp-error
  UDP: fd0d:edc4:ffff:2001::201 -> fd0d:edc4:ffff:2001::203
    tos 0x00, flow label 0x0, hop limit 64, payload length 1458
  UDP: 60593 -> 8092
    length 1458, checksum 0x0964
00:23:39:401363: error-drop
  rx:HundredGigabitEthernet12/0/0.2001
00:23:39:401364: drop
  ip6-input: valid ip6 packets

vpp#

 

 

Thanks,

-Raj

 

 

On Tue, Jan 14, 2020 at 1:44 PM Florin Coras <fcoras.lists@...> wrote:

Hi Raj,

Session layer does support connection-less transports but udp does not raise accept notifications to vcl. UDPC might, but we haven’t tested udpc with vcl in a long time so it might not work properly. 

What was the problem you were hitting in the non-connected case?

Regards,
Florin

> On Jan 14, 2020, at 7:13 AM, raj.gautam25@... wrote:

> Hi ,
> I am trying some host application tests ( using LD_PRELOAD) .  TCP rx and tx both work fine. UDP tx also works fine. 
> The issue is only with UDP rx .  In some discussion it was mentioned that session layer does not support connection-less transports so protocols like udp still need to accept connections and only afterwards read from the fifos.
> So, I changed the UDP receiver application to use listen() and accept() before read() . But , I am still having issue to make it run. 
> After I started, udp traffic from other server it seems to accept the connection but never returns from the vppcom_session_accept() function.
> VPP release is 19.08.

> vpp# sh app server
> Connection                              App                          Wrk
> [0:0][CT:U] 0.0.0.0:8090->0.0.0.0:0     ldp-36646-app[shm]            0
> [#0][U] 0.0.0.0:8090->0.0.0.0:0         ldp-36646-app[shm]            0
> vpp#
>  
>  
> [root@orc01 testcode]#  VCL_DEBUG=2 LDP_DEBUG=2 LD_PRELOAD=/opt/vpp/build-root/install-vpp-native/vpp/lib/libvcl_ldpreload.so  VCL_CONFIG=/etc/vpp/vcl.cfg ./udp_rx
> VCL<36646>: configured VCL debug level (2) from VCL_DEBUG!
> VCL<36646>: allocated VCL heap = 0x7f77e5309010, size 268435456 (0x10000000)
> VCL<36646>: configured rx_fifo_size 4000000 (0x3d0900)
> VCL<36646>: configured tx_fifo_size 4000000 (0x3d0900)
> VCL<36646>: configured app_scope_local (1)
> VCL<36646>: configured app_scope_global (1)
> VCL<36646>: configured api-socket-name (/tmp/vpp-api.sock)
> VCL<36646>: completed parsing vppcom config!
> vppcom_connect_to_vpp:549: vcl<36646:0>: app (ldp-36646-app) is connected to VPP!
> vppcom_app_create:1067: vcl<36646:0>: sending session enable
> vppcom_app_create:1075: vcl<36646:0>: sending app attach
> vppcom_app_create:1084: vcl<36646:0>: app_name 'ldp-36646-app', my_client_index 0 (0x0)
> ldp_init:209: ldp<36646>: configured LDP debug level (2) from env var LDP_DEBUG!
> ldp_init:282: ldp<36646>: LDP initialization: done!
> ldp_constructor:2490: LDP<36646>: LDP constructor: done!
> socket:974: ldp<36646>: calling vls_create: proto 1 (UDP), is_nonblocking 0
> vppcom_session_create:1142: vcl<36646:0>: created session 0
> Socket successfully created..
> bind:1086: ldp<36646>: fd 32: calling vls_bind: vlsh 0, addr 0x7fff3f3c1040, len 16
> vppcom_session_bind:1280: vcl<36646:0>: session 0 handle 0: binding to local IPv4 address 0.0.0.0 port 8090, proto UDP
> vppcom_session_listen:1312: vcl<36646:0>: session 0: sending vpp listen request...
> vcl_session_bound_handler:610: vcl<36646:0>: session 0 [0x1]: listen succeeded!
> bind:1102: ldp<36646>: fd 32: returning 0
> Socket successfully binded..
> listen:2005: ldp<36646>: fd 32: calling vls_listen: vlsh 0, n 5
> vppcom_session_listen:1308: vcl<36646:0>: session 0 [0x1]: already in listen state!
> listen:2020: ldp<36646>: fd 32: returning 0
> Server listening..
> ldp_accept4:2043: ldp<36646>: listen fd 32: calling vppcom_session_accept: listen sid 0, ep 0x0, flags 0x3f3c0fc0
> vppcom_session_accept:1478: vcl<36646:0>: discarded event: 0
> 


#dpdk #vpp #mellanox #cx5

George Tkachuk
 

Hi, 

This question is about Mellanox support. The Mellanox PMD in DPDK supports optimization flags on the EAL command line, for example: -w 8a:00.1,mprq_en=1,rxqs_min_mprq=1,mprq_log_stride_num=9. Is there a way to specify these parameters in the VPP startup file?

Best Regards,
Georgii


FDIO Maintenance - 2020-02-20 1900 UTC to 2400 UTC

Vanessa Valderrama
 

Maintenance has been moved to February 20th.


What: Standard updates and upgrade
  • Jenkins
    • OS and security updates
    • Upgrade to 2.204.1
    • Plugin updates
  • Nexus
    • OS updates
  • Jira
    • OS updates
  • Gerrit
    • OS updates
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates
When:  2020-02-25 1900 UTC to 2400 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1800 UTC. Please let us know if specific jobs cannot be aborted.
The following systems will be unavailable during the maintenance window:
  •     Jenkins sandbox
  •     Jenkins production
  •     Nexus
  •     Jira
  •     Gerrit
  •     Sonar
  •     OpenGrok


On 1/7/20 8:30 AM, Vanessa Valderrama wrote:
Please let us know as soon as possible if this maintenance conflicts with your project.

What:
  • Jenkins
    • OS and security updates
    • Upgrade to 2.204.1
    • Plugin updates
  • Nexus
    • OS updates
  • Jira
    • OS updates
  • Gerrit
    • OS updates
  • Sonar
    • OS updates
  • OpenGrok
    • OS updates
When:  2020-02-05 1900 UTC to 2400 UTC

Impact:

Maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1800 UTC. Please let us know if specific jobs cannot be aborted.
The following systems will be unavailable during the maintenance window:
  •     Jenkins sandbox
  •     Jenkins production
  •     Nexus
  •     Jira
  •     Gerrit
  •     Sonar
  •     OpenGrok


Re: Regarding buffers-per-numa parameter

Damjan Marion
 


Shouldn’t be too hard to check out the commit prior to that one and test whether the problem is still there…

— 
Damjan
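
(For concreteness, a minimal sketch of what that bisection step could look like, assuming a local vpp source tree; the build/packaging target depends on your usual workflow:)

git checkout b6e8b1a7c8bf9f9fbd05cdc3c90111d9e7a6897b^   # parent of the suspect commit
make build-release                                       # or your usual packaging target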


On 12 Feb 2020, at 14:50, chetan bhasin <chetan.bhasin017@...> wrote:

Hi,

Looking into the changes in vpp 20.01, the change below looks important, as it relates to buffer indices.

vlib: don't use vector for keeping buffer indices in the pool

Type: refactor

 

Change-Id: I72221b97d7e0bf5c93e20bbda4473ca67bfcdeb4

Signed-off-by: Damjan Marion <damarion@...> 

 

https://github.com/FDio/vpp/commit/b6e8b1a7c8bf9f9fbd05cdc3c90111d9e7a6897b#diff-2260a8080303fbcc30ef32f782b4d6df


Can anybody suggest anything?



Re: cli: show classify filter crash

jiangxiaoming@...
 

The CLI "show classify filter" does indeed crash, while "show classify filter verbose" works well.
The code git HEAD is 4a06846dd668d7f687e6770215c38e8feb5f1740.