Re: VPP crashes on CSIT Taishan server

Damjan Marion
 


Makes sense to me. Please submit a patch to Gerrit…

— 
Damjan

On 14 Feb 2020, at 04:18, Lijian Zhang <Lijian.Zhang@...> wrote:

Hi,
VPP crashes on the CSIT Taishan server because vlib_get_thread_core_numa () does not derive the correct NUMA node from cpu_id on that machine.
vlib_get_thread_core_numa () uses physical_package_id as the NUMA node.
 
However, the Taishan server has 2 physical sockets but 4 NUMA nodes, as the lscpu output below shows. sysfs also shows that the physical_package_id values are not sequential on this server (for example 36 and 3002):
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu37/topology/physical_package_id
3002
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
36
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu15/topology/physical_package_id
36
 
How about using the sysfs entries below to get the NUMA node from cpu_id? The code change is attached at the end.
 
$ cat /sys/devices/system/node/online
0-3
$ cat /sys/devices/system/node/node0/cpulist
0-15
$ cat /sys/devices/system/node/node1/cpulist
16-31
$ cat /sys/devices/system/node/node2/cpulist
32-47
$ cat /sys/devices/system/node/node3/cpulist
48-63
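
As a rough illustration of the lookup these files allow, here is a minimal standalone C sketch that does not use VPP's own helpers. The function names cpulist_contains and numa_node_of_cpu and the max_nodes parameter are hypothetical, chosen only for this example. It also makes a simplifying assumption: it probes node0 .. node(max_nodes - 1) directly instead of parsing /sys/devices/system/node/online as the attached patch does.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return 1 if cpu_id appears in a sysfs-style CPU list such as "0-15"
   or "0-3,8,12-15" (a trailing newline is tolerated), else 0.
   Malformed input yields 0. Hypothetical helper, not a VPP function. */
static int
cpulist_contains (const char *list, unsigned cpu_id)
{
  const char *s = list;

  while (*s != '\0' && *s != '\n')
    {
      char *end;
      unsigned long lo, hi;

      lo = strtoul (s, &end, 10);
      if (end == s)
        return 0;               /* no digits where a number was expected */
      hi = lo;
      s = end;
      if (*s == '-')
        {
          /* range "lo-hi" */
          s++;
          hi = strtoul (s, &end, 10);
          if (end == s)
            return 0;
          s = end;
        }
      if (cpu_id >= lo && cpu_id <= hi)
        return 1;
      if (*s == ',')
        s++;                    /* next list element */
      else if (*s != '\0' && *s != '\n')
        return 0;               /* unexpected character */
    }
  return 0;
}

/* Scan <node_root>/node<N>/cpulist for N in [0, max_nodes) and return
   the first node whose list contains cpu_id, or -1 if none does.
   node_root would normally be "/sys/devices/system/node". */
static int
numa_node_of_cpu (const char *node_root, unsigned cpu_id, int max_nodes)
{
  char path[256], buf[4096];
  int node;

  for (node = 0; node < max_nodes; node++)
    {
      FILE *f;
      int found = 0;

      snprintf (path, sizeof (path), "%s/node%d/cpulist", node_root, node);
      f = fopen (path, "r");
      if (f == NULL)
        continue;               /* node not present; keep looking */
      if (fgets (buf, sizeof (buf), f) != NULL)
        found = cpulist_contains (buf, cpu_id);
      fclose (f);
      if (found)
        return node;
    }
  return -1;
}
```

With the cpulist contents shown above, numa_node_of_cpu ("/sys/devices/system/node", 37, 4) would be expected to select node2, since 37 falls in 32-47.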
 
taishan-d05-08:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
 
 
diff --git a/src/vlib/threads.c b/src/vlib/threads.c
index 1ce4dc156..3f0905421 100644
--- a/src/vlib/threads.c
+++ b/src/vlib/threads.c
@@ -598,15 +598,30 @@ void
vlib_get_thread_core_numa (vlib_worker_thread_t * w, unsigned cpu_id)
{
   const char *sys_cpu_path = "/sys/devices/system/cpu/cpu";
+  const char *sys_node_path = "/sys/devices/system/node/node";
+  clib_bitmap_t *nbmp = 0, *cbmp = 0;
+  u32 node;
   u8 *p = 0;
   int core_id = -1, numa_id = -1;
 
   p = format (p, "%s%u/topology/core_id%c", sys_cpu_path, cpu_id, 0);
   clib_sysfs_read ((char *) p, "%d", &core_id);
   vec_reset_length (p);
-  p = format (p, "%s%u/topology/physical_package_id%c", sys_cpu_path,
-             cpu_id, 0);
-  clib_sysfs_read ((char *) p, "%d", &numa_id);
+
+  /* *INDENT-OFF* */
+  clib_sysfs_read ("/sys/devices/system/node/online", "%U",
+        unformat_bitmap_list, &nbmp);
+  clib_bitmap_foreach (node, nbmp, ({
+    p = format (p, "%s%u/cpulist%c", sys_node_path, node, 0);
+    clib_sysfs_read ((char *) p, "%U", unformat_bitmap_list, &cbmp);
+    if (clib_bitmap_get (cbmp, cpu_id))
+      numa_id = node;
+    vec_reset_length (cbmp);
+    vec_reset_length (p);
+  }));
+  /* *INDENT-ON* */
+  vec_free (nbmp);
+  vec_free (cbmp);
   vec_free (p);
 
   w->core_id = core_id;
 
Thanks.
