Locked Race condition between bihash deletion and searching - misuse or bug?


Hao Tian
 

Hi all,

I found that bihash might return zero'ed value (0xff's) if deletion and searching were performed in parallel on different threads. This is reproducible by only ~100 lines of code, from 21.06 all the way up to git master, and on multiple machines from Coffee Lake to Tiger Lake.

The code (a plugin) and test command is shown below. Given how easy it is to reproduce the problem, I'm not sure whether this is a bug or something missing on my end. Any advice is welcomed. Thanks!

======== bihash_race.c ========

#include <vnet/plugin/plugin.h>
#include <vpp/app/version.h>
#include <vppinfra/bihash_8_8.h>

static volatile int br_run;
static clib_bihash_8_8_t br_bihash_8_8;
static int test_run_cnt[2];

static void
bihash_race_set (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_ADD);
}

static void
bihash_race_del (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_DEL);
}

static void
bihash_race_get (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };
  clib_bihash_kv_8_8_t value;

  if (!clib_bihash_search_8_8 (&br_bihash_8_8, &kv, &value))
    {
      if (value.value != 1)
        {
          clib_warning ("suspicious value: 0x%lx\n", value.value);
        }
    }
}

static clib_error_t *
bihash_race_init (vlib_main_t *vm)
{
  clib_bihash_init_8_8 (&br_bihash_8_8, "bihash_race hash", 4096, 4096 * 256);

  return 0;
}

VLIB_NODE_FN (br_test_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  u64 k = 0x01010101UL;

  if (!br_run || vm->thread_index == 0)
    {
      return 0;
    }

  // perform bihash set and delete for 50000 times each in thread 1
  if (vm->thread_index == 1 && test_run_cnt[0] < 100000)
    {
      if (test_run_cnt[1] & 1)
        bihash_race_del (k);
      else
        bihash_race_set (k);
      test_run_cnt[0]++;
      return 0;
    }

  // perform bihash lookup for 100000 times in thread 2
  if (vm->thread_index == 2 && test_run_cnt[1] < 100000)
    {
      bihash_race_get (k);
      test_run_cnt[1]++;
      return 0;
    }
  return 0;
}

void
bihash_race_test_reset ()
{
  memset (test_run_cnt, 0, sizeof (test_run_cnt));
  clib_warning ("========== test reset ==========");
}

static clib_error_t *
bihash_race_test_command_fn (vlib_main_t *vm, unformat_input_t *input,
                             vlib_cli_command_t *cmd)
{
  u8 state = ~0;

  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
    {
      if (unformat (input, "on"))
        state = 1;
      else if (unformat (input, "off"))
        state = 0;
      else
        return clib_error_return (0, "invalid input");
    }

  if (state)
    {
      br_run = 1;
    }
  else
    {
      br_run = 0;
      bihash_race_test_reset ();
      vlib_cli_output (vm, "========== test reset ==========");
    }

  return 0;
}

VLIB_CLI_COMMAND (bihash_race_test_command, static) = {
  .path = "set bihash-race-test",
  .short_help = "set bihash-race-test <on>|<off>",
  .function = bihash_race_test_command_fn,
};

VLIB_REGISTER_NODE (br_test_node) = {
  .name = "bihash-race-test",
  .type = VLIB_NODE_TYPE_INPUT,
  .state = VLIB_NODE_STATE_POLLING,
};

VLIB_INIT_FUNCTION (bihash_race_init);

VLIB_PLUGIN_REGISTER () = {
  .version = VPP_BUILD_VER,
  .description = "bihash race test",
};

========= CMakeLists.txt =========

add_vpp_plugin(bihash_race
  SOURCES
  bihash_race.c
)

========= startup.conf =========

unix {
  nodaemon
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid 1000
  interactive
}

cpu {
  workers 3
  main-core 1
}

dpdk {
  no-pci
}

========= test command =========

terminal1 # vpp -c startup.conf
terminal2 # while true; do vppctl set bihash-race-test on; sleep 1; vppctl set bihash-race-test off; done

VPP will spit out a lot of "suspicious value: 0xffffffffffffffff" in terminal1, while the code above never saves such value into bihash - the value comes from the !is_add branch in clib_bihash_add_del_inline_with_hash. Changing the memset value (and clib_bihash_is_free_* obviously) in this branch will lead to this new value being returned.

Regards,
Hao Tian


Dave Barach
 

Quick experiment: in
src/vppinfra/bihash_template.h:clib_bihash_search_inline_2_with_hash(),
replace this:

if (PREDICT_FALSE (b->lock))
{
volatile BVT (clib_bihash_bucket) * bv = b;
while (bv->lock)
CLIB_PAUSE ();
}

With:

BV(clib_bihash_lock_bucket(b));

and make sure to BV(clib_bihash_unlock_bucket(b)); just prior to return 0
and return -1 in that function.

Please let me know what happens.

Thanks... Dave

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian
Sent: Tuesday, March 14, 2023 4:13 AM
To: vpp-dev@...
Subject: [vpp-dev] Race condition between bihash deletion and searching -
misuse or bug?

Hi all,

I found that bihash might return zero'ed value (0xff's) if deletion and
searching were performed in parallel on different threads. This is
reproducible by only ~100 lines of code, from 21.06 all the way up to git
master, and on multiple machines from Coffee Lake to Tiger Lake.

The code (a plugin) and test command is shown below. Given how easy it is to
reproduce the problem, I'm not sure whether this is a bug or something
missing on my end. Any advice is welcomed. Thanks!

======== bihash_race.c ========

#include <vnet/plugin/plugin.h>
#include <vpp/app/version.h>
#include <vppinfra/bihash_8_8.h>

static volatile int br_run;
static clib_bihash_8_8_t br_bihash_8_8;
static int test_run_cnt[2];

static void
bihash_race_set (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_ADD); }

static void
bihash_race_del (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };

  clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_DEL); }

static void
bihash_race_get (u64 key)
{
  clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 };
  clib_bihash_kv_8_8_t value;

  if (!clib_bihash_search_8_8 (&br_bihash_8_8, &kv, &value))
    {
      if (value.value != 1)
        {
          clib_warning ("suspicious value: 0x%lx\n", value.value);
        }
    }
}

static clib_error_t *
bihash_race_init (vlib_main_t *vm)
{
  clib_bihash_init_8_8 (&br_bihash_8_8, "bihash_race hash", 4096, 4096 *
256);

  return 0;
}

VLIB_NODE_FN (br_test_node)
(vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame) {
  u64 k = 0x01010101UL;

  if (!br_run || vm->thread_index == 0)
    {
      return 0;
    }

  // perform bihash set and delete for 50000 times each in thread 1
  if (vm->thread_index == 1 && test_run_cnt[0] < 100000)
    {
      if (test_run_cnt[1] & 1)
        bihash_race_del (k);
      else
        bihash_race_set (k);
      test_run_cnt[0]++;
      return 0;
    }

  // perform bihash lookup for 100000 times in thread 2
  if (vm->thread_index == 2 && test_run_cnt[1] < 100000)
    {
      bihash_race_get (k);
      test_run_cnt[1]++;
      return 0;
    }
  return 0;
}

void
bihash_race_test_reset ()
{
  memset (test_run_cnt, 0, sizeof (test_run_cnt));
  clib_warning ("========== test reset =========="); }

static clib_error_t *
bihash_race_test_command_fn (vlib_main_t *vm, unformat_input_t *input,
                             vlib_cli_command_t *cmd) {
  u8 state = ~0;

  while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
    {
      if (unformat (input, "on"))
        state = 1;
      else if (unformat (input, "off"))
        state = 0;
      else
        return clib_error_return (0, "invalid input");
    }

  if (state)
    {
      br_run = 1;
    }
  else
    {
      br_run = 0;
      bihash_race_test_reset ();
      vlib_cli_output (vm, "========== test reset ==========");
    }

  return 0;
}

VLIB_CLI_COMMAND (bihash_race_test_command, static) = {
  .path = "set bihash-race-test",
  .short_help = "set bihash-race-test <on>|<off>",
  .function = bihash_race_test_command_fn, };

VLIB_REGISTER_NODE (br_test_node) = {
  .name = "bihash-race-test",
  .type = VLIB_NODE_TYPE_INPUT,
  .state = VLIB_NODE_STATE_POLLING,
};

VLIB_INIT_FUNCTION (bihash_race_init);

VLIB_PLUGIN_REGISTER () = {
  .version = VPP_BUILD_VER,
  .description = "bihash race test",
};

========= CMakeLists.txt =========

add_vpp_plugin(bihash_race
  SOURCES
  bihash_race.c
)

========= startup.conf =========

unix {
  nodaemon
  full-coredump
  cli-listen /run/vpp/cli.sock
  gid 1000
  interactive
}

cpu {
  workers 3
  main-core 1
}

dpdk {
  no-pci
}

========= test command =========

terminal1 # vpp -c startup.conf
terminal2 # while true; do vppctl set bihash-race-test on; sleep 1; vppctl
set bihash-race-test off; done

VPP will spit out a lot of "suspicious value: 0xffffffffffffffff" in
terminal1, while the code above never saves such value into bihash - the
value comes from the !is_add branch in clib_bihash_add_del_inline_with_hash.
Changing the memset value (and clib_bihash_is_free_* obviously) in this
branch will lead to this new value being returned.

Regards,
Hao Tian


Hao Tian
 

Hi,

I've done the change you asked, and the "suspicious value" warnings are all gone. Here is the diff:

diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h
index c4e120e4a..b9f658db3 100644
--- a/src/vppinfra/bihash_template.h
+++ b/src/vppinfra/bihash_template.h
@@ -532,12 +532,7 @@ static inline int BV (clib_bihash_search_inline_2_with_hash)
if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
return -1;

- if (PREDICT_FALSE (b->lock))
- {
- volatile BVT (clib_bihash_bucket) * bv = b;
- while (bv->lock)
- CLIB_PAUSE ();
- }
+ BV (clib_bihash_lock_bucket) (b);

v = BV (clib_bihash_get_value) (h, b->offset);

@@ -557,9 +552,11 @@ static inline int BV (clib_bihash_search_inline_2_with_hash)
if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key))
{
*valuep = v->kvp[i];
+ BV (clib_bihash_unlock_bucket) (b);
return 0;
}
}
+ BV (clib_bihash_unlock_bucket) (b);
return -1;
}


Some questions regarding the change:

1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose that this change needs to be applied there as well. Am I right?

2. This patch does eliminate the race condition, but appears to introduce a huge performance regression. The clocks in `show runtime` will double after the change (with the 100000-ops limit removed, so the test node runs indefinitely).

Regards,
Hao Tian


Dave Barach
 

Doubling the number of atomic ops was expected to cause a severe performance
regression. This hack demonstrates that the test code wasn't responsible. A
bit of extra suspicion in clib_bihash_search_inline_2_with_hash() will
likely solve the problem without the huge performance regression.

D.

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian
Sent: Tuesday, March 14, 2023 9:47 PM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching
- misuse or bug?

Hi,

I've done the change you asked, and the "suspicious value" warnings are all
gone. Here is the diff:

diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h
index c4e120e4a..b9f658db3 100644
--- a/src/vppinfra/bihash_template.h
+++ b/src/vppinfra/bihash_template.h
@@ -532,12 +532,7 @@ static inline int BV
(clib_bihash_search_inline_2_with_hash)
if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b)))
return -1;

- if (PREDICT_FALSE (b->lock))
- {
- volatile BVT (clib_bihash_bucket) * bv = b;
- while (bv->lock)
- CLIB_PAUSE ();
- }
+ BV (clib_bihash_lock_bucket) (b);

v = BV (clib_bihash_get_value) (h, b->offset);

@@ -557,9 +552,11 @@ static inline int BV
(clib_bihash_search_inline_2_with_hash)
if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key))
{
*valuep = v->kvp[i];
+ BV (clib_bihash_unlock_bucket) (b);
return 0;
}
}
+ BV (clib_bihash_unlock_bucket) (b);
return -1;
}


Some questions regarding the change:

1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose
that this change needs to be applied there as well. Am I right?

2. This patch does eliminate the race condition, but appears to introduce a
huge performance regression. The clocks in `show runtime` will double after
the change (with the 100000-ops limit removed, so the test node runs
indefinitely).

Regards,
Hao Tian


Hao Tian
 

Hi,

I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian


Andrew Yourtchenko
 

Hao,

I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.

My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.

--a

On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote:

Hi,

I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian


Dave Barach
 

I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.

Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.

D.

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko
Sent: Wednesday, March 15, 2023 12:33 PM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.

My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.

--a

On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote:

Hi,

I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian


Hao Tian
 

Hi Andrew,

Thanks for the insight, the idea about treating it as optimization is quite interesting.

This is fine for newly added code, but I'm a bit concerned about existing code - there are a lot of bihash callers that use the value from bihash as vec/pool/... index as long as the search call returned 0, without any check against ~0. It would be difficult to fix them all. A fix in bihash might be more preferable, considering these existing use cases.

Best regards,
Hao Tian
________________________________________
From: vpp-dev@... <vpp-dev@...> on behalf of Andrew Yourtchenko <ayourtch@...>
Sent: Thursday, March 16, 2023 12:33 AM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.

My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.

--a

On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote:

Hi,

I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian


Hao Tian
 

Hi Dave,

Thanks for your work. I am ready to test whenever needed.

Best regards,
Hao Tian

________________________________________
From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...>
Sent: Thursday, March 16, 2023 7:02 AM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.

Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.

D.

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko
Sent: Wednesday, March 15, 2023 12:33 PM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Hao,

I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.

As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.

My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.

--a

On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote:

Hi,

I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

Could you please shed some light here? Thanks!

Regards,
Hao Tian


Dave Barach
 

-----Original Message-----

From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@...

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

 

Hi Dave,

 

Thanks for your work. I am ready to test whenever needed.

 

Best regards,

Hao Tian

 

________________________________________

From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@...

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

 

I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.

 

Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.

 

D.

 

-----Original Message-----

From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@...

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

 

Hao,

 

I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.

 

As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.

 

My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.

 

--a

 

> On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote:

> Hi,

> I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.

> Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.

> Could you please shed some light here? Thanks!

> Regards,

> Hao Tian

 

 


themiron@...
 

Hi Dave,

 

What if just to use highest bit of key as free mark?

While having clib_crc32c_uses_intrinsics defined, hash result will be 32bit only, highest bit of 64bit key is completely “free”.

If not defined, - highest but can be just stripped off while computation in clib_bihash_*.

With such approach, 64bit key update will be atomic, no special (and therefore – never about to be used) value will be necessary, can it work?

 

--

Best Regards, Vladislav Grishenko


Hao Tian
 

Hi Dave,

I tested the change and found some issue. Please check gerrit comments. Thanks!

Best regards,
Hao Tian

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...>
Sent: Friday, March 17, 2023 1:50 AM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Please see https://gerrit.fd.io/r/c/vpp/+/38507



-----Original Message-----

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



Hi Dave,



Thanks for your work. I am ready to test whenever needed.



Best regards,

Hao Tian



________________________________________

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> on behalf of Dave Barach <vpp@...<mailto:vpp@...>>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.



Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.



D.



-----Original Message-----

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Andrew Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



Hao,



I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.



As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.



My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.



--a



On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...<mailto:tianhao.ms@...>> wrote:
Hi,
I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.
Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.
Could you please shed some light here? Thanks!
Regards,
Hao Tian


Hao Tian
 

Hi,

Fix confirmed. Git master branch runs the test code without any error. Thanks for everyone participating in this thread!

Best regards,
Hao Tian

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> on behalf of Hao Tian <tianhao.ms@...>
Sent: Friday, March 17, 2023 11:17 AM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Hi Dave,

I tested the change and found some issue. Please check gerrit comments. Thanks!

Best regards,
Hao Tian

-----Original Message-----
From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...>
Sent: Friday, March 17, 2023 1:50 AM
To: vpp-dev@...
Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?

Please see https://gerrit.fd.io/r/c/vpp/+/38507



-----Original Message-----

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Hao Tian

Sent: Wednesday, March 15, 2023 10:14 PM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



Hi Dave,



Thanks for your work. I am ready to test whenever needed.



Best regards,

Hao Tian



________________________________________

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> on behalf of Dave Barach <vpp@...<mailto:vpp@...>>

Sent: Thursday, March 16, 2023 7:02 AM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.



Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.



D.



-----Original Message-----

From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Andrew Yourtchenko

Sent: Wednesday, March 15, 2023 12:33 PM

To: vpp-dev@...<mailto:vpp-dev@...>

Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?



Hao,



I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.



As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.



My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.



--a



On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...<mailto:tianhao.ms@...>> wrote:
Hi,
I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock.
Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach.
Could you please shed some light here? Thanks!
Regards,
Hao Tian