Locked Race condition between bihash deletion and searching - misuse or bug?
Hao Tian
Hi all,
I found that bihash might return zero'ed value (0xff's) if deletion and searching were performed in parallel on different threads. This is reproducible by only ~100 lines of code, from 21.06 all the way up to git master, and on multiple machines from Coffee Lake to Tiger Lake. The code (a plugin) and test command is shown below. Given how easy it is to reproduce the problem, I'm not sure whether this is a bug or something missing on my end. Any advice is welcomed. Thanks! ======== bihash_race.c ======== #include <vnet/plugin/plugin.h> #include <vpp/app/version.h> #include <vppinfra/bihash_8_8.h> static volatile int br_run; static clib_bihash_8_8_t br_bihash_8_8; static int test_run_cnt[2]; static void bihash_race_set (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_ADD); } static void bihash_race_del (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_DEL); } static void bihash_race_get (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_kv_8_8_t value; if (!clib_bihash_search_8_8 (&br_bihash_8_8, &kv, &value)) { if (value.value != 1) { clib_warning ("suspicious value: 0x%lx\n", value.value); } } } static clib_error_t * bihash_race_init (vlib_main_t *vm) { clib_bihash_init_8_8 (&br_bihash_8_8, "bihash_race hash", 4096, 4096 * 256); return 0; } VLIB_NODE_FN (br_test_node) (vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame) { u64 k = 0x01010101UL; if (!br_run || vm->thread_index == 0) { return 0; } // perform bihash set and delete for 50000 times each in thread 1 if (vm->thread_index == 1 && test_run_cnt[0] < 100000) { if (test_run_cnt[1] & 1) bihash_race_del (k); else bihash_race_set (k); test_run_cnt[0]++; return 0; } // perform bihash lookup for 100000 times in thread 2 if (vm->thread_index == 2 && test_run_cnt[1] < 100000) { bihash_race_get (k); test_run_cnt[1]++; return 0; } return 0; } void bihash_race_test_reset () { memset (test_run_cnt, 0, sizeof (test_run_cnt)); clib_warning ("========== test reset =========="); } static clib_error_t * bihash_race_test_command_fn (vlib_main_t *vm, unformat_input_t *input, vlib_cli_command_t *cmd) { u8 state = ~0; while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) { if (unformat (input, "on")) state = 1; else if (unformat (input, "off")) state = 0; else return clib_error_return (0, "invalid input"); } if (state) { br_run = 1; } else { br_run = 0; bihash_race_test_reset (); vlib_cli_output (vm, "========== test reset =========="); } return 0; } VLIB_CLI_COMMAND (bihash_race_test_command, static) = { .path = "set bihash-race-test", .short_help = "set bihash-race-test <on>|<off>", .function = bihash_race_test_command_fn, }; VLIB_REGISTER_NODE (br_test_node) = { .name = "bihash-race-test", .type = VLIB_NODE_TYPE_INPUT, .state = VLIB_NODE_STATE_POLLING, }; VLIB_INIT_FUNCTION (bihash_race_init); VLIB_PLUGIN_REGISTER () = { .version = VPP_BUILD_VER, .description = "bihash race test", }; ========= CMakeLists.txt ========= add_vpp_plugin(bihash_race SOURCES bihash_race.c ) ========= startup.conf ========= unix { nodaemon full-coredump cli-listen /run/vpp/cli.sock gid 1000 interactive } cpu { workers 3 main-core 1 } dpdk { no-pci } ========= test command ========= terminal1 # vpp -c startup.conf terminal2 # while true; do vppctl set bihash-race-test on; sleep 1; vppctl set bihash-race-test off; done VPP will spit out a lot of "suspicious value: 0xffffffffffffffff" in terminal1, while the code above never saves such value into bihash - the value comes from the !is_add branch in clib_bihash_add_del_inline_with_hash. Changing the memset value (and clib_bihash_is_free_* obviously) in this branch will lead to this new value being returned. Regards, Hao Tian |
|
Dave Barach
Quick experiment: in
toggle quoted message
Show quoted text
src/vppinfra/bihash_template.h:clib_bihash_search_inline_2_with_hash(), replace this: if (PREDICT_FALSE (b->lock)) { volatile BVT (clib_bihash_bucket) * bv = b; while (bv->lock) CLIB_PAUSE (); } With: BV(clib_bihash_lock_bucket(b)); and make sure to BV(clib_bihash_unlock_bucket(b)); just prior to return 0 and return -1 in that function. Please let me know what happens. Thanks... Dave -----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian Sent: Tuesday, March 14, 2023 4:13 AM To: vpp-dev@... Subject: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hi all, I found that bihash might return zero'ed value (0xff's) if deletion and searching were performed in parallel on different threads. This is reproducible by only ~100 lines of code, from 21.06 all the way up to git master, and on multiple machines from Coffee Lake to Tiger Lake. The code (a plugin) and test command is shown below. Given how easy it is to reproduce the problem, I'm not sure whether this is a bug or something missing on my end. Any advice is welcomed. Thanks! ======== bihash_race.c ======== #include <vnet/plugin/plugin.h> #include <vpp/app/version.h> #include <vppinfra/bihash_8_8.h> static volatile int br_run; static clib_bihash_8_8_t br_bihash_8_8; static int test_run_cnt[2]; static void bihash_race_set (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_ADD); } static void bihash_race_del (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_add_del_8_8 (&br_bihash_8_8, &kv, BIHASH_DEL); } static void bihash_race_get (u64 key) { clib_bihash_kv_8_8_t kv = { .key = key, .value = 1 }; clib_bihash_kv_8_8_t value; if (!clib_bihash_search_8_8 (&br_bihash_8_8, &kv, &value)) { if (value.value != 1) { clib_warning ("suspicious value: 0x%lx\n", value.value); } } } static clib_error_t * bihash_race_init (vlib_main_t *vm) { clib_bihash_init_8_8 (&br_bihash_8_8, "bihash_race hash", 4096, 4096 * 256); return 0; } VLIB_NODE_FN (br_test_node) (vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame) { u64 k = 0x01010101UL; if (!br_run || vm->thread_index == 0) { return 0; } // perform bihash set and delete for 50000 times each in thread 1 if (vm->thread_index == 1 && test_run_cnt[0] < 100000) { if (test_run_cnt[1] & 1) bihash_race_del (k); else bihash_race_set (k); test_run_cnt[0]++; return 0; } // perform bihash lookup for 100000 times in thread 2 if (vm->thread_index == 2 && test_run_cnt[1] < 100000) { bihash_race_get (k); test_run_cnt[1]++; return 0; } return 0; } void bihash_race_test_reset () { memset (test_run_cnt, 0, sizeof (test_run_cnt)); clib_warning ("========== test reset =========="); } static clib_error_t * bihash_race_test_command_fn (vlib_main_t *vm, unformat_input_t *input, vlib_cli_command_t *cmd) { u8 state = ~0; while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT) { if (unformat (input, "on")) state = 1; else if (unformat (input, "off")) state = 0; else return clib_error_return (0, "invalid input"); } if (state) { br_run = 1; } else { br_run = 0; bihash_race_test_reset (); vlib_cli_output (vm, "========== test reset =========="); } return 0; } VLIB_CLI_COMMAND (bihash_race_test_command, static) = { .path = "set bihash-race-test", .short_help = "set bihash-race-test <on>|<off>", .function = bihash_race_test_command_fn, }; VLIB_REGISTER_NODE (br_test_node) = { .name = "bihash-race-test", .type = VLIB_NODE_TYPE_INPUT, .state = VLIB_NODE_STATE_POLLING, }; VLIB_INIT_FUNCTION (bihash_race_init); VLIB_PLUGIN_REGISTER () = { .version = VPP_BUILD_VER, .description = "bihash race test", }; ========= CMakeLists.txt ========= add_vpp_plugin(bihash_race SOURCES bihash_race.c ) ========= startup.conf ========= unix { nodaemon full-coredump cli-listen /run/vpp/cli.sock gid 1000 interactive } cpu { workers 3 main-core 1 } dpdk { no-pci } ========= test command ========= terminal1 # vpp -c startup.conf terminal2 # while true; do vppctl set bihash-race-test on; sleep 1; vppctl set bihash-race-test off; done VPP will spit out a lot of "suspicious value: 0xffffffffffffffff" in terminal1, while the code above never saves such value into bihash - the value comes from the !is_add branch in clib_bihash_add_del_inline_with_hash. Changing the memset value (and clib_bihash_is_free_* obviously) in this branch will lead to this new value being returned. Regards, Hao Tian |
|
Hao Tian
Hi,
I've done the change you asked, and the "suspicious value" warnings are all gone. Here is the diff: diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h index c4e120e4a..b9f658db3 100644 --- a/src/vppinfra/bihash_template.h +++ b/src/vppinfra/bihash_template.h @@ -532,12 +532,7 @@ static inline int BV (clib_bihash_search_inline_2_with_hash) if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b))) return -1; - if (PREDICT_FALSE (b->lock)) - { - volatile BVT (clib_bihash_bucket) * bv = b; - while (bv->lock) - CLIB_PAUSE (); - } + BV (clib_bihash_lock_bucket) (b); v = BV (clib_bihash_get_value) (h, b->offset); @@ -557,9 +552,11 @@ static inline int BV (clib_bihash_search_inline_2_with_hash) if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key)) { *valuep = v->kvp[i]; + BV (clib_bihash_unlock_bucket) (b); return 0; } } + BV (clib_bihash_unlock_bucket) (b); return -1; } Some questions regarding the change: 1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose that this change needs to be applied there as well. Am I right? 2. This patch does eliminate the race condition, but appears to introduce a huge performance regression. The clocks in `show runtime` will double after the change (with the 100000-ops limit removed, so the test node runs indefinitely). Regards, Hao Tian |
|
Dave Barach
Doubling the number of atomic ops was expected to cause a severe performance
toggle quoted message
Show quoted text
regression. This hack demonstrates that the test code wasn't responsible. A bit of extra suspicion in clib_bihash_search_inline_2_with_hash() will likely solve the problem without the huge performance regression. D. -----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian Sent: Tuesday, March 14, 2023 9:47 PM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hi, I've done the change you asked, and the "suspicious value" warnings are all gone. Here is the diff: diff --git a/src/vppinfra/bihash_template.h b/src/vppinfra/bihash_template.h index c4e120e4a..b9f658db3 100644 --- a/src/vppinfra/bihash_template.h +++ b/src/vppinfra/bihash_template.h @@ -532,12 +532,7 @@ static inline int BV (clib_bihash_search_inline_2_with_hash) if (PREDICT_FALSE (BV (clib_bihash_bucket_is_empty) (b))) return -1; - if (PREDICT_FALSE (b->lock)) - { - volatile BVT (clib_bihash_bucket) * bv = b; - while (bv->lock) - CLIB_PAUSE (); - } + BV (clib_bihash_lock_bucket) (b); v = BV (clib_bihash_get_value) (h, b->offset); @@ -557,9 +552,11 @@ static inline int BV (clib_bihash_search_inline_2_with_hash) if (BV (clib_bihash_key_compare) (v->kvp[i].key, search_key->key)) { *valuep = v->kvp[i]; + BV (clib_bihash_unlock_bucket) (b); return 0; } } + BV (clib_bihash_unlock_bucket) (b); return -1; } Some questions regarding the change: 1. There is similar logic in clib_bihash_search_inline_with_hash. I suppose that this change needs to be applied there as well. Am I right? 2. This patch does eliminate the race condition, but appears to introduce a huge performance regression. The clocks in `show runtime` will double after the change (with the 100000-ops limit removed, so the test node runs indefinitely). Regards, Hao Tian |
|
Hao Tian
Hi,
I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock. Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach. Could you please shed some light here? Thanks! Regards, Hao Tian |
|
Andrew Yourtchenko
Hao,
toggle quoted message
Show quoted text
I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote: |
|
Dave Barach
I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.
toggle quoted message
Show quoted text
Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible. D. -----Original Message-----
From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko Sent: Wednesday, March 15, 2023 12:33 PM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hao, I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote: |
|
Hao Tian
Hi Andrew,
Thanks for the insight, the idea about treating it as optimization is quite interesting. This is fine for newly added code, but I'm a bit concerned about existing code - there are a lot of bihash callers that use the value from bihash as vec/pool/... index as long as the search call returned 0, without any check against ~0. It would be difficult to fix them all. A fix in bihash might be more preferable, considering these existing use cases. Best regards, Hao Tian ________________________________________ From: vpp-dev@... <vpp-dev@...> on behalf of Andrew Yourtchenko <ayourtch@...> Sent: Thursday, March 16, 2023 12:33 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hao, I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote: |
|
Hao Tian
Hi Dave,
Thanks for your work. I am ready to test whenever needed. Best regards, Hao Tian ________________________________________ From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...> Sent: Thursday, March 16, 2023 7:02 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression. Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible. D. -----Original Message----- From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko Sent: Wednesday, March 15, 2023 12:33 PM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hao, I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote: |
|
Dave Barach
Please see https://gerrit.fd.io/r/c/vpp/+/38507
-----Original Message----- From: vpp-dev@... <vpp-dev@...> On Behalf Of Hao Tian Sent: Wednesday, March 15, 2023 10:14 PM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?
Hi Dave,
Thanks for your work. I am ready to test whenever needed.
Best regards, Hao Tian
________________________________________ From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...> Sent: Thursday, March 16, 2023 7:02 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?
I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression.
Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible.
D.
-----Original Message----- From: vpp-dev@... <vpp-dev@...> On Behalf Of Andrew Yourtchenko Sent: Wednesday, March 15, 2023 12:33 PM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug?
Hao,
I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug.
As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”.
My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway.
--a
> On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...> wrote: > > Hi, > > I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock. > > Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach. > > Could you please shed some light here? Thanks! > > Regards, > Hao Tian >
|
|
themiron@...
Hi Dave,
What if just to use highest bit of key as free mark? While having clib_crc32c_uses_intrinsics defined, hash result will be 32bit only, highest bit of 64bit key is completely “free”. If not defined, - highest but can be just stripped off while computation in clib_bihash_*. With such approach, 64bit key update will be atomic, no special (and therefore – never about to be used) value will be necessary, can it work?
-- Best Regards, Vladislav Grishenko |
|
Hao Tian
Hi Dave,
toggle quoted message
Show quoted text
I tested the change and found some issue. Please check gerrit comments. Thanks! Best regards, Hao Tian -----Original Message-----
From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...> Sent: Friday, March 17, 2023 1:50 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Please see https://gerrit.fd.io/r/c/vpp/+/38507 -----Original Message----- From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Hao Tian Sent: Wednesday, March 15, 2023 10:14 PM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hi Dave, Thanks for your work. I am ready to test whenever needed. Best regards, Hao Tian ________________________________________ From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> on behalf of Dave Barach <vpp@...<mailto:vpp@...>> Sent: Thursday, March 16, 2023 7:02 AM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression. Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible. D. -----Original Message----- From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Andrew Yourtchenko Sent: Wednesday, March 15, 2023 12:33 PM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hao, I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...<mailto:tianhao.ms@...>> wrote: Hi, I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock. Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach. Could you please shed some light here? Thanks! Regards, Hao Tian |
|
Hao Tian
Hi,
toggle quoted message
Show quoted text
Fix confirmed. Git master branch runs the test code without any error. Thanks for everyone participating in this thread! Best regards, Hao Tian -----Original Message-----
From: vpp-dev@... <vpp-dev@...> on behalf of Hao Tian <tianhao.ms@...> Sent: Friday, March 17, 2023 11:17 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hi Dave, I tested the change and found some issue. Please check gerrit comments. Thanks! Best regards, Hao Tian -----Original Message----- From: vpp-dev@... <vpp-dev@...> on behalf of Dave Barach <vpp@...> Sent: Friday, March 17, 2023 1:50 AM To: vpp-dev@... Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Please see https://gerrit.fd.io/r/c/vpp/+/38507 -----Original Message----- From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Hao Tian Sent: Wednesday, March 15, 2023 10:14 PM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hi Dave, Thanks for your work. I am ready to test whenever needed. Best regards, Hao Tian ________________________________________ From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> on behalf of Dave Barach <vpp@...<mailto:vpp@...>> Sent: Thursday, March 16, 2023 7:02 AM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? I'm doing a bit of work to straighten out the template, hopefully without causing a measurable performance regression. Hao's test code is a bit of a corner-case: there is exactly one record in the database which the code thrashes as hard as possible. D. -----Original Message----- From: vpp-dev@...<mailto:vpp-dev@...> <vpp-dev@...<mailto:vpp-dev@...>> On Behalf Of Andrew Yourtchenko Sent: Wednesday, March 15, 2023 12:33 PM To: vpp-dev@...<mailto:vpp-dev@...> Subject: Re: [vpp-dev] Race condition between bihash deletion and searching - misuse or bug? Hao, I noticed the same behavior when stress-testing the multi thread session handling for the ACL plugin a while ago. I thought this trade off is there to avoid having to do the hard locks in bihash code, rather than it being a bug. As you say - the special value comes only if the deletion is in progress, and it is always the same. So I just treated that case in my code same as “not found”. My logic was: if an entry is just in process of being deleted, there is very little use for its old value anyway. --a On 15 Mar 2023, at 14:45, Hao Tian <tianhao.ms@...<mailto:tianhao.ms@...>> wrote: Hi, I tried but could not come up with any way that is able to ensure the kvp being valid upon return without using the full bucket lock. Maybe we can make a copy of the value before returning, validate the copy and return that copy instead. Critical section can be shrinked to cover only the copying process, which seems to perform better, but I'm not sure if this is the best approach. Could you please shed some light here? Thanks! Regards, Hao Tian |
|