realhet wrote:
But how can it fail even on a simple thing as this: (the result is a deadlock at ds_barrier :S)
s_mov_b64 exec,1 //restrict to first local id
s_cmpk_eq_i32 s2,0 //gid=0?
s_cbranch_scc0 @skip
v_mov_b32 v10,1 //I load 1 because there are 2 waves in total
ds_gws_init v10 offset0:1 gds
s_waitcnt lgkmcnt(0)
@skip [sleep a lot]
ds_gws_barrier v0 offset0:1 gds //v0 is only a dummy 0
Because gid 0 always initializes the barrier. (I have done this sooooo many times...)
What happens if wave 1 arrives before wave 0 and hits the barrier before it has been initialized? Deadlock!
As is, the code hangs my card, but it runs fine when I use the first arriving wave to initialize the barrier.
ret=atomic_inc(&p[0],999); //global var set to 0, first wave gets ret=0
execsave=exec;
exec=1UL;
// if(gid==0)gws_init(1,1); // hangs: wave 0 is not guaranteed to arrive first
if(ret==0)gws_init(1,1); // works: the first wave to arrive has ret==0
asm("s_sleep 7");
gws_barrier(1);
exec=execsave;
I think of this as: who syncs the synchronizer?
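
For anyone who wants to play with the idea off the GPU, here is a rough host-side model of the "first arrival initializes" pattern in Python. It is only a sketch under assumptions: threads stand in for waves, a lock-protected counter stands in for atomic_inc on p[0], and threading.Barrier stands in for the GWS barrier. The names run_waves, atomic_inc and the ready event are made up for this illustration; the ready event exists only because a CPU barrier object has to be created before anyone can wait on it, and it says nothing about how the GWS hardware behaves.

```python
import threading

def run_waves(num_waves):
    """Model num_waves 'waves' that elect the first arrival to init a barrier."""
    lock = threading.Lock()        # stands in for the hardware atomic
    arrivals = [0]                 # like p[0]: global counter that starts at 0
    ready = threading.Event()      # host-side only: barrier object now exists
    shared = {}                    # holds the barrier once it is created
    initializers = []              # ids of the waves that ran the init
    passed = []                    # ids of the waves that got past the barrier

    def atomic_inc():
        # Return the old value and increment, like atomic_inc(&p[0], ...)
        with lock:
            old = arrivals[0]
            arrivals[0] += 1
            return old

    def wave(wave_id):
        ret = atomic_inc()
        if ret == 0:
            # First arrival, whatever its id, sets up the barrier
            shared["barrier"] = threading.Barrier(num_waves)
            initializers.append(wave_id)
            ready.set()
        ready.wait()                       # nobody waits on a missing barrier
        shared["barrier"].wait(timeout=5)  # raises instead of hanging forever
        passed.append(wave_id)

    # Start in reverse id order to stress that the id does not decide who inits
    threads = [threading.Thread(target=wave, args=(i,))
               for i in reversed(range(num_waves))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return initializers, sorted(passed)

if __name__ == "__main__":
    inits, done = run_waves(2)
    print("initialized by wave", inits[0], "- waves past the barrier:", done)
```

Which wave ends up doing the init can change from run to run, and that is the point: the election depends on arrival order, not on gid.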