Hi drallan,
Thanks for the great example code, and congrats on your compiler!
But how can it fail on something as simple as this? (The result is a deadlock at ds_gws_barrier.)
The AMD disassembler confirms that my ds_gws encodings are correct. I restrict the whole kernel to the first lane (local id 0). The workgroup size is 64 and there are only 2 workgroups, yet it still hangs in an infinite loop :S
Is there something in the CAL note section that has to be set to enable GWS?
I've found something called IMM_GWS_BASE ("immediate UINT with GWS resource base offset") in a _E_SC_USER_DATA_CLASS structure. Is that the key? (Right now I don't touch it, because I always ask the current OpenCL runtime to generate a fresh skeleton kernel for me.)
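For reference, the launch arithmetic behind "I load 1 because there are 2 waves in total" can be written as a small Python model. It assumes, as the post does, that the GWS barrier is initialized to the number of participating waves minus one and releases once that many plus one waves have arrived; `CountingBarrier` and the helper functions are hypothetical names for a software model, not AMD's hardware or driver behavior.

```python
WAVE_SIZE = 64  # wavefront width on GCN (SI) hardware


def waves_in_launch(global_size: int, wave_size: int = WAVE_SIZE) -> int:
    """Number of wavefronts needed to cover global_size work-items."""
    return -(-global_size // wave_size)  # ceiling division


class CountingBarrier:
    """Hypothetical software model of a counting barrier, NOT the GWS hardware.

    Assumption (matching the post's reasoning): it is initialized with
    (participating waves - 1) and releases when init_value + 1 waves arrive.
    """

    def __init__(self, init_value: int) -> None:
        self.init_value = init_value
        self.arrived = 0

    def arrive(self) -> bool:
        """Register one arriving wave; True if this arrival releases everyone."""
        self.arrived += 1
        return self.arrived == self.init_value + 1


def releases(init_value: int, arriving_waves: int) -> bool:
    """Does the modeled barrier ever release if arriving_waves waves reach it?"""
    barrier = CountingBarrier(init_value)
    return any(barrier.arrive() for _ in range(arriving_waves))


waves = waves_in_launch(64 * 2)  # kernel.run(64*2) with 64 work-items per group
init_value = waves - 1           # the value the kernel loads into v10
print(waves, init_value, releases(init_value, waves))  # 2 1 True
print(releases(waves, waves))    # False: count one too high, every wave waits
```

Under this assumption the count of 1 for 2 waves is consistent, which points the question toward how the GWS resource itself is selected or enabled (the IMM_GWS_BASE question above).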
----------------------------------------------------------------------------------------
var dev:=cl.devices[1], kernel:=dev.NewKernel(asm_isa(
isa79xx
numVgprs 256 numSgprs 104
numThreadPerGroup 64 //workgroupsize=64
oclBuffers 0,0
s_mov_b64 exec,1 //restrict to first local id
s_cmpk_eq_i32 s2,0 //gid=0?
s_cbranch_scc0 @skip
v_mov_b32 v10,1 //I load 1 because there are 2 waves in total
ds_gws_init v10 offset0:1 gds
s_waitcnt lgkmcnt(0)
@skip:
__for__(i:=0 to 999, s_sleep 7) //very long dummy code
ds_gws_barrier v0 offset0:1 gds //v0 is only a dummy 0
s_endpgm
));
writeln(kernel.ISACode);
with kernel.run(64*2 {2 waves}) do begin
waitfor; writeln('elapsed: '&format('%.3f',elapsedtime_sec*1000)&' ms'); free; end;
kernel.free;
---------------------------------------------------------------------------------------
ShaderType = IL_SHADER_COMPUTE
TargetChip = t;
------------- SC_SRCSHADER Dump ------------------
SC_SHADERSTATE: u32NumIntVSConst = 0
SC_SHADERSTATE: u32NumIntPSConst = 0
SC_SHADERSTATE: u32NumIntGSConst = 0
SC_SHADERSTATE: u32NumBoolVSConst = 0
SC_SHADERSTATE: u32NumBoolPSConst = 0
SC_SHADERSTATE: u32NumBoolGSConst = 0
SC_SHADERSTATE: u32NumFloatVSConst = 0
SC_SHADERSTATE: u32NumFloatPSConst = 0
SC_SHADERSTATE: u32NumFloatGSConst = 0
fConstantsAvailable = 0
iConstantsAvailable = 0
bConstantsAvailable = 0
u32SCOptions[0] = 0x01A00000 SCOption_IGNORE_SAMPLE_L_BUG SCOption_FLOAT_DO_NOT_DIST SCOption_FLOAT_DO_NOT_REASSOC
u32SCOptions[1] = 0x00000000
u32SCOptions[2] = 0x20800001 SCOption_R800_UAV_NONARRAY_FIXUP SCOption_R1000_BYTE_SHORT_WRITE_WORKAROUND_BUG317611 SCOption_R1000_READLANE_SMRD_WORKAROUND_BUG343479
u32SCOptions[3] = 0x00000010 SCOption_R1000_BARRIER_WORKAROUND_BUG405404
; -------- Disassembly --------------------
shader main
asic(SI_ASIC)
type(CS)
s_mov_b64 exec, 1 // 00000000: BEFE0481
s_cmpk_eq_i32 s2, 0x0000 // 00000004: B1820000
s_cbranch_scc0 label_0007 // 00000008: BF840004
v_mov_b32 v10, 1 // 0000000C: 7E140281
ds_gws_init v10 offset:1 gds // 00000010: D8660001 0000000A
s_waitcnt lgkmcnt(0) // 00000018: BF8C007F
label_0007:
[tons of] s_sleep 0x0007 // 00000FA0: BF8E0007
ds_gws_barrier v0 offset:1 gds // 00000FBC: D8760001 00000000
s_endpgm // 00000FC4: BF810000
end
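The numbers in the disassembly can be cross-checked by hand, which backs up the claim that the ds_gws encodings are correct. A Python sketch, assuming the SI instruction formats from AMD's Southern Islands ISA manual (DS instructions are 64-bit with OP at bits 25:18, GDS at bit 17 and OFFSET1/OFFSET0 in the low 16 bits of the first dword; SOPP branch offsets are signed dword counts relative to the next instruction); the helper names are made up for this illustration.

```python
# Sizes (bytes) of the instructions in the disassembly above:
# SOP1/SOPK/SOPP/VOP1 words are 4 bytes, DS instructions are 8 bytes.
PROLOGUE = [4, 4, 4, 4, 8, 4]  # s_mov_b64 .. s_waitcnt
SLEEPS = 1000                  # __for__(i:=0 to 999, s_sleep 7), 4 bytes each
EPILOGUE = [8, 4]              # ds_gws_barrier, s_endpgm

label_offset = sum(PROLOGUE)                  # where label_0007 should be
barrier_offset = label_offset + 4 * SLEEPS
code_len = barrier_offset + sum(EPILOGUE)


def sopp_branch_target(pc: int, word: int) -> int:
    """SI SOPP branch: target = pc + 4 + SIMM16 * 4, SIMM16 = signed low 16 bits."""
    simm16 = word & 0xFFFF
    if simm16 & 0x8000:
        simm16 -= 0x10000
    return pc + 4 + simm16 * 4


def decode_ds_word0(word: int) -> dict:
    """Split the first dword of an SI DS-format instruction into its fields."""
    return {
        "encoding": word >> 26,       # 0x36 (0b110110) identifies the DS format
        "op": (word >> 18) & 0xFF,
        "gds": (word >> 17) & 1,
        "offset1": (word >> 8) & 0xFF,
        "offset0": word & 0xFF,
    }


print(code_len)                                   # 4040 (codeLenInByte)
print(hex(barrier_offset))                        # 0xfbc
print(hex(sopp_branch_target(0x08, 0xBF840004)))  # 0x1c = dword 7 -> label_0007
print(decode_ds_word0(0xD8660001))  # op 25 (ds_gws_init), gds 1, offset0 1
print(decode_ds_word0(0xD8760001))  # op 29 (ds_gws_barrier), gds 1, offset0 1
print(0x0000000A & 0xFF)            # 10: low byte of the 2nd dword names v10
```

Both GWS instructions carry GDS=1 and the same offset 1, so they target the same resource as intended.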
; ----------------- CS Data ------------------------
codeLenInByte = 4040; Bytes
userElementCount = 0;
extUserElementCount = 0;
NumVgprs = 256;
NumSgprs = 104;
FloatMode = 192;
IeeeMode = 0;
ScratchSize = 0;
texResourceUsage[0] = 0x00000000;
texResourceUsage[1] = 0x00000000
... all zeroes
fetch4ResourceUsage[7] = 0x00000000
texSamplerUsage = 0x00000000;
constBufUsage = 0x00000000;
COMPUTE_PGM_RSRC2 = 0x00000084
COMPUTE_PGM_RSRC2:USER_SGPR = 2
COMPUTE_PGM_RSRC2:TGID_X_EN = 1
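The two decoded fields at the end of the dump follow from the raw value 0x84, and they also confirm the kernel's assumption that `s2` holds the workgroup ID X (the `//gid=0?` test). A Python sketch, assuming the SI bit layout of COMPUTE_PGM_RSRC2 (SCRATCH_EN bit 0, USER_SGPR bits 5:1, TRAP_PRESENT bit 6, TGID_X/Y/Z_EN bits 7-9) and that system SGPRs are loaded directly after the user SGPRs with TGID_X first; the function names are illustrative only.

```python
# Low fields of COMPUTE_PGM_RSRC2 on SI: name -> (bit shift, bit width).
RSRC2_FIELDS = {
    "SCRATCH_EN": (0, 1),
    "USER_SGPR": (1, 5),
    "TRAP_PRESENT": (6, 1),
    "TGID_X_EN": (7, 1),
    "TGID_Y_EN": (8, 1),
    "TGID_Z_EN": (9, 1),
}


def decode_rsrc2(value: int) -> dict:
    """Extract each listed field from a COMPUTE_PGM_RSRC2 value."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in RSRC2_FIELDS.items()
    }


def tgid_x_sgpr(fields: dict):
    """SGPR index that receives workgroup ID X, or None if it is not enabled.

    Assumption: system SGPRs are loaded right after the user SGPRs, TGID_X first.
    """
    if not fields["TGID_X_EN"]:
        return None
    return fields["USER_SGPR"]


fields = decode_rsrc2(0x00000084)
print(fields["USER_SGPR"], fields["TGID_X_EN"])  # 2 1, as in the dump
print(f"s{tgid_x_sgpr(fields)}")                 # s2
```

With two user SGPRs in s0-s1, the workgroup ID X lands in s2, so the `s_cmpk_eq_i32 s2,0` test does select workgroup 0 for the ds_gws_init.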