Re: Global synchronization inside the kernel
And suddenly: I've found the ds_gws_barrier instructions. Unfortunately I haven't found any documentation about them. If anyone knows, please tell me how they work. I'm going to check it soon. What if it can...
Re: Global synchronization inside the kernel
On Windows, global sync was smooth until I did something like move the window around while the kernels are running. I figured it was partitioning CUs between compute and rendering or something. Btw,...
Re: Global synchronization inside the kernel
gcnc. What's that and where can we get it?
Re: Global synchronization inside the kernel
Hi drallan, Thx for the great example code! And congrats on your compiler! But how can it fail even on a simple thing as this: (the result is a deadlock at ds_barrier :S) AMD disasm tells me that I do...
Re: Global synchronization inside the kernel
40 waves, because on an HD7770 that is the total number of SIMD units. (1{ShaderEngines}*2{ShaderArrayElements}*5{CUs/ShaderArrayElements}*4{SIMDs/CU}, this is how I identify them with the HW_ID...
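The wave count quoted above is simply the product of the topology factors listed in the post. A minimal sketch of that arithmetic, assuming the HD7770 figures given there (the function and parameter names are made up for illustration, and nothing is read from the hardware):

```python
def total_simds(shader_engines, arrays_per_engine, cus_per_array, simds_per_cu):
    """Multiply out the topology factors to get the number of SIMD units."""
    return shader_engines * arrays_per_engine * cus_per_array * simds_per_cu

# HD7770 figures as quoted in the post (assumed here, not read from HW_ID).
hd7770_simds = total_simds(shader_engines=1, arrays_per_engine=2,
                           cus_per_array=5, simds_per_cu=4)
print(hd7770_simds)  # 40, i.e. one wave per SIMD gives 40 waves
```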
Re: Global synchronization inside the kernel
realhet wrote: But how can it fail even on a simple thing as this: (the result is a deadlock at ds_barrier :S)
s_mov_b64 exec,1 //restrict to first local id
s_cmpk_eq_i32 s2,0...
Re: Global synchronization inside the kernel
Finally it works, thank you! Finding the first thread was only one of the mistakes I made. There was also a stupid typo: I typed 'ossfet' instead of 'offset' in one of the macros lol, and my asm just simply...
Re: Global synchronization inside the kernel
Yep, but that would give you only 10% occupancy which could be slow. But if it's just a test that doesn't care about performance then it doesn't matter.
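The 10% figure presumably refers to running one wave per SIMD: a GCN SIMD can keep up to 10 wavefronts resident, so a single resident wave fills a tenth of the slots. A rough sketch of that calculation, assuming the 40 SIMDs of the HD7770 discussed in this thread (the helper name is hypothetical):

```python
MAX_WAVES_PER_SIMD = 10  # GCN wavefront slots per SIMD

def occupancy(resident_waves, simd_count, max_waves_per_simd=MAX_WAVES_PER_SIMD):
    """Fraction of wavefront slots in use across the whole chip."""
    return resident_waves / (simd_count * max_waves_per_simd)

# One wave per SIMD on a 40-SIMD chip (assumed HD7770 figures).
print(f"{occupancy(40, 40):.0%}")
```

The same helper can be used to compare other launch sizes against the chip's total slot count.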
Re: Global synchronization inside the kernel
Please note that these small numbers of waves are for the smallest GCN chip, which has only 10 CUs, not 32. With 40 threads it is possible to utilize all the 640 streams but without any latency hiding...
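To see where the 640 comes from and why 40 waves leave no room for latency hiding: each GCN SIMD is 16 lanes wide and a wavefront is 64 work-items, so one wave keeps its SIMD busy for 4 cycles per instruction, but with a single wave per SIMD there is no second wave to switch to on a stall. A small sketch under those assumptions (the variable names are illustrative only):

```python
CUS = 10              # HD7770, per the post
SIMDS_PER_CU = 4
LANES_PER_SIMD = 16   # GCN SIMD width
WAVE_SIZE = 64        # work-items per wavefront
WAVES_LAUNCHED = 40   # the "40 threads" from the post

lanes = CUS * SIMDS_PER_CU * LANES_PER_SIMD                    # stream processors
cycles_per_wave_instruction = WAVE_SIZE // LANES_PER_SIMD       # cycles to issue one wave
spare_waves_per_simd = WAVES_LAUNCHED // (CUS * SIMDS_PER_CU) - 1  # waves left to hide latency

print(lanes, cycles_per_wave_instruction, spare_waves_per_simd)
```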
Re: Global synchronization inside the kernel
With 60 threads (thanks to ds_gws_barrier) it was possible to put 6 waves into every CU, and this better tolerates the 'fat' instruction stream I'm planning to give them. Thanks for the data. Agree,...
Re: Global synchronization inside the kernel
vmiura wrote: gcnc. What's that and where can we get it? Hi vmiura, The best answer is that it's my attempt at building a GCN hardware-specific C compiler/assembler that can run in AMD's OpenCL...
Re: Global synchronization inside the kernel
This new Jive platform finally works under IE10, but I for one cannot edit my messages, because it keeps importing my very first post of the topic, and I fear editing it, because I think it will edit...
Re: Global synchronization inside the kernel
Very inspirational post! How good it is to have arithmetic expressions and local functions with inline asm. Makes me wanna throw away macros and start to make something out of my Pascal parser. Now at...
Re: Re: Global synchronization inside the kernel
Oops, I made a mistake: I forgot to use GLC while checking the synchronization with the UAV. So 8 wavefronts/CU is possible with GWS, and beyond that it crashes.
w/CU 4 5 6 7 8
MAD 29...
Re: Global synchronization inside the kernel
Wow, thanks for MAC, now I'm at 960 GFLOPS with 230 kHz synch. I do convolution most of the time, so that's the proper instruction. (Gotta memorize that mad = mad+mac+madak+madmk. Even in my Mandelbrot...
Re: Global synchronization inside the kernel
Here's how a 10-CU HD7770 'instrument' sounds in realtime: https://soundcloud.com/realhet/gcn-piano-moonlight-mvt3-by (performed by vs120 on prog.hu). And I don't even use the synch yet, all strings are...