Channel: AMD Developer Forums: Message List - Global synchronization inside the kernel
Browsing all 17 articles

Re: Global synchronization inside the kernel

And suddenly: I've found the ds_gws_barrier instructions. Unfortunately I haven't found any documentation about them. If anyone knows, please tell me how they work. I'm gonna check it soon. What if it can...



Re: Global synchronization inside the kernel

On Windows, global sync was smooth until I did something like moving the window around while the kernels were running. I figured it was partitioning CUs between compute and rendering or something. Btw,...



Re: Global synchronization inside the kernel

gcnc.  What's that and where can we get it?


Re: Global synchronization inside the kernel

Hi drallan, thanks for the great example code! And congrats on your compiler! But how can it fail even on something as simple as this? (The result is a deadlock at ds_barrier :S) AMD disasm tells me that I do...


Re: Global synchronization inside the kernel

40 waves, because on an HD7770 that is the total number of SIMD units: 1 {ShaderEngines} * 2 {ShaderArrayElements} * 5 {CUs/ShaderArrayElement} * 4 {SIMDs/CU}. This is how they are identified with the HW_ID...
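
The multiplication above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function and parameter names are made up for illustration, and the topology numbers are the ones quoted in the post for the HD7770.

```python
def total_simds(shader_engines, arrays_per_engine, cus_per_array, simds_per_cu):
    """Multiply out the hierarchy: engines -> shader arrays -> CUs -> SIMDs."""
    return shader_engines * arrays_per_engine * cus_per_array * simds_per_cu


# HD7770 figures as quoted in the post (hypothetical variable name).
hd7770_simds = total_simds(1, 2, 5, 4)
print(hd7770_simds)  # 40
```

Launching exactly that many wavefronts puts one wave on each SIMD, provided the hardware spreads them evenly.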



Re: Global synchronization inside the kernel

realhet wrote: But how can it fail even on a simple thing as this: (the result is a deadlock at ds_barrier :S)

  s_mov_b64 exec,1                   //restrict to first local id
  s_cmpk_eq_i32 s2,0...


Re: Global synchronization inside the kernel

Finally it works, thank you! Finding the first thread was only one of the mistakes I made. There was also a stupid typo: I typed 'ossfet' instead of 'offset' in one of the macros lol, and my asm just simply...


Re: Global synchronization inside the kernel

Yep, but that would give you only 10% occupancy which could be slow.  But if it's just a test that doesn't care about performance then it doesn't matter.
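
One way to arrive at a figure like 10%, sketched in Python. Two things here are assumptions brought in for illustration, not statements from the post: that a GCN SIMD can keep up to 10 wavefronts resident, and that the case being discussed is the 40-wave launch on the 40 SIMDs of the HD7770 mentioned elsewhere in the thread. The names are hypothetical.

```python
MAX_WAVES_PER_SIMD = 10  # assumed GCN limit on resident wavefronts per SIMD


def occupancy(total_waves, num_simds, max_waves_per_simd=MAX_WAVES_PER_SIMD):
    """Fraction of the available wavefront slots that are actually filled."""
    return total_waves / (num_simds * max_waves_per_simd)


# 40 wavefronts spread over 40 SIMDs: one wave per SIMD out of ten slots.
print(f"{occupancy(40, 40):.0%}")  # 10%
```

With only one wave per SIMD there is nothing else to switch to when that wave waits on memory, which is why low occupancy can be slow.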



Re: Global synchronization inside the kernel

Please note that these small numbers of waves are for the smallest GCN chip, which has only 10 CUs, not 32. With 40 threads it is possible to utilize all 640 stream processors, but without any latency hiding...
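
A rough sketch of where the 640 comes from, reading the 40 "threads" as 40 wavefronts. The 16 lanes per SIMD and the 64-wide wavefront are assumptions about GCN added here for illustration; the constant and function names are hypothetical.

```python
LANES_PER_SIMD = 16   # assumed: each GCN SIMD is 16 lanes wide
WAVEFRONT_SIZE = 64   # assumed: a wavefront holds 64 work-items


def total_lanes(num_cus, simds_per_cu=4, lanes_per_simd=LANES_PER_SIMD):
    """Number of hardware lanes across the whole chip."""
    return num_cus * simds_per_cu * lanes_per_simd


def cycles_per_wave_instruction(wave_size=WAVEFRONT_SIZE, lanes=LANES_PER_SIMD):
    """A 64-wide wave is issued over several cycles on a narrower SIMD."""
    return wave_size // lanes


print(total_lanes(10))                # 640
print(cycles_per_wave_instruction())  # 4
```

Under these assumptions a single wavefront per SIMD is enough to keep every lane busy while it is issuing, but it leaves no second wave to cover stalls.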


Re: Global synchronization inside the kernel

With 60 threads (thanks to ds_gws_barrier) it was possible to put 6 waves into every CU, and this better tolerates the 'fat' instruction stream I'm planning to give them. Thanks for the data. Agree,...


Re: Global synchronization inside the kernel

vmiura wrote: gcnc. What's that and where can we get it?

Hi vmiura, the best answer is that it's my attempt at building a GCN hardware-specific C compiler/assembler that can run in AMD's OpenCL...


Re: Global synchronization inside the kernel

This new Jive platform finally works under IE10, but I for one cannot edit my messages, because it keeps importing my very first post of the topic, and I fear editing it, because I think it will edit...


Re: Global synchronization inside the kernel

Very inspirational post! How good it is to have arithmetic expressions and local functions with inline asm. Makes me wanna throw away macros and start making something out of my Pascal parser. Now at...


Re: Re: Global synchronization inside the kernel

Oops, I made a mistake: I forgot to use GLC while checking the synchronization with UAV. So 8 wavefronts/CU is possible with GWS, and beyond this it crashes.

w/CU      4   5   6   7   8
MAD      29...



Re: Global synchronization inside the kernel

Wow, thanks for MAC, now I'm at 960 GFlop/s with 230 kHz sync. I do convolution most of the time, so that's the proper instruction. (Gotta memorize that mad = mad+mac+madak+madmk. Even in my Mandelbrot...


Re: Global synchronization inside the kernel

Here's how a 10-CU HD7770 'instrument' sounds in real time: https://soundcloud.com/realhet/gcn-piano-moonlight-mvt3-by (performed by vs120 on prog.hu). And I don't even use the sync yet, all strings are...

