Channel: AMD Developer Forums: Message List - Global synchronization inside the kernel

Re: Re: Global synchronization inside the kernel


Oops, I made a mistake: I forgot to use GLC when checking the synchronization through the UAV.

So 8 wavefronts/CU is possible with GWS; anything beyond that crashes.

 

Exec time (ms) by wavefronts per CU (w/CU):

w/CU    4   5   6   7   8
MAD    29  37  38  39  39
ADD    21  34  34  34  34

 

When I raised it from 6 to 8, the MAD exec time increased by only 1 ms (from 38 to 39), so some sleeping units were woken up.
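To make that scaling easier to see, here is a quick sketch that turns the table into time per wavefront and a relative throughput gain. It assumes every wavefront does the same amount of work, which the post does not state explicitly; the helper names `ms_per_wave` and `speedup` are made up for illustration.

```python
# Exec times (ms) from the table in the post, keyed by wavefronts per CU.
mad_ms = {4: 29, 5: 37, 6: 38, 7: 39, 8: 39}
add_ms = {4: 21, 5: 34, 6: 34, 7: 34, 8: 34}


def ms_per_wave(times):
    # Assumption: each wavefront does the same amount of work,
    # so time / waves is a rough cost per wavefront.
    return {w: t / w for w, t in times.items()}


def speedup(times, w_from, w_to):
    # Relative throughput gain going from w_from to w_to waves per CU,
    # where throughput is taken as waves / exec time.
    return (w_to / times[w_to]) / (w_from / times[w_from])


for name, times in (("MAD", mad_ms), ("ADD", add_ms)):
    per_wave = ms_per_wave(times)
    print(name, {w: round(v, 2) for w, v in per_wave.items()})
    print(name, "6 -> 8 throughput gain: %.2fx" % speedup(times, 6, 8))
```

Under that assumption, going from 6 to 8 waves gives MAD about 1.30x more work per unit time, since the exec time barely moves while the wave count grows by a third.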

Now the throughput I can get out of it is 838 GFLOPS (up from 700; peak is 1126).
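For a sense of how close that is to the limit, a small sketch of the arithmetic follows. It only uses the three figures quoted above and assumes they are all in the same unit; `fraction_of_peak` is a hypothetical helper name.

```python
# Figures quoted in the post (all assumed to be in the same unit).
measured_before = 700.0
measured_now = 838.0
peak = 1126.0


def fraction_of_peak(measured, peak_value):
    # Share of the theoretical peak that was actually reached.
    return measured / peak_value


print("before: %.1f%% of peak" % (100 * fraction_of_peak(measured_before, peak)))
print("now:    %.1f%% of peak" % (100 * fraction_of_peak(measured_now, peak)))
print("gain:   %.2fx" % (measured_now / measured_before))
```

So the change moves the kernel from roughly 62% to roughly 74% of peak, which leaves about a quarter of the theoretical throughput unused.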

(And this leads to a problem in the piano: faster processing means shorter string lengths handed to each wavefront, and this is starting to reach the lengths of the bass strings. It will be a miracle if the whole thing fits into the HD7770... But if it fits, it sits.)

 

There is still room for MAD to be faster, but I think that can only happen when the CU holds all 10 waves.

I haven't tested workgroup sizes bigger than 64. That test would be interesting, though.

