Search This Blog

Friday, March 19, 2010

Prefix Sum on CUDA

Is not as trivial as it seems. I just finished to implement one and I can tell what you need
read the scan Gpu Gems 3 Article in particular chapter **39.3.1 Stream Compaction**.

To implement your own start from the LargeArrayScan example in the SDK, that will give you just the prescan. Assuming you have the selection array in device memory (an array of 1 and 0 meaning 1- select 0 - discard), dev_selection_array a dev_elements_array elements to be selected a dev_prescan_array and a dev_result_array all of size N then you do


prescan(dev_prescan_array,dev_selection_array, N);
scatter(dev_result_array, dev_prescan_array,
dev_selection_array, dev_elements_array, N);

where the scatter is


__global__ void scatter_kernel( T*dev_result_array,
const T* dev_prescan_array,
const T* dev_selection_array,
const T* dev_elements_array, std::size_t size){

unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx >= size) return;
if (dev_selection_array[idx] == 1){
dev_result_array[dev_prescan_array[idx]] = dev_elements_array[idx];
}
}

for other nice application of the prescan see the paper Ble93

Otherwise you can use the CUDPP library where such primitives exists and are highly optimized even more that the sdk one.

Tuesday, March 16, 2010

New Blog Visual scheme

Reading the latimes today I found this new cool feature for customizing blogger appearance.
This is a fist update version, when I have more time I will play more with. Well done google!

Saturday, March 13, 2010

C++ Tricky

Nothing better than a bit of C++ for breakfast:
Reading A StackOverflow post I hacked a bit with an Object called SaveRestore.
The trick is to know that when the object gets out of scope { }, is destroyed.
You need to maintain a reference to the object and its value.

Thursday, March 11, 2010

Many Core and Reconfigurable SuperComputing Conference

I will speak at the Many Core and Reconfigurable SuperComputing Conference in Rome the 22-24 March 2010
with a talk entitled
A GPU Based Architecture for Distributing Dictionary Attacks to OpenPGP Secret Keyrings
To see all the other exciting talks: http://www.mrsc2010.eu/Programme