Koichi Tamura's blog: July 2014

Sunday, July 27, 2014

Amazing

from Wikipedia:

Andromeda Software Development (or simply ASD) is a Greek demogroup that was formed in 1992. They produced a number of small intros and demos in the mid-1990s for the PC, most notably CounterFactual (winner of the first Greek demo party ever, The Gardening 1995) and Beyond (Placed 4th in The Gardening 1996). ASD was quiet for the following years until 2001, when they presented Cadence & Cascade - their first accelerated demo - and won the Digital Nexus demoparty, held in Athens, Greece.

Wednesday, July 16, 2014

Testing unified memory

I just tested CUDA6's unified memory. This code worked as I expected.

#include <iostream>
#include <string.h>

#include <cuda_runtime.h>

__global__ void set(char* buf, unsigned int num)
{
 int i = blockDim.x * blockIdx.x + threadIdx.x;
 if (i < num)
 {
  buf[i] = 1;
 }
}

int main(void)
{
 unsigned int num = 16384;
 char *buf;
 cudaMallocManaged(&buf, num);
 memset(buf, 0, num);                         //##1## D->H(16384bytes). page fault.
 set<<<4, 128>>>(buf, num);                   //##2## H->D(16384bytes).
 cudaDeviceSynchronize();
 std::cout << (int)buf[0] << std::endl;       //##3## D->H(16384bytes). page fault.
 std::cout << (int)buf[10000] << std::endl;   //No data transfer.
 set<<<4, 128>>>(buf, num);
 cudaDeviceSynchronize();
 buf[0] = 5;                                  //##4## D->H(16384bytes). page fault.
 set<<<4, 128>>>(buf, num);                   //##5## H->D(4096bytes).
 cudaFree(buf);
 cudaDeviceReset();
   return 0;
}

I used a profiler and added comments where data transfers have occurred. Whenever an un-transferred host memory is accessed for either read or write, Data transfer D->H occurs, not only before read but before write. In my understanding it is because unless it copies the memory D->H on write, once you wrote a value to a memory and then read a memory near the address where you have modified the value, data transfer D->H on read would overwrite your modification. It produces a bit of unnecessary data transfer (##1##).
You can also see when ##5##, only 4096 bytes of data transfer occurs, not full 16384 bytes. It is because it keep tracks of the page you have written (I have asked it to a NVIDIA guy).

I modified the main function a little bit and created another buffer buf2.

int main(void)
{
 unsigned int num = 16384;

 //Below code adds D->H(16384bytes), H->D(16384bytes). 1 page fault.
 char *buf2;
 cudaMallocManaged(&buf2, num);
 memset(buf2, 0, num);

 //Same as above.
 char *buf;
 cudaMallocManaged(&buf, num);
 memset(buf, 0, num);                         //D->H(16384bytes). page fault.
 set<<<4, 128>>>(buf, num);                   //H->D(16384bytes).
 cudaDeviceSynchronize();
 std::cout << (int)buf[0] << std::endl;       //D->H(16384bytes). page fault.
 std::cout << (int)buf[10000] << std::endl;   //No data migration.
 set<<<4, 128>>>(buf, num);
 cudaDeviceSynchronize();
 buf[0] = 5;                                  //D->H(16384bytes). page fault.
 set<<<4, 128>>>(buf, num);                   //H->D(4096bytes).
 cudaFree(buf);
 cudaFree(buf2);
 cudaDeviceReset();
   return 0;
}

In this code I allocated a unified memory to buf2 and called memset() but buf2 is not used elsewhere. It adds D->H(16384bytes) when memset() is called and it is natural, but it also adds H->D(16384bytes). We know the kernel doesn't use buf2 but it is too complicated for CUDA to know that. So before the first kernel call it also transfers buf2, which is another unnecessary data transfer. I have concluded unified memory makes the program simpler but not as efficient as I have expected. I heard Pascal has cleverer memory management. I will wait and see if is good enough.

Several posts

SPH second test
SPH based particle fluid sim test

Python C/API tutorial
Dive into Python C/API

Making a patch for boost.python
My little contribution to the boost project

Shake Command Window
command window on Shake, like Maya script editor

ShakeMayaRender
Shake/Maya Integration

Demo
ShakeMayaRender demo

Python CommandPort
A Maya plug-in to make commandPort like MEL command for Python

texPPattrMapperNode
Maya plug-in to map mesh color/normal to PP attribute

Maya python from C++
How to access to Maya python objects from a plug-in written in C++

Koichi Tamura's blog

Sunday, July 27, 2014

Amazing

Wednesday, July 16, 2014

Testing unified memory

Free

Koichi Tamura

Followers

Several posts

Labels

Blog Archive