Thursday, July 6, 2017

Automated marshalling of managed to unmanaged structures

An art that I wouldn't even wish upon my greatest enemies to figure out.

I recently started a new project in a language very familiar to me: C#, a managed language. This means that all the memory management is done for you, which is both a blessing and a curse. For this project, I happened to struggle with the "curse" part of automatic memory management.

The problem I encountered is as follows: when you have two structures that are identical in terms of their members, their respective sizes can (and often will) differ in managed languages compared to unmanaged languages. For example, I have the following structure in C# and C++ respectively:

// C#
[StructLayout(LayoutKind.Sequential)]
struct DebugComponent
{
    public float4 Float4;
    public float Float;
}

// C++
struct CPP_DebugComponent
{
    float4 Float4;
    float Float;
};

The size of the structure can be found in C# by using Marshal.SizeOf() (or sizeof() in unsafe code), which reports that the structure is 20 bytes in size. That is correct: 16 bytes for the float4 plus 4 bytes for the float. Note that I already set the StructLayout attribute to LayoutKind.Sequential, as this will create a layout similar to unmanaged code.

In C++, sizeof() reports that the same structure is 32 bytes. This is also correct: the float4 type here is aligned to 16 bytes, so the compiler adds another 12 bytes of padding at the end to make the size of the structure a multiple of 16.
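The padding can be reproduced in plain C++ without the CUDA headers. The following is a minimal sketch, assuming a hypothetical Float4 type that mimics the 16-byte alignment of CUDA's float4; the names Float4 and NativeDebugComponent are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's float4: four floats, aligned to 16 bytes.
struct alignas(16) Float4 {
    float x, y, z, w;
};

// Same members as CPP_DebugComponent from the post.
struct NativeDebugComponent {
    Float4 Vec;    // offset 0, 16 bytes
    float  Float;  // offset 16, 4 bytes
    // the compiler appends 12 bytes of tail padding here
};

// The struct inherits the strictest alignment of its members...
static_assert(alignof(NativeDebugComponent) == 16, "alignment follows Float4");
// ...and its size is rounded up to a multiple of that alignment.
static_assert(sizeof(NativeDebugComponent) == 32, "20 bytes of data + 12 bytes of padding");
static_assert(offsetof(NativeDebugComponent, Float) == 16, "Float follows the 16-byte vector");
```

The tail padding exists so that every element of an array of these structures starts on a 16-byte boundary.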

Unfortunately, when you use this structure with a tool such as ManagedCuda, the CUDA kernel uses the C++ version, while the C# code that calls the kernel has to use the C# version. This creates a mismatch in memory layout, which results in very weird artifacts after running the kernel, or even a crash, because in this case you're writing to unallocated memory.
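To see why the mismatch corrupts data, it helps to compare where each side believes element i of an array lives. This is only an arithmetic sketch using the two sizes from this post; the constant and function names are hypothetical.

```cpp
#include <cstddef>

// Sizes taken from the post: 20 bytes on the C# side, 32 bytes on the CUDA side.
constexpr std::size_t kManagedSize = 20;
constexpr std::size_t kNativeSize  = 32;

// Byte offset at which the host code placed element i.
constexpr std::size_t managed_offset(std::size_t i) { return i * kManagedSize; }

// Byte offset at which the kernel expects element i.
constexpr std::size_t native_offset(std::size_t i) { return i * kNativeSize; }

// Number of bytes the kernel addresses beyond a buffer that was
// allocated for n elements of the managed size.
constexpr std::size_t overrun_bytes(std::size_t n) {
    return n * kNativeSize - n * kManagedSize;
}
```

Only element 0 lines up. From element 1 onward the kernel looks at offset 32 while the host wrote at offset 20, and for a 100-element buffer the kernel addresses 1200 bytes that were never allocated.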

The "simple" solution I found is to manually expand the C# structure to 32 bytes instead of 20 by setting the Size field of the StructLayout attribute. Even after asking my question on StackOverflow, though, I found no way to create these structures automatically without counting the sizes of every individual type in the structure myself.
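The "counting" itself follows the usual C/C++ layout rule: place each member at the next offset that satisfies its alignment, then round the total up to the largest member alignment. Below is a minimal sketch of that rule, assuming the size and alignment of every member are already known; Member, align_up and padded_size are hypothetical names, and it needs C++14 or later. It is not the generator from this post, and it ignores compiler-specific packing options.

```cpp
#include <cstddef>

// Size and alignment of one struct member, in bytes.
struct Member {
    std::size_t size;
    std::size_t align;
};

// Round offset up to the next multiple of align.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Size of a struct whose members are laid out in declaration order.
template <std::size_t N>
constexpr std::size_t padded_size(const Member (&members)[N]) {
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (std::size_t i = 0; i < N; ++i) {
        // Place the member at the next suitably aligned offset.
        offset = align_up(offset, members[i].align) + members[i].size;
        if (members[i].align > max_align) max_align = members[i].align;
    }
    // Tail padding: the total size is a multiple of the strictest alignment.
    return align_up(offset, max_align);
}

// DebugComponent: a 16-byte float4 aligned to 16, then a 4-byte float aligned to 4.
constexpr Member kDebugComponent[] = { {16, 16}, {4, 4} };
static_assert(padded_size(kDebugComponent) == 32, "matches the sizeof() reported by C++");
```

The result (32 here) is the value that would go into the Size field of the StructLayout attribute.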

So I had to change my approach a little bit. I created a project which contains the raw C# structures that I want to use, along with all their functionality like loading and serialization. I then created another C# project for the automated code generation. Using this project, we can load the Structures project as a DLL and, in text templates, derive all the structures and the types they contain:
  • Structures: project that contains raw structures that will be used on the GPU
  • Tools: my general-purpose project that generates GPU versions of the structs defined in Structures.
To make this conversion as safe as possible, I don't want to check manually, every time I create a new structure, whether the GPU version has the same alignment and size. So I created two more projects:
  • AlignedStructsWrapper: A C++/CLI project that combines managed and unmanaged code
  • Tests: a unit test project
Using the CLI project, we can load both our versions of the structure: the managed C# version and the unmanaged C++ version. We can now measure the difference in their sizes:

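public ref struct WrapperGpuDebugComponent
{
public:
 // Returns 0 when the managed and the native structure have the same size.
 int SizeDiff()
 {
  // Size of the generated C# structure, as seen from C++/CLI.
  int managedSize = static_cast<int>(sizeof(Tools::Content::Generated::DebugComponent));
  // Size of the structure that the CUDA kernel uses.
  int nativeSize = static_cast<int>(sizeof(CUDA::CPP_DebugComponent));
  return managedSize - nativeSize;
 }
};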
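The same comparison can be tried out in standard C++ without a C++/CLI project. This is a hedged sketch in which hand-written mirror structs stand in for the C# side; Float4, NativeComponent, UnpaddedMirror, PaddedMirror and size_diff are hypothetical names, not part of the generated code.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's float4 (16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };

// Native side: same members as CPP_DebugComponent.
struct NativeComponent { Float4 Vec; float Float; };

// Mirror of the C# struct before the fix: 20 bytes, no tail padding.
struct UnpaddedMirror { float Vec[4]; float Float; };

// Mirror of the C# struct after extending it to 32 bytes.
struct PaddedMirror { float Vec[4]; float Float; unsigned char Pad[12]; };

// Generic version of SizeDiff(): 0 means the two sizes agree.
template <typename Managed, typename Native>
constexpr int size_diff() {
    return static_cast<int>(sizeof(Managed)) - static_cast<int>(sizeof(Native));
}
```

With these definitions, size_diff<UnpaddedMirror, NativeComponent>() yields -12 and size_diff<PaddedMirror, NativeComponent>() yields 0, which is the value the unit test expects.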

In the unit test project, we add a reference to the C++/CLI assembly and create a simple unit test that calls the SizeDiff function and checks that the difference is indeed 0:

[TestMethod]
public void CheckStructureSizes()
{
 WrapperGpuDebugComponent debugcomponent = new WrapperGpuDebugComponent();
 // Assert.AreEqual takes the expected value first, then the actual value.
 Assert.AreEqual(0, debugcomponent.SizeDiff());
}

Of course, I also generate the C++/CLI wrapper structures and the unit test functions automatically for every structure, so I only have to recompile the projects to have everything tested.