Thursday, March 19, 2009

.NET Integral Data Types And You

Most people are not like me. Not even the developers in the crowd. Most developers will never read the ECMA-335 Common Language Infrastructure Specification. And, even if they did, they probably wouldn’t spend weeks thinking about the implications of it.

The biggest thing I’ve seen people trip over, and one that isn’t obvious from higher-level languages, is the integral data types: SByte, Byte, Int16, UInt16, Int32, UInt32, Int64 and UInt64.

According to the CLI specification, only a handful of data types are allowed on the evaluation stack (see Partition III.1.1): an Int32, an Int64, a native-sized integer, a native-sized floating point type, an object reference, and a managed pointer. The things to really note about this list are the lack of short integers (such as Byte and Int16) and of unsigned integers. Any time a value of one of these unrepresented types is loaded onto the stack, or stored from the stack back to memory, a conversion takes place. This applies to fields, local variables, parameters, and return types. In most cases the cost of these conversions is negligible, but in some cases they can become a major performance problem.
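C# itself gives a hint of this. The small sketch below is mine, for illustration only (the class and method names are made up). It shows that adding two Int16 values yields an Int32, and that an explicit cast is needed to get the result back into an Int16. Strictly speaking this is the C# language's numeric promotion rule rather than the CLI stack model itself, but it mirrors the same idea.

```csharp
using System;

// Hypothetical helper class, not part of any library.
static class StackWidthExample
{
    // Returns the runtime type name of the expression (short + short).
    public static string SumTypeName(short a, short b)
    {
        // 'var' picks up the type of the expression, which is int, not short.
        var sum = a + b;
        return sum.GetType().FullName;
    }

    // Storing the sum back into a short requires an explicit narrowing cast.
    public static short NarrowedSum(short a, short b)
    {
        return (short)(a + b);
    }

    static void Main()
    {
        Console.WriteLine(SumTypeName(1, 2)); // System.Int32
        Console.WriteLine(NarrowedSum(1, 2)); // 3
    }
}
```

Without the cast in NarrowedSum, the C# compiler rejects the assignment, because the sum is already an Int32 by the time it needs to be stored.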

Here’s an example of a loop where the conversions pile up:

    void ForLoopInt16() {
        for (Int16 i = 0; i < COUNT; i++)
            DoSomething(i);
    }

And the CIL it compiles to:

    .method void ForLoopInt16()
    {
      .locals init ([0] int16 i)
        // C#: i = 0;
        // CLI: i = (Int16)0;
        // NOTE: I'm nearly sure both the MSCLR and Mono JIT engines
        //       can optimize this into a straight constant load without
        //       a conversion
        ldc.i4     0
        stloc      0          // Conversion!
        br.s       LOOP_TEST
      LOOP_BODY:
        // C#: DoSomething(i);
        // CLI: DoSomething((Int32)i);
        ldloc      0          // Conversion!
        call       void DoSomething(int32)
        // C#: i++;
        // CLI: i = (Int16)(((Int32)i) + 1);
        ldloc      0          // Conversion!
        ldc.i4     1
        add
        conv.i2               // Conversion!
        stloc      0          // Conversion!
      LOOP_TEST:
        // C#: i < COUNT;
        // CLI: (Int32)i < COUNT;
        ldloc      0          // Conversion!
        ldc.i4     (COUNT)
        blt.s      LOOP_BODY
      LOOP_END:
        ret
    } // end of method ForLoopInt16

In this example there are a total of six conversions: one that happens only once, and five that happen on every pass through the loop. None of them is necessary! The local ‘i’ could just as easily be typed Int32; the code would compile and run perfectly, and not once would there be a conversion.
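For comparison, here is a sketch of the Int32 version. The value of COUNT, the body of DoSomething, and the running total are stand-ins I've made up so the snippet runs on its own; the original only needs the loop itself.

```csharp
using System;

// Hypothetical wrapper class so the loop can be compiled and run by itself.
static class LoopExample
{
    // Assumed value; the post never says what COUNT is.
    const int COUNT = 1000;

    // Assumed side effect so that DoSomething has something to do.
    static long total;

    static void DoSomething(int value)
    {
        total += value;
    }

    // Same loop as ForLoopInt16, with the index typed as Int32.
    // Returns the accumulated total so the result can be checked.
    public static long ForLoopInt32()
    {
        total = 0;
        for (Int32 i = 0; i < COUNT; i++)
            DoSomething(i);
        return total;
    }

    static void Main()
    {
        Console.WriteLine(ForLoopInt32()); // 499500
    }
}
```

With an Int32 index the compiler has no reason to emit conv.i2 for i++, and every ldloc and stloc moves a value that is already in a stack-native type.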

Now, don’t get me wrong, there are cases where one of the short and/or unsigned data types is necessary.  Many of them involve interop with legacy code that uses them.  But one should never, ever, under any circumstances, use them as an index in a for loop.

Integers convert
Quite unintuitively
Please use them wisely

2 comments:

Piratfamiljen said...

Thanks, this is the one confirmation I found that .NET is optimized for int32 rather than the smaller types.
Great post!

tomba said...

I don't think your conclusions are quite correct.

You say the CLI spec says only a few data types are allowed on the stack. That's right, but it's only talking about the virtual stack and the intermediate language. The JIT can do anything it wants with the code, as long as the end result works as specified in the original CIL.

This means that the JIT'ed code _can_ store any data type on the native stack, or throw away all conversions that are unnecessary to achieve the desired result.

I haven't checked the assembly produced by MS's JIT, but if it does any decent code generation, it should handle for loops with shorts as well as it does with ints.

However, the CPU matters. There are CPUs that work more slowly with smaller data types (ARM, for example, is bad with bytes). Still, if you write a for loop with bytes in C#, nothing prevents the JIT from actually using ints for the loop, as long as it can be sure the result behaves like a byte (i.e. taking overflow into consideration).

In the end, the only way to know what is optimal for you is to profile performance on the machines you want to support.