Parse double precision IEEE floating-point on a C compiler with no double precision type

I am working with an 8-bit AVR chip. There is no data type for a 64-bit double (double just maps to the 32-bit float). However, I will be receiving 64-bit doubles over Serial and need to output 64-bit doubles over Serial.

How can I convert the 64-bit double to a 32-bit float and back again without casting? The format for both the 32-bit and 64-bit will follow IEEE 754. Of course, I assume a loss of precision when converting to the 32-bit float.

For converting from 64-bit to 32-bit float, I am trying this out:

// Script originally from http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1281990303
float convert(uint8_t *in) {
  union {
    float real;
    uint8_t base[4];
  } u;
  uint16_t expd = ((in[7] & 127) << 4) + ((in[6] & 240) >> 4);
  uint16_t expf = expd ? (expd - 1024) + 128 : 0;
  u.base[3] = (in[7] & 128) + (expf >> 1);
  u.base[2] = ((expf & 1) << 7) + ((in[6] & 15) << 3) + ((in[5] & 0xe0) >> 5);
  u.base[1] = ((in[5] & 0x1f) << 3) + ((in[4] & 0xe0) >> 5);
  u.base[0] = ((in[4] & 0x1f) << 3) + ((in[3] & 0xe0) >> 5);
  return u.real;
}

For numbers like 1.0 and 2.0, the above works, but when I tested with passing in a 1.1 as a 64-bit double, the output was off by a bit (literally, not a pun!), though this could be an issue with my testing. See:

// Comparison of bits for a float in Java and the bits for a float in C after
// converted from a 64-bit double. Last bit is different.
// Java code can be found at https://gist.github.com/912636
JAVA FLOAT:        00111111 10001100 11001100 11001101
C CONVERTED FLOAT: 00111111 10001100 11001100 11001100
23
задан Ben Voigt 10 April 2011 в 22:04
поделиться