Parallelism: Subtly different floating point results?

I'm trying to debug my parallelism library for the D programming language. A bug report was recently filed indicating that the low-order bits of some floating-point results computed using tasks are non-deterministic across runs. (If you read the report, note that parallel reduce works under the hood by creating tasks in a deterministic way.)

This doesn't appear to be a rounding mode issue, because I tried setting the rounding mode manually. I'm also pretty sure this is not a concurrency bug. The library is well-tested (including passing a Jinx stress test), the issue is always confined to the low-order bits, and it happens even on single-core machines, where low-level memory model issues are less of a problem. What are some other reasons why floating point results might differ depending on what thread the operations are scheduled on?

Edit: I'm doing some printf debugging here and it seems like the results for the individual tasks are sometimes different across runs.

Edit #2: The following code reproduces this issue in a much simpler way. It sums the elements of an array in the main thread, then starts a new thread that executes the exact same function on the same array. The problem is definitely not a bug in my library, because this code doesn't even use my library.

import std.algorithm, core.thread, std.stdio, core.stdc.fenv;

// Sums the array left to right; prints the rounding mode of the calling thread.
real sumRange(const(real)[] range) {
    writeln("Rounding mode: ", fegetround);  // 0 from both threads.
    return reduce!"a + b"(range);
}

void main() {
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;

    // Terms of a Riemann sum for 1 / (1 + x^2) on [0, 1], which approximates pi / 4.
    auto terms = new real[n];
    foreach(i, ref term; terms) {
        immutable x = ( i - 0.5 ) * delta;
        term = delta / ( 1.0 + x * x ) * 1;
    }

    // Sum in the main thread.
    immutable res1 = sumRange(terms);
    writefln("%.19f", res1);

    // Sum the same array with the same function in a new thread.
    real res2;
    auto t = new Thread( { res2 = sumRange(terms); } );
    t.start();
    t.join();
    writefln("%.19f", res2);
}

Output:

Rounding mode: 0
0.7853986633972191094
Rounding mode: 0
0.7853986633972437348

Another Edit

Here's the output when I print in hex instead:

Rounding mode: 0
0x1.921fc60b39f1331cp-1
Rounding mode: 0
0x1.921fc60b39ff1p-1

Also, this only seems to happen on Windows. When I run this code on a Linux VM, I get the same answer for both threads.

ANSWER: It turns out that the root cause is that, in D on Windows, floating point state is initialized differently on the main thread than on other threads. See the bug report I just filed.
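For anyone who wants to check this on their own machine: the second hex result above has only 13 hex digits (52 bits) after the point, while the first has 16, which is consistent with the new thread running the x87 FPU at 53-bit (double) precision instead of 64-bit (extended) precision. The following is only a minimal sketch, assuming an x86 target and a compiler that accepts DMD-style inline asm. The helper names `x87ControlWord`, `precisionControl` and `useExtendedPrecision` are made up for illustration and are not part of druntime or Phobos. It reads the x87 control word in both threads and prints the precision-control field (bits 8-9: 0 = 24-bit, 2 = 53-bit, 3 = 64-bit mantissa).

```d
import core.thread, std.stdio;

// Hypothetical helper: reads the x87 FPU control word of the calling thread.
// Assumes x86/x86-64 and DMD-style inline asm.
ushort x87ControlWord() {
    ushort cw;
    asm { fstcw cw; }
    return cw;
}

// Extracts the precision-control field (bits 8-9) from a control word:
// 0 = 24-bit, 2 = 53-bit, 3 = 64-bit mantissa.
uint precisionControl(ushort cw) {
    return (cw >> 8) & 0x3;
}

// Hypothetical helper: switches the calling thread to 64-bit mantissa precision
// by setting both precision-control bits and reloading the control word.
void useExtendedPrecision() {
    ushort cw = x87ControlWord();
    cw |= 0x0300;
    asm { fldcw cw; }
}

void main() {
    writefln("main thread precision field: %s", precisionControl(x87ControlWord()));

    auto t = new Thread({
        writefln("new thread precision field:  %s", precisionControl(x87ControlWord()));
        useExtendedPrecision();
        writefln("after useExtendedPrecision:  %s", precisionControl(x87ControlWord()));
    });
    t.start();
    t.join();
}
```

If the two threads print different precision fields, calling something like `useExtendedPrecision()` at the start of each new thread should be one way to make `sumRange` behave the same in both, though I have only reasoned about this from the hex output rather than confirmed it against every compiler and runtime version.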

asked by dsimcha, 16 April 2011 at 18:33