From Synchronous to Asynchronous in JavaScript

Foreword

The asynchronous approach in JavaScript, and especially in Node.js, is frequently advertised as an advantage of the platform. Decomposing an algorithm into small computational units and executing them sequentially in a single process eliminates multi-threading concurrency issues and simplifies basic multitasking, which is important in many applications. But it also forces the code into a specific shape and spends some resources on passing control from one function to another.

This article walks through the process of converting synchronous code to asynchronous by example: code that reads lines from a text file. The starting point is a completely synchronous implementation, from which the asynchronous one grows through step-by-step improvements.

Synchronous Code

Reading lines from a data stream is not a trivial task. The main complexity lies in correctly concatenating sequential data chunks, as the border between them can break a Unicode character or a line separator.
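
One possible way to deal with both problems is sketched below, under stated assumptions. Node's built-in StringDecoder holds back incomplete multi-byte characters, and the unfinished tail of a line is carried over to the next chunk. The helper name createLineSplitter is hypothetical and is used only for illustration.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: accepts Buffer chunks and returns only complete lines.
function createLineSplitter() {
  const decoder = new StringDecoder('utf8');
  let tail = ''; // text after the last line separator seen so far
  return {
    push(chunk) {
      // decoder.write() keeps the bytes of an incomplete character for later.
      const parts = (tail + decoder.write(chunk)).split(/\r?\n/);
      // The last part is not terminated yet (it may even end with a lone "\r").
      tail = parts.pop();
      return parts;
    },
    end() {
      const rest = tail + decoder.end();
      tail = '';
      return rest.length > 0 ? [rest] : [];
    }
  };
}
```

Because the tail is re-split together with the next chunk, a "\r\n" separator that is cut in the middle is still recognized as a single separator.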

Let’s assume that the line reading code is implemented. Then the synchronous code will be the following:

function processLineSync(line) {
  // ...
}

var line, reader = createTextReader(file);
while (line = reader.readLineSync())
  processLineSync(line);
reader.close();

The mock implementation of createTextReader used for testing is shown below. For now it provides only synchronous reading; an asynchronous version appears later in the article:

function createTextReader() {
  var counter = 0, limit = 10000000;
  return {
    readLineSync : () => {
      counter += 1;
      return counter < limit ? counter.toString() : null;
    },
    close : () => { }
  };
}

This synchronous approach works well, but it blocks the Node.js process.

Going Asynchronous

To build an asynchronous version of this code we need to understand how it differs from the synchronous one. The approach I suggest is to add asynchronous operations over several iterations and explain the changes.

Let’s assume that processLineSync has an asynchronous version, processLine(line, err => { }). Calling this function in the loop will cause an overflow of some kind. For example, the following code quickly ends up with “out of memory”:

function processLine(line, done) {
  setImmediate(() => { console.log('ok'); done(); });
}

var line, reader = createTextReader(file);
while (line = reader.readLineSync())
  processLine(line, () => { });
reader.close();

An asynchronous callback is scheduled by adding its handler to the queue of functions that should be executed in the Node.js process. When the current function returns, the Node.js event loop takes the next task from the queue and executes it. The problem with the code above is that the loop only adds functions to the queue and does not return control until the whole file has been read, so none of the queued callbacks gets a chance to run.
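
The effect can be observed with a small sketch. The helper scheduleAndCount is a hypothetical name introduced here, not part of the article's code; it schedules callbacks in a loop and checks how many of them had run by the time the loop finished.

```javascript
// Hypothetical helper: schedules `n` callbacks with setImmediate and reports
// how many of them had run by the time the scheduling loop finished.
function scheduleAndCount(n) {
  let completed = 0;
  for (let i = 0; i < n; i += 1) {
    setImmediate(() => { completed += 1; });
  }
  return {
    completedBeforeReturn: completed, // 0: the loop never yields to the queue
    completedNow: () => completed     // can be read again later, from another callback
  };
}

const result = scheduleAndCount(1000);
```

All 1000 callbacks stay in the queue until the calling code returns, which is why a loop over ten million lines keeps accumulating them in memory.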

To pass control to the scheduled function correctly, the body of the loop should be a function too. And as the body should be executed for every line in the file, it looks like it should call itself. But that is just the definition of a recursive function. Recursion is quite a common concept, so it should be easy to try this approach:

function asyncWhile(criteria, iteration, done) {
  if (criteria()) {
    iteration(() => asyncWhile(criteria, iteration, done));
  } else {
    done();
  }
}

var line, reader = createTextReader(file);
asyncWhile(
  () => {
    line = reader.readLineSync();
    return line != null;
  },
  done => processLine(line, () => done()),
  () => reader.close());

This is a bit closer to the goal, but it looks clumsy as it relies on variables shared through the outer scope. And of course errors should be handled and the stack overflow exception should be addressed (more about that a little later). Another nice improvement would be to read the line asynchronously.

The variables can be removed by adding an execution context to the callback functions and transferring the criteria value to the iteration callback through a parameter:

function asyncWhile(context, criteria, iteration, done) {
  var value = criteria.call(context);
  if (value) {
    iteration.call(context, value, () => asyncWhile(context, criteria, iteration, done));
  } else {
    done.call(context);
  }
}

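// Regular functions are required here: arrow functions ignore the `this`
// value supplied by call().
asyncWhile(createTextReader(file),
  function () { return this.readLineSync(); },
  (line, done) => processLine(line, () => done()),
  function () { this.close(); });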

If you read this attentively, you may notice that the signature of processLine() is identical to that of the third parameter of asyncWhile. Passing the functions without wrappers is possible, but it makes the code less suitable for the purpose of this article:

var reader = createTextReader(file);
asyncWhile(reader, reader.readLineSync, processLine, reader.close);
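
Passing reader.readLineSync and reader.close unbound works only because asyncWhile invokes its callbacks through call(context). This applies to regular functions; an arrow function keeps the `this` of the place where it was defined. A minimal sketch of the difference, where callWithContext is a hypothetical stand-in for the dispatch inside asyncWhile:

```javascript
// Hypothetical mini-version of the dispatch used by asyncWhile.
function callWithContext(fn, context) {
  return fn.call(context);
}

const context = { label: 'reader' };

// A regular function receives the context as `this`.
const regularSeesContext = callWithContext(function () {
  return this === context;
}, context);

// An arrow function ignores the value passed to call() and keeps the `this`
// of the enclosing scope.
const arrowSeesContext = callWithContext(() => this === context, context);
```

Here regularSeesContext is true and arrowSeesContext is false, so callbacks that rely on the context have to be written as regular functions.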

Returning the error object in the first parameter of the callback function is the recommended way to handle errors of asynchronous functions in Node.js. Considering this, let’s read the first parameter of the iteration function callback and terminate the loop if it is truthy:

function asyncWhile(context, criteria, iteration, done) {
  var value = criteria.call(context);
  if (value) {
    iteration.call(context, value, err => {
      if (err) {
        done.call(context, err);
      } else {
        asyncWhile(context, criteria, iteration, done);
      }
    });
  } else {
    done.call(context);
  }
}
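
To see the error path in action, here is a small self-contained sketch. It uses a compact variant of the error-aware loop and a hypothetical in-memory list of items instead of a file; the third item is made to fail on purpose.

```javascript
// Compact variant of the error-aware loop (callbacks receive the context as `this`).
function asyncWhile(context, criteria, iteration, done) {
  var value = criteria.call(context);
  if (!value) return done.call(context);
  iteration.call(context, value, err => {
    if (err) return done.call(context, err);
    asyncWhile(context, criteria, iteration, done);
  });
}

// Hypothetical test data: processing fails on 'c'.
const items = ['a', 'b', 'c', 'd'];
const seen = [];
let finalError = null;

asyncWhile({ index: 0 },
  function () { return items[this.index++]; },
  function (item, next) {
    if (item === 'c') return next(new Error('cannot process c'));
    seen.push(item);
    next();
  },
  function (err) { finalError = err || null; });
```

After the call, seen contains only 'a' and 'b', and finalError holds the error raised for 'c'; the item 'd' is never requested.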

Let’s try asyncWhile on a real task, counting the number of lines in the file:

asyncWhile({ reader : createTextReader(file), counter : 0 },
  function () { return this.reader.readLineSync(); },
  function (line, done) {
    this.counter += 1;
    done();
  },
  function () {
    console.log(this.counter);
    this.reader.close();
  });

Almost instantly the program fails with the exception “Maximum call stack size exceeded”, and the stack trace shows the repeating pattern asyncWhile() -> anonymous callback -> asyncWhile() -> …. This issue is quite typical, as it frequently arises when a loop is replaced with a recursive function. Many functional languages optimize such calls away, but JavaScript in Node.js does not. So the optimization has to be coded by hand by replacing the direct recursive call with a delayed one, which starts from a fresh stack:

function asyncWhile(context, criteria, iteration, done) {
  var value = criteria.call(context);
  if (value) {
    iteration.call(context, value, err => {
      if (err) {
        done.call(context, err);
      } else {
        setImmediate(() => asyncWhile(context, criteria, iteration, done));
      }
    });
  } else {
    done.call(context, null);
  }
}
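
The difference between the direct and the delayed recursive call can be isolated in a small sketch. The names countTo and direct are hypothetical; the only thing that varies is how the next step is scheduled.

```javascript
// Hypothetical helper: counts to `limit`, using `schedule` to start each next step.
function countTo(limit, schedule, done, current = 0) {
  if (current >= limit) return done(current);
  schedule(() => countTo(limit, schedule, done, current + 1));
}

// Direct scheduling: the next step runs on top of the current stack.
const direct = fn => fn();

let overflowed = false;
try {
  countTo(1e6, direct, () => { });
} catch (e) {
  // Node.js reports stack exhaustion as a RangeError.
  overflowed = e instanceof RangeError;
}

// Delayed scheduling: every step starts from a fresh stack, so depth stays constant.
let deferredResult;
countTo(1e5, setImmediate, n => { deferredResult = n; });
```

The direct variant exhausts the stack long before reaching a million steps, while the setImmediate variant returns at once and delivers its result later, after the event loop has run all the steps.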

Finally, let’s add some async bits on top: make the call to criteria asynchronous and make createTextReader asynchronous as well. The final picture is the following:

function asyncWhile(context, criteria, iteration, done) {
  criteria.call(context, (err, value) => {
    if (err) {
      done.call(context, err);
    } else {
      if (value) {
        iteration.call(context, value, err => {
          if (err) {
            done.call(context, err);
          } else {
            setImmediate(() => asyncWhile(context, criteria, iteration, done));
          }
        });
      } else {
        done.call(context, null);
      }
    }
  });
}

As the criteria callback is now asynchronous, the synchronous version of the line reading routine can be replaced with an asynchronous one:

function createTextReader(file, done) {
  var counter = 0;
  done(null, {
    readLine : done => {
      counter += 1;
      done(null, counter < 10000000 ? counter.toString() : null);
    },
    close : () => { }
  });
}

Remember that the current fake implementation of createTextReader is only for testing. If you would like to look at a production version, you can check my Node.js module github.com/AlexAtNet/async-read-lines.

And the final use case:

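createTextReader(file, (err, reader) => {
  if (err) return console.log(err);
  // Regular functions let asyncWhile supply the reader as `this`.
  asyncWhile(reader,
    function (done) { this.readLine((err, line) => done(err, line)); },
    (line, done) => processLine(line, (err) => {
      if (err) console.log(err);
      done();
    }),
    function () { this.close(); });
});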

Conclusion

The asynchronous code is much more complex than the synchronous one. It is also much slower (about 40 times). On the other hand, it does not block the current process and allows several tasks to run in parallel (in Node.js terms). The relative performance impact depends on how readLine is implemented and what should be done on every iteration.
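
To get a feeling for the overhead on your own machine, a rough measurement can be sketched as follows. The function names are hypothetical, the loop bodies are intentionally empty, and the numbers will vary with the Node.js version and hardware, so treat the result as an order-of-magnitude hint only.

```javascript
// Hypothetical helper: time of a plain synchronous loop, in milliseconds.
function measureSyncLoop(iterations) {
  const start = process.hrtime.bigint();
  let i = 0;
  while (i < iterations) i += 1;
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Hypothetical helper: time of the same number of steps chained through
// setImmediate, reported to `done` in milliseconds.
function measureDeferredLoop(iterations, done) {
  const start = process.hrtime.bigint();
  let i = 0;
  (function step() {
    if (i >= iterations) {
      return done(Number(process.hrtime.bigint() - start) / 1e6);
    }
    i += 1;
    setImmediate(step);
  })();
}
```

A possible usage is to call measureSyncLoop(1e6), then measureDeferredLoop(1e6, ms => console.log(ms)), and compare the two values.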

If you would like to comment, suggest something, or ask a question, feel free to contact me directly by email.