Why is Office.js async?

For those who have used VBA before, you will know that VBA code was always executed in a linear (synchronous) fashion. This is very natural for an automation task, where you’re essentially manipulating a document through a series of steps (and where, more often than not, the steps are similar to the sequential series of steps that a human would do). For example, if you needed to analyze the current selection in Excel and highlight any values that were greater than 50, you might write something like this:

VBA macro for highlighting values over 50
 1 Dim selectionRange As Range
 2 Set selectionRange = Selection
 3 Call selectionRange.ClearFormats
 4 
 5 Dim row As Integer
 6 Dim column As Integer
 7 Dim cell As Range
 8 
 9 For row = 1 To selectionRange.Rows.Count
10     For column = 1 To selectionRange.Columns.Count
11         Set cell = selectionRange.Cells(row, column)
12         If cell.Value > 50 Then
13             cell.Interior.Color = RGB(255, 255, 0)
14         End If
15     Next column
16 Next row
Screenshot of the code in action

When run, such macro would execute line by line, reading cell values and manipulating them as it went. The macro have complete access to the in-memory representation of the workbook, and would run almost at the native Excel level, blocking out any other native Excel operations (or, for that matter, any user operations, since the VBA code executes on the UI thread).

With the rise of .NET, VSTO – Visual Studio Tools for Office – was introduced. VSTO still used the same underlying APIs that VBA accessed, and it still ran those APIs in a synchronous line-by-line fashion. However, in order to isolate the VSTO add-in from Excel – so that a faulty add-in would not crash Excel, and so that add-ins could be resilient against each other in a multi-add-in environment – VSTO had each code solution run within its own fully-isolated AppDomain. So while .NET code itself ran very fast – faster than VBA by pure numbers – the Object-Model calls into Excel suddenly incurred a significant cost of cross-domain marshaling (resulting in a ~3x slowdown compared to VBA, in my experience).

Thus, the VBA code above – translated into its VB.NET or C# equivalent – would continue to work, but it would run far less efficiently, since each subsequent read and write call would have to traverse the process boundary. To ameliorate that, the VSTO incarnation of the code could be made significantly faster if all of the read operations were lumped into a single read call at the very beginning. (The write operations, of setting the background of each cell one-by-one, would still have to remain as individually-dispatched calls).

Here is C# Version of the code that addresses the above scenario, but this time reads all of the cell values in bulk (see line #4 below).

VSTO incarnation of the code, with bulk-reading of the selection values (line #4)
 1 Range selectionRange = Globals.ThisAddIn.Application.Selection;
 2 selectionRange.ClearFormats();
 3 
 4 object[,] selectionValues = selectionRange.Value;
 5 
 6 int rowCount = selectionValues.GetLength(0);
 7 int columnCount = selectionValues.GetLength(1);
 8  
 9 for (int row = 1; row <= rowCount; row++)
10 {
11     for (int column = 1; column <= columnCount; column++)
12     {
13         if (Convert.ToInt32(selectionValues[row, column]) > 50)
14         {
15             Range cell = selectionRange.Cells[row, column];
16             cell.Interior.Color = System.Drawing.Color.Yellow;
17         }
18     }
19 }

Note that the VSTO code, when interacting with the documents, still runs on (a.k.a., “blocks”) the Excel UI thread, since – due to the way that Excel (and Word, and others) are architected – all operations that manipulate the document run on the UI thread. But fortunately for VSTO, the process-boundary cost – while an order of magnitude higher than that of VBA – is still relatively small in the grand scheme of things, and the batch-reading technique described above can alleviate a chunk of the performance issues… and so, VSTO could continue to use the same APIs in the same synchronous fashion as VBA, while letting developers use the new goodness of the .NET Framework.

With the introduction of Office Add-ins (known at the time as “Apps for Office”) in Office 2013, the interaction between the add-in and the host application had to be re-thought. Office Add-ins, based in HTML and JavaScript and CSS, needed to run inside of a browser container. On Desktop, the browser container is an embedded Internet Explorer process, and perhaps the synchronous model could still have worked there, even if it took another performance hit. But the real death knell to synchronous Office.js programming came from the need to support the Office Online applications, such as Word Online, Excel Online, etc. In those applications, the Office Add-ins runs inside of an HTML iframe, which in turn is hosted by the parent HTML page that represents the document editor. While the iframe and its parent are at least part of the same browser window, the problem is that the bulk of the documents might not – and generally is not – loaded in the browser’s memory. This means that for most operations, the request would have to travel from the iframe to the parent HTML page, all the way to an Office 365 web server running in a remote data center. The request would then get executed on the server, which would dutifully send the response back to the waiting HTML page, which would pass it on to the iframe, which would finally invoke the Add-in code. Not surprisingly, such round-trip cost is not cheap.

To put it into perspective: imagine that the entire roundtrip described above takes 50 milliseconds, and we are running a hypothetical synchronous and JavaScript-icized version of the VSTO macro. Imagine that we have one hundred cells, of which 50 meet the criteria for needing to be highlighted. This would mean that we need to make one request to clear the formatting from the selection, another to fetch all of the data, and then 50 requests for each time that we set on individual cell’s color. This means that the operation would take (2 + 50) * 50 milliseconds, or just over 2.5 seconds. Perhaps that doesn’t sound all that terrible… but then again, we were operating on a mere 100 cells! For 1000 cells, we’d be looking at 25 seconds, and for 10,000 cells we would be at over four minutes. What would the user be doing – other than sitting back in this chair and sipping coffee – while waiting for the Add-in operation to complete?!

If synchronous programming was out, the only remaining choice was asynchrony. In the Office 2013 model, this was embodied by a whole bunch of methods that ended with the word “Async”, such as:

    Office.context.document.setSelectedDataAsync(data, callback);

In this Office 2013 design, every operation was a standalone call that was dispatched to the Office host application. The browser would then wait to be notified that the operation completed (sometimes merely waiting for notification, other times waiting for data to be returned back), before calling the callback function.

Thus, while the Office 2013 APIs solved the Async problem, the solution was very much a request-based Async solution, akin to server web requests, but not the sort of automation scenarios that VBA users were accustomed to. Moreover, the API design itself was limiting, as there were almost no backing objects to represent the richness of the Office document. The omission was no accident: a rich object model implies objects that have countless properties and methods, but making each of them an async call would have been not only cumbersome to use, but also highly inefficient. The user would still be waiting their 2.5 seconds or 25 seconds, or 4+ minutes for the operation to complete, albeit without having their browser window frozen.

The new Office 2016 API model offers a radical departure from the Office 2013 design. The object model – now under the Excel namespace for Excel, Word for Word, OneNote for OneNote, etc., – is backed by strongly-typed object-oriented classes, with similar methods and properties to what you’d see in VBA. Interaction with the properties or methods is also simple and sequential, similar in spirit to what you’d do in VBA or VSTO code.

Whoa! How is this possible? The catch is that, underneath the covers, setting properties or methods adds them to a queue of pending changes, but doesn’t dispatch them until an explicit .sync() request. That is, the .sync() call is the only asynchrony in the whole system. When this sync() method is called, any queued-up changes are dispatched to the document, and any data that was requested to be loaded is received and injected into the objects that requested it. Take a look at this incarnation of the cell-highlighting scenario, this time written using the new Office.js paradigm, in JavaScript:

Office.js incarnation of the VBA macro, with blocks of still-seemingly-synchronous code
 1 Excel.run(function (context) {
 2     var selectionRange = context.workbook.getSelectedRange();
 3     selectionRange.format.fill.clear();
 4 
 5     selectionRange.load("values");
 6 
 7     return context.sync()
 8         .then(function () {
 9             var rowCount = selectionRange.values.length;
10             var columnCount = selectionRange.values[0].length;
11             for (var row = 0; row < rowCount; row++) {
12                 for (var column = 0; column < columnCount; column ++) {
13                     if (selectionRange.values[row][column] > 50) {
14                         selectionRange.getCell(row, column)
15                             .format.fill.color = "yellow";
16                     }
17                 }
18             }
19         })
20         .then(context.sync);
21 
22 }).catch(OfficeHelpers.Utilities.log);

As you can see, the code is pretty straightforward to read. Sure, there are the unfamiliar concepts of load and sync, but if you squint over the load and sync statements and the Excel.run wrapper (i.e., only look at lines #2-3, and then #9-18), you still have seemingly-synchronous code with a familiar-looking object model.

If instead of plain JavaScript you use TypeScript (see “A word on JavaScript and TypeScript), the Office.js code becomes even cleaner.

The same Office.js rendition, but this time making use of TypeScript 2.1’s async/await feature
 1 Excel.run(async function (context) {
 2     var selectionRange = context.workbook.getSelectedRange();
 3     selectionRange.format.fill.clear();
 4 
 5     selectionRange.load("values");
 6     await context.sync();
 7 
 8     var rowCount = selectionRange.values.length;
 9     var columnCount = selectionRange.values[0].length;
10     for (var row = 0; row < rowCount; row++) {
11         for (var column = 0; column < columnCount; column ++) {
12             if (selectionRange.values[row][column] > 50) {
13                 selectionRange.getCell(row, column)
14                     .format.fill.color = "yellow";
15             }
16         }
17     }
18 
19     await context.sync();
20 
21 }).catch(OfficeHelpers.Utilities.log);

In fact, if you ignore the Excel.run wrapper code (the first and last lines), and if you squint over the load and sync statements (lines #5-6 and #18), the code looks reasonably similar to what you’d expect to write in VBA or VSTO!

Still, programming using the new Office.js model – and, for VBA or VSTO users, adapting to some of the differences – has a definite learning curve. This site will guide you through getting started, and understanding the API basics.  The book will then pick up from there, diving deeper into Office Add-in concepts and API patterns.