Max Polun

How to be concurrent

There are lots of ways that different languages do concurrency, and I want to talk about the general approaches without getting bogged down in language details.

So what is concurrency? It’s not parallelism, that’s for sure. At its simplest, it’s the ability to do work in the background without pausing work in the foreground. Some forms of concurrency can use parallel hardware resources (CPU cores, etc.), but not all.

I’m going to classify low-level concurrency features (as opposed to high-level patterns that can use multiple features at once) along the following axes:

  1. Shared / separate memory

If two concurrent tasks share memory, then sending data between them is trivial, but it’s possible to corrupt data that isn’t protected somehow. The protection can be locking of some sort, transactions, or just explicit switching between tasks (see the sketch after this list).

  2. Allows parallelism / no parallelism

Concurrency is not parallelism, but if you have parallel hardware (multiple cores, etc.) it can often make sense to do parallel computation with the same abstraction you use for concurrency. However, the downside is the need for additional synchronization, which can wash out any advantage you get from parallelism.

  3. Implicit / explicit task switching

If your tasks switch implicitly, you have to protect any data that can be shared between different tasks. Explicit task switching removes that need, but adds boilerplate and can cause global slowdowns if a single task does not yield.
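
To make the task switching axis concrete, here’s a minimal sketch in javascript, where switches happen only at explicit await points (the counter and delay helper are purely illustrative):

let counter = 0

function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}

function safeIncrement() {
  // no await between the read and the write, so no other task can
  // interleave: explicit task switching means this needs no lock
  counter = counter + 1
}

async function unsafeIncrement() {
  const current = counter
  await delay(10) // explicit switch point: other tasks may run here
  counter = current + 1 // may clobber an update made while we waited
}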

Forms of concurrency


Processes

  • Separate memory
  • Allows parallelism
  • Implicit task switching

Processes are extremely safe to use. You can’t share data, and you can’t freeze the system through negligence (though deadlock is always an option). However, these processes can be quite heavyweight in imperative programming (they can be lighter weight in a functional system, where immutable data means messages can be sent between processes without copying).
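
A minimal sketch of this from node, shelling out to an external program in its own OS process ('data.txt' is just a stand-in file name):

const { spawn } = require('child_process')

// the child gets its own memory and runs in parallel, and the OS
// switches between it and us implicitly
const child = spawn('sort', ['data.txt'])
child.stdout.on('data', chunk => process.stdout.write(chunk))
child.on('close', code => console.log('done, exit code', code))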


Examples:

  • OS processes (heavyweight but general. Literally any language can use separate OS processes)
  • Erlang processes (lightweight, but tied tightly to a particular system and language)


Threads

  • Shared memory
  • Can allow parallelism (depends on language/implementation)
  • Implicit task switching

Moving from the safest interface to the least safe: with threads it is extremely easy to corrupt your memory, and you still need to lock any shared data. For this reason some languages reduce the risk with a global lock (python’s GIL or ruby’s GVL). I think threads work especially badly with dynamically typed languages, because all writes are read/writes, which makes correct locking extremely difficult.

However, threads are extremely flexible. They’re what most other types of concurrency (including processes, inside the OS) are implemented with.
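
A minimal sketch of the shared-memory model, using node’s worker_threads module (assuming a node version recent enough to ship it): both threads see the same buffer, and Atomics provides the kind of protection described above.

const { Worker, isMainThread, workerData } = require('worker_threads')

if (isMainThread) {
  const shared = new SharedArrayBuffer(4) // one shared 32-bit counter
  const counter = new Int32Array(shared)
  const worker = new Worker(__filename, { workerData: shared })
  Atomics.add(counter, 0, 1) // protected increment of shared data
  worker.on('exit', () => console.log('count:', Atomics.load(counter, 0)))
} else {
  // the worker sees the very same memory, not a copy
  Atomics.add(new Int32Array(workerData), 0, 1)
}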


Examples:

  • OS threads (supported by most languages)
  • Goroutines

Async functions

  • Shared memory
  • No parallelism
  • Explicit task switching

This is what javascript uses. You schedule some task (usually some form of IO) and wait for it to complete or fail. No async task completes until you either ask for it (in lower-level languages) or all of your code has returned (in higher-level languages, especially javascript).
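
A minimal sketch in node ('config.json' is just a stand-in): the callback cannot run until the current code has returned to the event loop.

const fs = require('fs')

fs.readFile('config.json', 'utf8', (err, data) => {
  if (err) throw err
  console.log('finished reading') // runs later, from the event loop
})

console.log('scheduled a read') // always prints first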


Examples:

  • poll/select/epoll/kqueue
  • javascript
  • event machine/twisted/tornado/etc

Why do most forms of concurrency fit one of these groupings? Let’s look at the others:

  • Separate memory
  • No parallelism
  • Explicit task switching

This just seems to not have any benefits: you can’t share data, you can’t do anything in parallel, and you have to explicitly switch tasks all the time. If you’ve got separate memory, there’s no reason not to allow implicit task switching and parallelism.

  • Separate memory
  • No parallelism
  • Implicit task switching

This is a bit better. Erlang used to be like this (all processes were multiplexed onto a single thread), but it’s really just a matter of technology to allow parallelism. Again, if you have separate memory you might as well allow parallelism. That said, this is a perfectly reasonable initial implementation.

  • Shared memory
  • No parallelism
  • Implicit task switching

Running go with GOMAXPROCS=1 is basically this. Same with greenlets. You still need to protect your data from access by multiple tasks, but in practice less protection is required and you can get away with being sloppy. It’s kind of like the old erlang model: you don’t lose anything by going parallel, so you might as well do it down the line, though it’s more of a tradeoff here than a pure win.


These general categories of concurrency features have different tradeoffs, but those can be changed somewhat by implementation choices. The fundamentals don’t really change, but what’s cheap or expensive can change:


  • Lightweight processes

If you multiplex many processes onto a small, fixed number of OS threads/processes, you can make processes more lightweight. The tradeoff with lightweight versus full processes is that lightweight processes generally cannot call C code easily and directly, but they use less memory.


  • Lightweight threads

Lightweight threads are multiplexed onto a small number (usually equal to the number of CPUs) of hardware threads. They have similar tradeoffs to lightweight processes – they make interaction with the OS and hardware more difficult, but use less memory, so more can be started.

  • Static verification

This is rust’s big trick. Rust’s rules of ownership disallow data races at compile time. In order to share data between threads you need a mutex or other protection, and this is impossible to mess up in safe rust. This makes more ambitious use of threads feasible. However it increases the complexity of the language and can only catch a subset of concurrency problems (in rust’s case, only data races).


  • Promises/Futures

Promises (or Futures) are a representation of some value that will be available eventually. They provide a good abstraction for building async combinators on top of, which raw callbacks do not. Callbacks are more general, but promises are a good basis for dealing with common concurrent patterns.
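
For instance, a sketch in javascript, where delay stands in for any real async operation: Promise.all is a combinator that turns many eventual values into one.

function delay(ms, value) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms))
}

Promise.all([delay(100, 'user'), delay(200, 'posts')])
  .then(([user, posts]) => console.log(user, posts)) // both are ready
  .catch(err => console.error(err)) // one rejection fails the whole thing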

  • Async/await

Async/await first came from C#, but it is now spreading to many languages. It makes async programming look serial, while keeping all task switching explicit. It can also be faked if you have a coroutine abstraction. The tradeoff here is language complexity vs development efficiency.
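
A sketch in javascript (delay is again a stand-in for real IO): the code reads top to bottom, but every await is an explicit switch point.

function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}

async function main() {
  console.log('start')
  await delay(100) // looks serial, but other tasks can run here
  console.log('100ms later')
}

main()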

In-depth examples


Erlang

Erlang is intended to be used in highly reliable systems. It does this by having many processes that are isolated from each other, and a tree of processes monitoring each other, so that lower-level processes are restarted by higher-level processes. This leads to a lightweight process model: you don’t want processes to have hidden dependencies on each other, because then you can’t kill and restart them if something goes wrong, and you want to be able to start a truly huge number of processes. Erlang is deeply affected by this concurrency model – it has no types that cannot be efficiently serialized and sent between processes, possibly on different machines. This makes erlang extremely well suited for what it was designed for – highly reliable networking infrastructure – but less well suited for many other types of programming.


Go

Go was designed as a reaction to C++, and draws some inspiration from erlang; specifically, it has goroutines, which are lightweight threads. Unlike erlang, however, goroutines are not prohibited from sharing memory (socially it’s recommended to communicate by message passing, but sharing memory is allowed, and easy to do by mistake). This takes away many of both the benefits and drawbacks of erlang’s model. It also has the side-effect of making Go more of a Java competitor than a C++ competitor: interacting with the system (as in, calling C) has lots of overhead and complexity. That said, having threads be cheap makes many nice patterns feasible that would be prohibitively slow in other languages. Go also provides good tools for communicating using message passing, and strongly recommends their use. The effect is that concurrency is much like the rest of the language: simple, pragmatic, but full of boilerplate and pitfalls.


Rust

Rust is also a reaction to C++, but has much stronger compile-time abstractions (as opposed to Go, which has almost all run-time abstractions). For concurrency, rust experimented with many different forms: for a long time it supported go-style lightweight threads, but now only native threads are built in (though like all languages you can spawn additional OS processes, or use async functions). The advantage of rust over C++ in concurrency is that rust enforces proper memory access at compile time. This adds some complexity to the language (though rust gets great bang for the buck: the same compile-time checks that ensure proper memory use with threads also ensure proper memory use within a thread), and can be hard to learn, but it matches the way that systems programmers generally already write code. This makes rust a true systems language: low runtime overhead, interacting with the system is basically free, but it is more difficult to program in than higher-level languages.


Node

Node’s answer to concurrency issues is to just always be single-threaded, and to use async functions for all concurrency. In fact, it doesn’t have blocking functions for many IO operations (and even the ones it does have are rarely used). This infamously leads to giant chains of callbacks, though these days promises and async/await can help with this dramatically. It does split all javascript functions into sync and async functions, something that has to be kept in mind at all times while writing node code. The plus side is that it doesn’t make any promises it can’t fulfill, unlike other dynamic languages (like python and ruby, which offer threads but have locks on running all python/ruby code). Since there’s almost no blocking IO, each node process can handle quite a bit of IO, making it great for networking applications and web servers. However, node doesn’t have a great story for handling computation-heavy code yet. You can spawn a different OS process, but it’s still not an easy operation. At some point node may introduce lightweight processes, but node is probably never going to offer shared memory concurrency.
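
A sketch of the workaround ('heavy.js' is a hypothetical script that does the CPU-bound work and reports back):

const { fork } = require('child_process')

const worker = fork('heavy.js') // separate process, separate memory
worker.on('message', result => console.log('result:', result))
worker.send({ n: 42 }) // communicate by message passing, not shared state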


nginx

nginx is a great example of how to combine different concurrency models. It spawns a worker process for each CPU, and then within each worker uses async functions to do any actual IO. This makes for a highly efficient system: it can handle lots of connections, but unlike something like node, if there’s some heavy computation that needs to happen at some point, the other workers will pick up the slack while one is blocked. Node can work around the issue sometimes with multiple processes, but that takes explicit setup rather than being the default mode of operation.
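
You can sketch the same shape in node with the built-in cluster module: one worker process per CPU, each running its own event loop.

const cluster = require('cluster')
const http = require('http')
const os = require('os')

if (cluster.isMaster) {
  // one worker per CPU, each a separate process
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  // every worker runs its own async event loop on a shared port
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n')
  }).listen(8000)
}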


This is more of an overview than anything, but I hope that it helped you understand what different types of concurrency are available, and what the different tradeoffs are. You could write a whole book about this topic.

My own opinion has shifted over time towards thinking that lightweight threads and processes are over-hyped. They aren’t bad ideas, but they’re not the pure win that so many portray them as.

Posted at 23 Feb 2016

Isomorphic Javascript is just Progressive Enhancement done right

I was just at Fluent this week, and I had an interesting thought, spurred by several things, but really crystallized when I saw this talk by Eric Meyer.

So, the (perhaps badly named) concept of Isomorphic Javascript is usually sold as a performance optimization for loading time in single-page applications, which is one benefit it provides. However, it actually fixes the real problem with single-page apps – they break the web. A single-page app that does not render on the server (isn’t isomorphic) doesn’t just degrade when javascript doesn’t work, it’s totally broken. Like, blank page. This is a problem on any page, but in practice it’s biggest on the open web (not behind a login). Closed sites (and especially enterprise sites/apps) can usually get away with doing various odd things, even though they probably shouldn’t. Things like web spiders, users on crappy mobile connections, and users behind odd firewalls usually matter more on the open web. (Accessibility is also easier when rendering on the server, but it can be made to work on javascript-only sites.)

The thing is, older jquery-based progressively enhanced sites had a number of problems:

  1. You either had to write double the rendering code, or have your page look and work dramatically differently without javascript
  2. You might have a page that was technically usable, but in practice terrible without JS – datepickers are the most common example I can think of. In a typical jquery-style datepicker progressive enhancement situation, there’s a text input with a particular format you need to use, which is much more painful to use than a datepicker.
  3. As you move more logic into the client, maintenance and code organization become problems that traditional tools like jquery plugins just can’t solve.

The first attempt at solving these issues was the various first-gen javascript app frameworks: Angular 1, Backbone, Ember 1, etc. These frameworks were developed with the closed web in mind – enterprise apps, or at least ones that needed a login. I’m not sure the creators of those frameworks envisioned things like blogs using them, and indeed, it has caused problems when they do. They were tightly coupled to the actual DOM, so though they could be made to prerender the page with enough work, it wasn’t easy. Various frameworks attempted to make rendering on the client and server equally easy, but it wasn’t really until React came out that the idea went mainstream. Now all of the next-gen frameworks (including Angular 2 and Ember 2) will be much easier to render isomorphically.

Which brings me to my point: Isomorphic javascript is just progressive enhancement done right. You always serve up a usable page, but you can do it without sacrificing all the benefits of single-page apps. Of all the ways to do progressive enhancement, it’s the most:

  1. Accurate – the markup will be the same because it’s rendered by the same code
  2. Maintainable – one codebase, one rendering path
  3. Quick to render – we can use all the tricks of traditional html-rendering sites to get the page to render fast
  4. Quick to update – user input is captured instantly and processed by javascript when available.

Isomorphic javascript can actually do things that are usually infeasible to do in typical progressive enhancement as well – it can render the page with your open modal or datepicker in it on the server, and have it work exactly like when javascript is working. None of this comes for free – testing and hard work is still needed – but things become feasible that weren’t before.
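
As a concrete sketch of “one codebase, one rendering path”, here’s roughly what the server half looks like with React (App.js is a hypothetical component shared with the browser bundle):

const http = require('http')
const React = require('react')
const ReactDOMServer = require('react-dom/server')
const App = require('./App') // hypothetical shared component

http.createServer((req, res) => {
  // the exact component the client uses, rendered to plain html
  const html = ReactDOMServer.renderToString(React.createElement(App))
  res.end('<div id="root">' + html + '</div>' +
    '<script src="/bundle.js"></script>') // client js attaches to this markup
}).listen(3000)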

Posted at 24 Apr 2015

EPR: A utility for simplifying node paths

So let’s say you’ve got a node project, with a structure somewhat like this:

- project/
  - package.json
  - server.js
  - lib/
    - file1.js
    - file2.js
    - models/
      - model1.js
      - model2.js
  - spec/
    - file1Spec.js
    - file2Spec.js
    - models/
      - model1Spec.js
      - model2Spec.js

Your require statements in your specs can easily get very ugly:

var model1 = require('../../lib/models/model1')

They’re also fragile – if you move either your spec file or your implementation file, you’ve got to update your requires. This is a good argument for using lots of small modules that can be broken out – if a module lives in your node_modules folder then requiring it is always easy:

var file1 = require('file1')

The problem is that when you’re writing an app, lots of the code can’t really be separated out into tiny modules – it’s app-specific. There have been a few suggestions on how to address this problem, but epr is my attempt at solving it in a nice, repeatable way.

EPR works by making symlinks in your node_modules folder. It gets the list of symlinks to create from your package.json file. So for the above example, you could add the following to your package.json file:

  "epr": {
    "file1": "lib/file1.js",
    "file2": "lib/file2.js",
    "models": "lib/models"

You could have requires like the following:
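
var file1 = require('file1')           // symlinked to lib/file1.js
var model1 = require('models/model1')  // via the models -> lib/models symlink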


There are no relative paths present, and you never need to update any requires – you just need to update your package.json if you move one of your files.

So check out epr if you’re using node and are annoyed by relative paths.

Posted at 31 Jan 2015

RISC vs CISC doesn't matter for x86 vs ARM

If you’ve been following tech lately, you’ve probably heard people talking about the competition between x86 chips (mainly from Intel) and arm chips. Right now they’re used mostly for different things – phones and tablets have arm chips; desktops, laptops, and servers have x86 chips – but Intel’s trying to get into phones and arm vendors want to get into servers. This promises to lead to some exciting competition, and we’re already reaping the power benefits of Intel’s work on this in desktops and laptops. However, whenever this comes up, people bring up that arm is RISC and x86 is CISC, presenting RISC like it’s a pure advantage and x86 must be crippled because it’s CISC. This doesn’t matter and hasn’t for a long time now; let me explain why.

RISC means Reduced Instruction Set Computing; it really comes out of the 80s, and it describes a certain style of instruction set architecture (ISA) for a computer. The instruction set is all the low-level commands the CPU supports, so it might have things like “load this value from memory” or “add these two numbers together”. The ISA doesn’t say how those commands have to be implemented, though. Despite the name, the one thing that really separates RISC from other types of instruction set is not the number of different instructions, but that most instructions do only one thing – they don’t have different addressing modes. On traditional architectures you’d have instructions that do the same thing, but can work on different types of operands. For example, you might be able to add 2 registers together, or add memory and a register, or add memory to memory. This could become extremely complex, and arguably reached the height of its complexity in the VAX ISA. The VAX was very nice to write assembly code in, but the vast majority of those addressing modes weren’t needed when you use a language like C, and the compiler is responsible for making sure you load data when you need to.

The big argument that RISC proponents made was that you could cut out many of these addressing modes and focus on making your basic operations fast, resulting in a faster overall chip. Since most modes in something like the VAX were rarely used, they were usually microcoded and slow, so you had to know which modes were fast anyways, defeating a lot of the point of having so many complex modes. RISC proponents dubbed traditional ISAs CISC (Complex Instruction Set Computing); it’s not a term that anyone would use for their own work. RISC was very successful in the 80s – ARM started then, DEC (the makers of the VAX) made the Alpha, Sun made the SPARC, and even IBM got in on the action with POWER. However, this was mostly in “big” chips (ARM being the big exception). The other story of the 80s was the growth of the micros – tiny chips cheap enough for individuals to buy started coming out in the 70s, and by the 80s there were lots of computers using them: think of IBM PCs (using x86), Commodore 64s (using the 6510, a variant of the 6502, which was used in the Apple II and NES as well), the original Apple Macintosh, and the Amiga (the mac and amiga both used the motorola 68k family). All of these were using what we’d consider CISC chips – they had various addressing modes. Nothing crazy like the VAX, but the VAX was always the outlier in ISA complexity. All of these ISAs still exist, but most are only used in tiny embedded chips (other than x86). Of those computer ecosystems, the PC took over the world, and the mac survived, but it is still a small portion of the computer market (and uses x86 these days anyways, after using a RISC chip for a while).

So with that story set, why don’t RISC and CISC matter anymore? Well, there are two big reasons: out-of-order execution (OoO), and the fact that an ISA doesn’t specify how a chip is implemented. Out-of-order execution was the end result of a lot of things people were trying to do with RISC chips in the 80s – each instruction basically executes asynchronously, and the CPU only waits for the result of an instruction if it’s being used by something else. This makes the ISA matter a lot less, because it doesn’t really matter if you load data and use it in one instruction or two. As a matter of fact, since the late 90s Intel has been internally splitting its CISC instructions into RISC-like micro-ops, which points out how the whole RISC vs CISC thing is pointless these days.

That doesn’t mean that ISA doesn’t matter, but the devil is really in the details now. x86 is honestly a bit of a mess these days, and decoding it is more complex than decoding ARM instructions (or really any other extant ISA). ARM also just updated its ISA for 64 bits, and from what I’ve heard it sounds like they did a really good job, basically making a totally generic RISC ISA with no weird stuff that makes it hard to use. x86 was never even close to the complexity of something like the VAX, so it avoided a lot of the VAX’s problems. RISC chips are also not without strange things that hurt them down the line – they often exposed internal details of their early implementations, which they had to emulate in later, faster versions. So if you want to compare the x86 and arm ISAs, that’s actually an important and interesting comparison to make, but the acronyms RISC and CISC don’t actually add anything.

Posted at 28 Jan 2015

osc - play with the web's audio generation capabilities

I’ve been playing around with the Web Audio API for a bit and have come up with a nice little demo program that shows the basic capabilities of the OscillatorNode interface (plus some fun canvas programming). It’s not a serious project, but it is fun. I’m calling it osc, and you can also check out the source code.
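
The heart of the OscillatorNode interface is only a few lines; a minimal sketch of playing a tone in the browser:

var ctx = new AudioContext()
var osc = ctx.createOscillator()

osc.type = 'sine' // also 'square', 'sawtooth', 'triangle'
osc.frequency.value = 440 // A4
osc.connect(ctx.destination)
osc.start()
osc.stop(ctx.currentTime + 1) // play for one second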

Posted at 27 Nov 2013