Monday, April 11, 2022

The Correct Way to Lock a Get-Or-Create Map

A Get-Or-Create map is useful for the "Multiton" (mutiple-singleton) design pattern. Locking is used to make it thread-safe. It may not seem obvious, but the straightforward way to lock a map with Get-Or-Create semantics is problematic.

// Store keeps track of all the objects created so they can be shared.
type Store struct {
  mu sync.Mutex
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
  }

  if obj, ok := s.m[key]; ok {
    return obj, nil
  }
  obj, err := newObject(key)  // BAD!
  if err != nil {
    return nil, err
  }
  s.m[key] = obj
  return obj, nil
}

To illustrate the problem, let's say a store already has "bar" but not "foo" saved in the map. Thread-A calls s.GetOrCreate("foo") which necessitates creating a new object for "foo". Meanwhile, Thread-B comes along and simply wants to call s.GetOrCreate("bar") which already exists. Thread-B should be able to just get "bar" without waiting for Thread-A to create "foo", but the lock held by Thread-A during newObject("foo") forces Thread-B to wait.

This is particularly problematic when the object creation takes on the order of seconds which is typical if the object represents a remote network resource, and that is a common use case for Get-Or-Create maps.

Generally speaking, holding a lock while doing blocking operations will severely limit the scalability of a service, so any lock should be confined to the scope of reading or writing an in-memory data structure.

However, Get-Or-Create maps does have to wait in some cases. Even so, we can separate the "fast path" from the "slow path" and only have to wait in the slow path. The vast majority of accesses should fall under the fast path which does not wait for blocking operations.

Consider another Thread-C which also wants to get "foo" while Thread-A is creating it. Thread-C should wait for Thread-A. If Thread-C goes ahead and creates another instance of "foo", then both Thread-A and Thread-C will end up with an object for "foo", and we now have a problem: which "foo" do we put into the map now? For this reason, the map needs a third state to indicate that an object is being created although not ready.

The right mechanism is to use a condition variable in conjunction with a mutex, so that Thread-A could release the lock while creating the new object, Thread-B could retrieve an existing object without blocking, and Thread-C could wait for object creation by Thread-A.

type Store struct {
  mu sync.Mutex
  c sync.Cond
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
  }

  obj, ok := s.m[key]
  for ok && obj == nil {  // Object exists but it is still being created.
    s.c.Wait()            // Wait for its creation.
    obj, ok = s.m[key]    // And try again.
  }
  if ok && obj != nil {
    return obj
  }

  s.m[key] = nil  // Assume responsibility for creating the object.

  s.mu.Unlock()
  obj, err := newObject(key)  // Avoid holding the lock for lengthy operations.
  s.mu.Lock()

  if err != nil {
    delete(s.m, key)  // Object is no longer being created.  Next thread will retry.
  } else {
    s.m[key] = obj    // Creation successful.  Next thread will pick this up.
  }
  s.c.Broadcast()     // Wake up everyone waiting one-by-one.
  return obj, err
}

When Thread-A is about to create a new object, it puts a nil value into the map to indicate that an object is being created but not available. Thread-C will lock the mutex and see that the value is nil, and will wait on the conditional variable (which releases the mutex while waiting). When Thread-A finishes object creation, it locks the mutex again, puts the object into the map, broadcasts a signal through the condition variable, and unlocks the mutex. This will wake up Thread-C to try again. Thread-B is able to obtain "bar" without blocking because the mutex is not held by either Thread-A or Thread-C.

However, if Thread-A is unable to create the object for any reason, it will clear the pending status, still wake up all the waiting threads, and return the error. Thread-C will see that the nil value has been removed from the map and then assume the responsibility for attempting to create a new object.