Monday, April 11, 2022

The Correct Way to Lock a Get-Or-Create Map

A Get-Or-Create map is useful for the "Multiton" (mutiple-singleton) design pattern. Locking is used to make it thread-safe. It may not seem obvious, but the straightforward way to lock a map with Get-Or-Create semantics is problematic.

// Store keeps track of all the objects created so they can be shared.
type Store struct {
  mu sync.Mutex
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
  }

  if obj, ok := s.m[key]; ok {
    return obj, nil
  }
  obj, err := newObject(key)  // BAD!
  if err != nil {
    return nil, err
  }
  s.m[key] = obj
  return obj, nil
}

To illustrate the problem, let's say a store already has "bar" but not "foo" saved in the map. Thread-A calls s.GetOrCreate("foo") which necessitates creating a new object for "foo". Meanwhile, Thread-B comes along and simply wants to call s.GetOrCreate("bar") which already exists. Thread-B should be able to just get "bar" without waiting for Thread-A to create "foo", but the lock held by Thread-A during newObject("foo") forces Thread-B to wait.

This is particularly problematic when the object creation takes on the order of seconds which is typical if the object represents a remote network resource, and that is a common use case for Get-Or-Create maps. Holding a lock while doing blocking operations will severely limit the scalability of a service.

Generally speaking, the lock should be confined to the scope of reading or writing an in-memory data structure. However, Get-Or-Create maps does have to wait in some cases.

Consider another Thread-C which also wants to get "foo" while Thread-A is creating it. Thread-C should wait for Thread-A. If Thread-C goes ahead and creates another instance of "foo", then both Thread-A and Thread-C will end up with an object for "foo", and we now have a problem: which "foo" do we put into the map now? For this reason, the map needs a third state to indicate that an object is being created although not ready.

The right mechanism is to use a condition variable in conjunction with a mutex, so that Thread-A could release the lock while creating the new object, Thread-B could retrieve an existing object without blocking, and Thread-C could wait for object creation by Thread-A.

type Store struct {
  mu sync.Mutex
  c sync.Cond
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
  }

  obj, ok := s.m[key]
  for ok && obj == nil {  // Object exists but it is still being created.
    s.c.Wait()            // Wait for its creation.
    obj, ok = s.m[key]    // And try again.
  }
  if ok && obj != nil {
    return obj
  }

  s.m[key] = nil  // Assume responsibility for creating the object.

  s.mu.Unlock()
  obj, err := newObject(key)  // Avoid holding the lock for lengthy operations.
  s.mu.Lock()

  if err != nil {
    delete(s.m, key)  // Object is no longer being created.  Next thread will retry.
  } else {
    s.m[key] = obj    // Creation successful.  Next thread will pick this up.
  }
  s.c.Broadcast()     // Wake up everyone waiting one-by-one.
  return obj, err
}

When Thread-A is about to create a new object, it puts a nil value into the map to indicate that an object is being created but not available. Thread-C will lock the mutex and see that the value is nil, and will wait on the conditional variable (which releases the mutex while waiting). When Thread-A finishes object creation, it locks the mutex again, puts the object into the map, broadcasts a signal through the condition variable, and unlocks the mutex. This will wake up Thread-C to try again. Thread-B is able to obtain "bar" without blocking because the mutex is not held by either Thread-A or Thread-C.

However, if Thread-A is unable to create the object for any reason, it will clear the pending status, still wake up all the waiting threads, and return the error. Thread-C will see that the nil value has been removed from the map and then assume the responsibility for attempting to create a new object.