Monday, April 11, 2022

The Correct Way to Lock a Get-Or-Create Map

A Get-Or-Create map is useful for the "Multiton" (multiple-singleton) design pattern, and locking is what makes it thread-safe. It may not be obvious, but the straightforward way to lock a map with Get-Or-Create semantics is problematic.

// Store keeps track of all the objects created so they can be shared.
type Store struct {
  mu sync.Mutex
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
  }

  if obj, ok := s.m[key]; ok {
    return obj, nil
  }
  obj, err := newObject(key)  // BAD!  The lock is held during a potentially slow call.
  if err != nil {
    return nil, err
  }
  s.m[key] = obj
  return obj, nil
}

To illustrate the problem, let's say a store already has "bar" but not "foo" saved in the map. Thread-A calls s.GetOrCreate("foo") which necessitates creating a new object for "foo". Meanwhile, Thread-B comes along and simply wants to call s.GetOrCreate("bar") which already exists. Thread-B should be able to just get "bar" without waiting for Thread-A to create "foo", but the lock held by Thread-A during newObject("foo") forces Thread-B to wait.

This is particularly problematic when object creation takes on the order of seconds, which is typical if the object represents a remote network resource, a common use case for Get-Or-Create maps. Holding a lock while doing blocking operations will severely limit the scalability of a service.
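The blocking described above can be reproduced with a small, self-contained sketch. It is not the listing above verbatim: naiveStore, getOrCreate (which takes the constructor as a parameter), slowCreate, and bBlockedByA are hypothetical names introduced here, and the 50 ms timeout is an arbitrary choice. A channel stands in for the slow network call so that the lookup of "bar" is attempted while "foo" is provably still being created.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Object stands in for an expensive resource (hypothetical type for this sketch).
type Object struct{ key string }

// naiveStore holds the mutex for the entire call, like the first listing.
type naiveStore struct {
	mu sync.Mutex
	m  map[string]*Object
}

// getOrCreate calls create while still holding s.mu.
func (s *naiveStore) getOrCreate(key string, create func(string) *Object) *Object {
	s.mu.Lock()
	defer s.mu.Unlock()
	if obj, ok := s.m[key]; ok {
		return obj
	}
	obj := create(key) // The lock is held for the whole creation.
	s.m[key] = obj
	return obj
}

// bBlockedByA reports whether a lookup of the existing key "bar" is stuck
// while another goroutine is still creating "foo".
func bBlockedByA() bool {
	s := &naiveStore{m: map[string]*Object{"bar": &Object{key: "bar"}}}
	started := make(chan struct{})
	release := make(chan struct{})
	slowCreate := func(key string) *Object {
		close(started) // Creation has begun, so the lock is held.
		<-release      // Simulate a slow remote call.
		return &Object{key: key}
	}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // Thread-A
		defer wg.Done()
		s.getOrCreate("foo", slowCreate)
	}()
	<-started

	done := make(chan struct{})
	go func() { // Thread-B
		s.getOrCreate("bar", nil) // "bar" exists, so create is never called.
		close(done)
	}()

	blocked := false
	select {
	case <-done:
	case <-time.After(50 * time.Millisecond):
		blocked = true
	}
	close(release) // Let Thread-A finish.
	wg.Wait()
	<-done
	return blocked
}

func main() {
	fmt.Println("Thread-B blocked by Thread-A:", bBlockedByA())
}
```

Under these assumptions, Thread-B cannot return until Thread-A's constructor has finished, even though "bar" was in the map the whole time.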

Generally speaking, the lock should be confined to the scope of reading or writing an in-memory data structure. However, a Get-Or-Create map does have to wait in some cases.

Consider another Thread-C which also wants to get "foo" while Thread-A is creating it. Thread-C should wait for Thread-A. If Thread-C goes ahead and creates another instance of "foo", then both Thread-A and Thread-C will end up with an object for "foo", and we now have a problem: which "foo" do we put into the map? For this reason, the map needs a third state to indicate that an object is being created but is not yet ready.

The right mechanism is a condition variable used in conjunction with a mutex, so that Thread-A can release the lock while creating the new object, Thread-B can retrieve an existing object without blocking, and Thread-C can wait for Thread-A to finish creating the object.

type Store struct {
  mu sync.Mutex
  c sync.Cond
  m map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
  s.mu.Lock()
  defer s.mu.Unlock()

  if s.m == nil {
    s.m = make(map[string]*Object)
    s.c.L = &s.mu  // A sync.Cond must have its Locker set before Wait is called.
  }

  obj, ok := s.m[key]
  for ok && obj == nil {  // Object exists but it is still being created.
    s.c.Wait()            // Wait for its creation.
    obj, ok = s.m[key]    // And try again.
  }
  if ok && obj != nil {
    return obj, nil
  }

  s.m[key] = nil  // Assume responsibility for creating the object.

  s.mu.Unlock()
  obj, err := newObject(key)  // Avoid holding the lock for lengthy operations.
  s.mu.Lock()

  if err != nil {
    delete(s.m, key)  // Object is no longer being created.  Next thread will retry.
  } else {
    s.m[key] = obj    // Creation successful.  Next thread will pick this up.
  }
  s.c.Broadcast()     // Wake up all waiters; each re-acquires the mutex in turn.
  return obj, err
}

When Thread-A is about to create a new object, it puts a nil value into the map to indicate that an object is being created but is not yet available. Thread-C will lock the mutex, see that the value is nil, and wait on the condition variable (which releases the mutex while waiting). When Thread-A finishes object creation, it locks the mutex again, puts the object into the map, broadcasts a signal through the condition variable, and unlocks the mutex. This wakes up Thread-C to try again. Thread-B is able to obtain "bar" without blocking because the mutex is held by neither Thread-A (while it creates the object) nor Thread-C (while it waits).

However, if Thread-A is unable to create the object for any reason, it will clear the pending status, still wake up all the waiting threads, and return the error. Thread-C will see that the nil value has been removed from the map and then assume the responsibility for attempting to create a new object.
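As a rough check of the behavior described above, the following self-contained sketch calls GetOrCreate for the same key from many goroutines and verifies that they all share one object. Several pieces are assumptions made for illustration: the Object type and the always-succeeding newObject are stand-ins, the created counter and the constructorCalls and sharedAcross helpers are hypothetical instrumentation, and setting s.c.L on first use is just one way to make a zero-value Store usable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Object stands in for the shared resource (hypothetical type).
type Object struct{ key string }

// created counts calls to newObject (instrumentation for this sketch only).
var created int32

// newObject is a stand-in for the slow constructor; here it always succeeds.
func newObject(key string) (*Object, error) {
	atomic.AddInt32(&created, 1)
	return &Object{key: key}, nil
}

// constructorCalls returns how many times newObject has run so far.
func constructorCalls() int32 { return atomic.LoadInt32(&created) }

type Store struct {
	mu sync.Mutex
	c  sync.Cond
	m  map[string]*Object
}

func (s *Store) GetOrCreate(key string) (*Object, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.m == nil {
		s.m = make(map[string]*Object)
		s.c.L = &s.mu // The condition variable waits on the same mutex.
	}

	obj, ok := s.m[key]
	for ok && obj == nil { // A nil entry means creation is in progress.
		s.c.Wait()
		obj, ok = s.m[key]
	}
	if ok {
		return obj, nil
	}
	s.m[key] = nil // Mark the key as pending.

	s.mu.Unlock()
	obj, err := newObject(key) // Runs without the lock held.
	s.mu.Lock()

	if err != nil {
		delete(s.m, key) // Clear the pending mark so a waiter can retry.
	} else {
		s.m[key] = obj
	}
	s.c.Broadcast()
	return obj, err
}

// sharedAcross calls GetOrCreate(key) from n goroutines and reports whether
// every caller received the same non-nil pointer.
func sharedAcross(s *Store, key string, n int) bool {
	results := make([]*Object, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = s.GetOrCreate(key)
		}(i)
	}
	wg.Wait()
	for _, r := range results {
		if r == nil || r != results[0] {
			return false
		}
	}
	return true
}

func main() {
	s := &Store{}
	same := sharedAcross(s, "foo", 16)
	fmt.Println("same object:", same, "constructor calls:", constructorCalls())
}
```

Because the first goroutine to take the mutex leaves a nil placeholder before unlocking, every other caller either waits on the condition variable or finds the finished object, so the constructor should run only once per key in this sketch.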

Monday, March 28, 2022

Self-Driving Car Challenges

Mercedes recently announced level 3 self-driving for which it accepts legal liability. Typically, the levels of autonomy are defined by situations where the car is able to operate without human intervention, but being able to assume legal liability is a huge step forward. Levels of autonomy will remain a marketing slogan unless they can be tested in court. The cost of liability will be factored into the insurance premium, so self-driving only becomes economically viable if the insurance premium can be lower than that of a human driver.

However, Mercedes' legal responsibility comes with limits: on certain (already-mapped) highways, below 40 mph, during daytime, in reasonably clear weather, without overhead obstructions.

The first challenge that any camera-based system must overcome is the stability of the footage. You can see from this test that a car-mounted camera with improper image stabilization (circa 2016) would produce a wobbly and shaky image. The wobble can be counteracted by shortening the exposure time (a high shutter speed), but this requires good lighting conditions, hence the reasonably clear weather requirement. Furthermore, when the car is traveling fast, you need a telephoto lens to look further ahead for hazardous conditions, but a longer telephoto lens also exacerbates the wobble, hence the speed limit on self-driving operation. If the video footage is bad, it won't help much to feed it into machine learning because "garbage in, garbage out." More recent cameras such as a GoPro have improved image stabilization (circa 2021) that also works on a toy car (circa 2020), which is more challenging to stabilize. These cameras can produce a clean image under more forgiving lighting conditions.

Car manufacturers who are serious about their self-driving should be licensing camera stabilization technology from the likes of GoPro.

Self-driving cars using LIDAR face a different challenge. LIDAR works by sending a short pulse of light and observing the reflections, so it does not depend on external lighting conditions. But when there are multiple LIDAR-equipped cars on the road, they could pick up each other's signals, which would look like noise. As such, a LIDAR has to encode its own unique ID into the light pulse and filter out any pulse not coming from itself.
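The filtering idea can be sketched in a few lines. Everything here is a simplified assumption for illustration: the Pulse type, its SenderID and TimeOfFlightNs fields, and the ownReturns and rangeMeters helpers are hypothetical, and a real LIDAR would encode and decode its identity in the optical signal itself rather than in a tidy struct field. The range calculation uses the standard round-trip relation, distance = speed of light × time of flight / 2.

```go
package main

import "fmt"

// Pulse is a decoded reflection (hypothetical representation for this sketch).
type Pulse struct {
	SenderID       uint32  // ID recovered from the pulse encoding.
	TimeOfFlightNs float64 // Round-trip time in nanoseconds.
}

// speedOfLight is in meters per second.
const speedOfLight = 299792458.0

// ownReturns keeps only the pulses that carry this unit's own ID.
func ownReturns(selfID uint32, pulses []Pulse) []Pulse {
	var kept []Pulse
	for _, p := range pulses {
		if p.SenderID == selfID {
			kept = append(kept, p)
		}
	}
	return kept
}

// rangeMeters converts a round-trip time of flight into a one-way distance.
func rangeMeters(p Pulse) float64 {
	return speedOfLight * (p.TimeOfFlightNs * 1e-9) / 2
}

func main() {
	received := []Pulse{
		{SenderID: 7, TimeOfFlightNs: 200},
		{SenderID: 9, TimeOfFlightNs: 120}, // From another car; discarded.
		{SenderID: 7, TimeOfFlightNs: 400},
	}
	for _, p := range ownReturns(7, received) {
		fmt.Printf("own return at about %.1f m\n", rangeMeters(p))
	}
}
```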

A third challenge is how to legally dispute a claim in court. A self-driving system must be able to produce a rationale for any given decision it made, and the decision has to be legally justifiable. Previously, machine learning was a black box that could produce surprising results (it recognized a dumbbell only because it had an arm attached to it), but explainable AI is making some progress. Similarly, self-driving technology must be explainable.

Explainable self-driving technology can be thought of as a driving instructor that happens to be a computer. The driving instructor not only has to make the right decision on the road, but also has to explain to a student why it is the right decision given the circumstances.

Car manufacturers that want to assume legal responsibility should aim for a computerized driving instructor instead.

A musician would understand that in order to perform at 100%, they must practice up to 120%. This mindset applies to a lot of engineering problems where safety is at stake. Building structures and equipment such as elevators are often designed with load considerations at ~200% of rated capacity or more (ASCE/SEI 7-10 1.4.1.1 (b)). When it comes to self-driving, the specs are less quantifiable, but the design qualifications must similarly exceed expectations in order for the technology to be viable.