The generic unlock() wrote to m_write_locked from every thread,
regardless of whether that thread held a read or a write lock. When
multiple threads held concurrent read locks, their unlock() calls raced
on the non-atomic m_write_locked and m_read_locked_with_write_lock
fields.

Split unlock() into unlock_read() and unlock_write() so that read
unlocks never touch the write-lock tracking fields. The RWLockLocker
template selects the matching unlock call at compile time based on
LockMode.
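
For illustration only, a minimal sketch of the split and the compile-time
dispatch. It is not the code from this change. The std::shared_mutex
backing, the LockMode enumerators, lock_read()/lock_write(), and the
write_locked() accessor are all assumptions, and
m_read_locked_with_write_lock is omitted for brevity.

```cpp
#include <cassert>
#include <shared_mutex>

// Hypothetical enumerators; the real LockMode may be spelled differently.
enum class LockMode { Read, Write };

class RWLock {
public:
    void lock_read() { m_mutex.lock_shared(); }

    void lock_write()
    {
        m_mutex.lock();
        m_write_locked = true; // written only while holding exclusive ownership
    }

    // Read unlock: never touches the write-lock tracking field, so
    // concurrent readers have nothing to race on.
    void unlock_read() { m_mutex.unlock_shared(); }

    void unlock_write()
    {
        assert(m_write_locked);
        m_write_locked = false; // still exclusive here, so no other thread writes it
        m_mutex.unlock();
    }

    // Hypothetical accessor for the example; only meaningful while the
    // caller holds the lock or no other thread is using it.
    bool write_locked() const { return m_write_locked; }

private:
    std::shared_mutex m_mutex;
    bool m_write_locked { false }; // non-atomic, as in the description above
};

template<LockMode mode>
class RWLockLocker {
public:
    explicit RWLockLocker(RWLock& lock)
        : m_lock(lock)
    {
        if constexpr (mode == LockMode::Read)
            m_lock.lock_read();
        else
            m_lock.lock_write();
    }

    ~RWLockLocker()
    {
        // Resolved at compile time: a read locker contains no call to
        // unlock_write() at all.
        if constexpr (mode == LockMode::Read)
            m_lock.unlock_read();
        else
            m_lock.unlock_write();
    }

    RWLockLocker(RWLockLocker const&) = delete;
    RWLockLocker& operator=(RWLockLocker const&) = delete;

private:
    RWLock& m_lock;
};
```

In this sketch the tracking field is written only under exclusive
ownership, so it can stay non-atomic without readers racing on it.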