Refactoring locks into sharing immutables
So if both Eric Lippert and Jon Skeet think lock free programming is only for people smarter than themselves, then I will humbly run away screaming from the idea immediately. – dodgy_coder – SO
There is some truth in this quote. Lock-free is a quite advance subject, which includes knowing compiler optimisations and CPU specifics. However, there are some techniques that can be considered by programmers who prefer more high-level constructions. One of them is described in this article.
I start from the example of the repository, typical architectural block of C# programs. The repository is used to store the data of some kind. Users is quite common data entries, let's use it for the this example. User
has Name
, Phone
, and Address
. The PhoneBookService
class queries the repository for the phone of the user by performing a non-strict search over the list of users. To support this, the repository provides the search function.
class User {
public string Name { get; set; }
public string Phone { get; set; }
public string Address { get; set; }
}
class UserRepository {
private readonly List<User> _users = new List<User>();
public void Add(User user) {
lock (_users) {
var index = _users.FindIndex(u => u.Name == user.Name);
if (index != -1) _users.RemoveAt(index);
_users.Add(user);
}
}
public IReadOnlyCollection<User> Search(string criteria) {
lock (_users) {
return _users.Where(u => u.Name.Contains(criteria)).ToList();
}
}
}
Bodies of the Add
and Search
methods are wrapped in the lock
statements to prevent modifying the collection when it is enumerated. So the class UserRepository
is thread-safe, i.e. can be used safely from multiple threads.
Now different threads can write to the repository or read from it without danger of damaging the internal state of the underlying collection.
The PhoneBook service provides method to lookup the phone by name. For this it calls the search method of the repository and returns the matches.
class PhoneBookEntry {
public string Name { get; set; }
public string Phone { get; set; }
}
class PhoneBookService {
private readonly UserRepository _userRepository;
public PhoneBookService(UserRepository userRepository) {
_userRepository = userRepository;
}
public IReadOnlyCollection<PhoneBookEntry> Search(string criteria) {
return _userRepository.Search(criteria)
.Select(user => new PhoneBookEntry { Name = user.Name, Phone = user.Phone })
.ToList();
}
}
There are several PhoneBookService
instances that are serving the requests from the clients in many threads for updating the repository and searching the phones, but mainly searching.
public static void Main(string[] args) {
var r = new UserRepository();
var cts = new CancellationTokenSource();
new Thread(parameter => {
var rnd = new Random();
var data = (Tuple<UserRepository, CancellationToken>) parameter;
var repository = data.Item1;
var ct = data.Item2;
while (!ct.IsCancellationRequested) {
repository.Add(new User { Name = $"N{rnd.Next(100)}", Phone = "..." });
Thread.Sleep(4);
}
}).Start(Tuple.Create(r, cts.Token));
for (var i = 0; i < threads; i++) {
new Thread(parameter => {
var rnd = new Random();
var data = (Tuple<PhoneBookService, CancellationToken>) parameter;
var service = data.Item1;
var ct = data.Item2;
while (!ct.IsCancellationRequested) {
service.Search($"{rnd.Next(100)}");
}
}).Start(Tuple.Create(new PhoneBookService(r), cts.Token));
}
Thread.Sleep(5000);
cts.Cancel();
}
There is some problem, however - lock
. It is easy to see that only one thread can read from the repository every single time, so it is a bottleneck. The lock in the Search
makes the searches, in fact, parallel, and the lock in the Search
cannot be removed - in this case the enumeration can happen when the collection is being modifying from another thread (and so the list is in some inconsistent state). Mutable memory should not be shared without protecting it during writes.
But immutable memory can! And this will help to remove the lock in the method Search
and make it scallable in exchange to more complicated code and memory. Instead of sharing the repository and querying it directly, let's share the token that encapsulates the repository state and search over it.
interface IUserRepositoryStateSubscriber {
void Update(UserRepositoryState state);
}
class UserRepositoryState {
public UserRepositoryState(IReadOnlyList<User> users) {
Users = users;
}
internal IReadOnlyList<User> Users { get; }
}
class UserRepository {
private readonly object _syncLock = new object();
private readonly List<IUserRepositoryStateSubscriber> _subscribers =
new List<IUserRepositoryStateSubscriber>();
private List<User> _users = new List<User>();
public void Subscribe(IUserRepositoryStateSubscriber subscriber) {
_subscribers.Add(subscriber);
}
public void Add(User user) {
lock (_syncLock) {
var newUsers = new List<User>(_users);
var index = newUsers.FindIndex(u => u.Name == user.Name);
if (index != -1) newUsers.RemoveAt(index);
newUsers.Add(user);
_users = newUsers;
var state = new UserRepositoryState(newUsers);
foreach (var subscriber in _subscribers)
subscriber.Update(state);
}
}
public IReadOnlyCollection<User> Search(UserRepositoryState state, string criteria) {
var list = state?.Users.Where(u => u.Name.Contains(criteria)).ToList();
return list as IReadOnlyCollection<User> ?? new User[0];
}
}
class PhoneBookService : IUserRepositoryStateSubscriber {
private readonly UserRepository _userRepository;
private volatile UserRepositoryState _userRepositoryState;
public PhoneBookService(UserRepository userRepository) {
_userRepository = userRepository;
_userRepository.Subscribe(this);
}
public IReadOnlyCollection<PhoneBookEntry> Search(string criteria) {
return _userRepository.Search(_userRepositoryState, criteria)
.Select(user => new PhoneBookEntry { Name = user.Name, Phone = user.Phone })
.ToList();
}
void IUserRepositoryStateSubscriber.Update(UserRepositoryState state) {
_userRepositoryState = state;
}
}
In both versions the search happens in the current thread. But second version does not require lock because it always search over the immutable collection. It scales linearly when the number of threads and CPUs increase, while the first version actually may become even slower with higher number of threads.
Let's chech the performance of two versions of the code, with lock and without, on the AWS EC2 computational instance with 36 CPUs (c4.8xlarge). In the test code there is one thread that writes to the repository and many (from 1 to 40) threads that read from it.
Although it looks pretty cool, it is not a silver bullet in any way. The following should be considered before applying this pattern:
- Is the shared data small?
- Is the update/read ratio low?
- Is the lock affects performance significantly?
- Is the competence of the team is good enough to understand and support this code?
If all the answers are Yes, then it is definitely worth trying to refactor the code in the way described above.
References: