Handling mutable data in a multi-threaded high traffic environments is always error-prone. Shared mutable data across various threads, can change the service behavior in weird ways, leaving undetectable and irreproducible bugs.
One such case was discovered a couple of days ago at my workplace, and interestingly, the bug was there for almost 4 years, and nobody noticed it. Which also means that the bug was not impacting the business much, thankfully. Or else, it would have made a lot of noise right when the bug got introduced.
The Symptoms
Bad data is returned randomly from service fleet
Some hosts return correct data, some not
Bad hosts when restarted, start behaving correctly
Once a host returns bad data, it always returns bad data
Issue diagnosis
From the first 3 symptoms, we can think of issues due to multi-threading with mutable data, like a cache. But the 4th one suggests that it could be something related to static initialization, or a singleton with mutable data. Because, with multi-threading concurrency issues, the occurrence of the issue is still expected to be random. And with cache, generally when the TTL gets over, it should auto-correct itself.
Since a bad host stays permanently damaged, it indicates towards a singleton or static object which is supposed to stay untouched, but somehow got changed.
Consider this:
public enum DaysGroup {
WeekDays(
Stream.of("Mon", "Tue", "Wed", "Thu", "Fri")
.collect(Collectors.toSet())
),
Weekends(Stream.of("Sat", "Sun")
.collect(Collectors.toSet())
)
private Set<String> days;
DaysGroup(Set<String> days) {
this.days = days;
}
public Set<String> getDays() {
return this.days;
}
}
The above defines an enum
which has a set of strings that gives meaning to the enums. The set is stored as an enum instance variable.
It is worth noting that enums are singleton by design, and hence there will always be a single copy of the defined enums : WeekDays
and Weekends
With this defined, consider a service code portion:
...
Set<String> days = WeekDays.getDays();
if (condition) {
days.remove("Fri");
}
...
This in the first look, seems okay. But, if seen with a slight detail, we notice that if the condition is true, the actual set from the enum is altered.
So, whats the issue?
The problem is that the enum is changed permanently, and next time if the code is run, or any other thread accesses it in parallel, it will never find Fri
in the set anymore.
The problem is obviously temporarily fixed if the service is restarted, and it will stay fixed till that condition is satisfied again.
The problem also doesn’t occur enough if that condition is infrequent. Like, condition == is this year a leap-year.
The problem will occur only in those machines where this condition was ever met. So, if you have a 100 machine fleet for your service, and only 2 of them ever got this code path executed, then those 2 are broken, not the rest. So, the impact of it stays pretty unpredictable.
The Fix(es)
Return a copy
One way to fix this is to return a copy of the set, instead of the set itself.
public Set<String> getDays() {
return new HashSet<>(this.days);
}
This works. But if this method is called at many places, and mostly they just read the items and dont change anything, then we are unnecessarily creating quite a lot of memory usage.
Keep an immutable set and return that
The other way is not to keep a mutableSet, but rather keep an immutable set as part of the enum construction.
DaysGroup(Set<String> days) {
this.days = ImmutableSet.copyOf(days);
}
This will ensure that the clients of this code will handle the returned set as they see fit.
The above uses Guava’s ImmutableSet. In modern JDKs, this can simply be replaced with Set.of(…)
.
Preventions
The above was preventable with proper static analysis in place. For example, SpotBugs provides out of the box support to check such cases.
Best practices in code reviews can avoid such issues leaking to the prod.
Some elaborate Junit tests can capture such errors.
Use modern JDK/Kotlin to use immutability as first class construct rather than having the need to use external libraries or doing something special to get the immutability niceties.