Mutable collections are hard: List.InsertRange
Quick question: given the following list:
List<int> myList = [1, 2, 3, 4];
What does the following code produce?:
// A:
myList.InsertRange(2, myList);
and how about this?:
// B:
myList.InsertRange(2, myList.AsReadOnly());
or this one?:
// C:
myList.InsertRange(2, myList.Where(_ => true));
I'll assure you:
- All of these snippets call out to exactly the same
InsertRange(int, IEnumerable<T>)
method. - The
IEnumerable<T>
arguments passed toInsertRange
produce exactly the same sequences, albeit through increasingly convoluted mechanisms.
So, all examples call the same method with effectively the same argument. Surely, regardless of what the outcome ought to be, they will all be the same, right?
Wrong!
- Example A produces a list of eight elements:
[1, 2, 1, 2, 3, 4, 3, 4]
. - Example B also produces a list of eight elements, but with garbage content:
[1, 2, 1, 2, 0, 0, 3, 4]
. Notice the zeroes? Where did they come from? - Example C throws an
InvalidOperationException
but not before leaving behind a list of five (!) elements:[1, 2, 1, 3, 4]
AFAIK, this behavior isn't documented anywhere.
Out of these results, example B is clearly the worst offender: it corrupts the list without letting us know that anything went wrong. Example C is somewhat better, in that it at least throws an exception, but it still leaves the list in an inconsistent state. If the list continues to be used after the exception has been caught this could still lead to hard-to-debug issues.
Why?
As always, the source code sheds some light. There are three distinct code paths in the InsertRange
method:
if (this == c)
: For the case when the collection being inserted is exactly the same reference as the list itself.if (collection is ICollection<T> c)
: For the case when the provided enumerable implementsICollection<T>
. So that it can be copied using a single virtual method dispatch toCopyTo
.else
: For all other cases, the method falls back to the slowest possible way of copying the elements: enumerating the collection and inserting them one by one.
This explains the results from above:
- Example A succeeds because this is deliberately special cased.
- Example B produces incorrect results because the
AsReadOnly
method returns aReadOnlyCollection<T>
which does implementICollection<T>
but it is not exactly the same reference as the original list. TheReadOnlyCollection<T>.CopyTo
implementation forwards the call to the underlying list'sCopyTo
method, which ends up overwriting the list's backing array with itself, exposing uninitialized data. - Example C fails because it falls back to the List's
IEnumerator<T>
implementation, which is designed to fail after it notices the list has been modified. This is the infamous "Collection was modified; enumeration operation may not execute" exception you may have encountered in other contexts.
Conclusion
There are a couple of takeaways here:
- Implementing
IEnumerable<T>
on mutable data structures is hard :). Or at least: doing it correctly is. - Consider adding explicit checks or safeguards in your code to handle cases where a mutable data structure might be modified by itself, to prevent subtle bugs and data corruption.
- .NET ships with a lot of different ways to create & compose enumerables. When comparing two
IEnumerable<T>
instances, be aware that while reference equality implies an identical underlying source, reference inequality does not imply the opposite.
How this impacts Badeend.ValueCollections
I came across this issue while implementing the ValueList<T>.Builder
type in my ValueCollections library. I chose to err on the side of caution and always throw an exception when attempting to mutate a collection with itself. If anyone ever needs it, I'm open to adding an .InsertSelf(int at)
-like method, but so far I've never needed it.