.NET 6 String Improvements

It is an inalienable fact of .NET that strings are immutable. Every time you “mutate” a string, a new one is created.

If, for example, you concatenate two strings, the first string, the second string, and the resulting string will all exist in memory.
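A minimal sketch of that concatenation, with made-up variable names and values:

```csharp
using System;

string first = "Hello, ";
string second = "world";

// Concatenation allocates a third string; first and second are left untouched.
string result = first + second;

Console.WriteLine(result);                                 // Hello, world
Console.WriteLine(object.ReferenceEquals(first, result));  // False
```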

Even if you don’t need the original, and you simply append to the same variable, you will still have all 3 strings in memory until the garbage collector gets around to the ones nothing references anymore.
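As a rough sketch of that situation, assuming a hypothetical message variable that gets appended to (the extra original reference exists only so the old value can be observed):

```csharp
using System;

string message = "Hello, ";
string original = message;  // second reference, kept only to look at the old value

message += "world";         // allocates a new string and rebinds the variable

Console.WriteLine(message);                                    // Hello, world
Console.WriteLine(original.Length);                            // 7, the old string is unchanged
Console.WriteLine(object.ReferenceEquals(original, message));  // False
```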

As a note, += on a string actually compiles into a call to string.Concat, which doesn’t change anything, but I just think it’s really neat. Sharplab.io example

 

So if you were to do something like
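A minimal sketch of that kind of split (the sentence and variable names are made up for illustration):

```csharp
using System;

string sentence = "the quick brown fox";

// Split returns a new string for every word; the original sentence stays alive as well.
string[] words = sentence.Split(' ');

Console.WriteLine(words.Length);  // 4
Console.WriteLine(words[1]);      // quick
```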

you will end up with the string in its original form, and then each word will be allocated in memory once more, roughly doubling your memory footprint.

It’s made even worse when you start splitting splits
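A minimal sketch of splitting splits, using made-up input data; s, pairs, and split are just illustrative local names:

```csharp
using System;

string s = "a,b|c,d|e,f";

// First pass: one new string per pair.
string[] pairs = s.Split('|');

foreach (string pair in pairs)
{
    // Second pass: every value is allocated yet again.
    string[] split = pair.Split(',');
    Console.WriteLine($"{split[0]} -> {split[1]}");
}
```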

Within this example, each value ends up existing as a string in the original s, then once again in the array pairs (sans |), and then once again in split (sans ,).

This means that if you want to write really efficient code, you have to do things like read character by character and write your own state machine to parse out the string.

And ultimately, there’s no reason anyone would want to write fancy fast code when they can instead write something they can come back to 6 months later and understand.

 

Enter .NET Core string improvements.

In .NET Core, there have been some amazing improvements where you can use Span<T> to have a sort of safe way of referring to memory.

Ultimately, a Span is essentially like having a pointer to a chunk of memory. It’s a lot like in C++, where you could just have a pointer to somewhere in memory and then interpret that memory. But instead it’s safe! Sort of. You always have to accept the risks associated with memory access. Something else could change the memory out from under you. You could assume something is 4 bytes long, but something changes and now it’s 8, etc.
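To make that a bit more concrete, here is a small sketch of slicing a string through a ReadOnlySpan<char> without copying any characters. The input text and the left/right names are made up for illustration:

```csharp
using System;

string text = "key=value";
ReadOnlySpan<char> span = text.AsSpan();  // a view over the string's characters, no copy

int separator = span.IndexOf('=');
ReadOnlySpan<char> left = span.Slice(0, separator);    // still points into text
ReadOnlySpan<char> right = span.Slice(separator + 1);  // still points into text

// ToString() is the point where characters actually get copied into new strings.
Console.WriteLine(left.ToString());   // key
Console.WriteLine(right.ToString());  // value
```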

However, when you combine the immutable nature of a string with spans, something magical happens.

In .NET Core, Microsoft introduced a string constructor that accepts a ReadOnlySpan<char>.
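A quick sketch of that constructor, with a made-up input string. Worth keeping in mind: the span itself is only a view, while the constructor copies the characters it covers into the newly created string:

```csharp
using System;

string original = "key1,key2";

// A view over the first four characters ("key1"); nothing is allocated here.
ReadOnlySpan<char> firstKey = original.AsSpan(0, 4);

// The constructor copies those characters into a brand new string.
string materialized = new string(firstKey);

Console.WriteLine(materialized);  // key1
```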

 

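This means that when you split strings, or do certain other string operations, the intermediate work can reference the original immutable string’s memory through spans instead of copying it, and characters only get copied at the point where a new string is actually created.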

Because both things referencing the memory are ultimately immutable, the operation remains safe, and quite a bit faster.

 

Within our code, we have a string of the format key1, key2|value1|value2 which would result in a Dictionary<string, List<string>>

or in this case something like

but imagine a much much longer string.

Within our code, we had 3 possible ways of parsing:

  • String.Split – previously unthinkable because of its memory usage and performance
  • Regular Expressions – not very readable, but a lot shorter than a state machine
  • StringReader and a manually implemented state machine
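For a sense of how little code the String.Split option needs, here is a rough sketch. The input layout is an assumption on my part, not necessarily the real format: the first |-separated field holds comma-separated keys, and the remaining fields are values shared by every key. ParseWithSplit is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the input layout is assumed, see the note above.
static Dictionary<string, List<string>> ParseWithSplit(string input)
{
    var result = new Dictionary<string, List<string>>();
    string[] fields = input.Split('|');
    if (fields[0].Length == 0) return result;  // no keys, nothing to map

    // Everything after the first field is treated as a value.
    var values = new List<string>();
    for (int i = 1; i < fields.Length; i++) values.Add(fields[i]);

    // Each key gets its own copy of the value list.
    foreach (string key in fields[0].Split(',', StringSplitOptions.TrimEntries))
    {
        result[key] = new List<string>(values);
    }
    return result;
}

var parsed = ParseWithSplit("key1, key2|value1|value2");
Console.WriteLine(string.Join(",", parsed["key2"]));  // value1,value2
```

StringSplitOptions.TrimEntries needs .NET 5 or later; on older runtimes you would call Trim() on each key instead.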

Obviously, we went with the StringReader in the past. But when you look at the benchmarks now, it’s a whole new world for code maintainability.

 

The regex is clearly awful. It’s slow, and it allocates significantly more memory.

String.Split and the StringReader are within the margin of error of each other on speed, and String.Split allocated significantly less memory.
