I am talking about modern, or slightly dated but easy-to-implement, alternatives to C strings: for example, the pointer+length encoding used in Rust (which I think is also called the record method?), or the Pascal string method.
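For anyone who wants to see what pointer+length looks like in plain C, here is a minimal sketch. The names `str_view`, `sv_from_cstr`, `sv_sub` and `sv_eq` are made up for illustration and do not come from any particular library. The idea is just a struct holding a pointer and a byte count, so taking a substring needs no copy and no terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: a non-owning view of bytes plus a length. */
typedef struct {
    const char *ptr;  /* first byte; the data need not be NUL-terminated */
    size_t      len;  /* number of bytes in the view */
} str_view;

/* Build a view over an existing C string (one O(n) strlen, done once). */
static str_view sv_from_cstr(const char *s) {
    assert(s != NULL);
    str_view v = { s, strlen(s) };
    return v;
}

/* Sub-view [start, start + count), clamped to the bounds of v; nothing is copied. */
static str_view sv_sub(str_view v, size_t start, size_t count) {
    if (start > v.len) start = v.len;
    if (count > v.len - start) count = v.len - start;
    str_view out = { v.ptr + start, count };
    return out;
}

/* Two views are equal when they have the same length and the same bytes. */
static int sv_eq(str_view a, str_view b) {
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

With this, `sv_sub(sv_from_cstr("hello world"), 6, 5)` is a view of `world` that points into the original buffer. The view does not own the memory, so the buffer has to outlive it.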
You answered your own question. Strings with a length are better than null-terminated ones. Null termination was a mistake in the original C library, and probably a hack because the PDP-11 used the ASCIZ format.
Lower performance though. At each iteration through the string you need to compare the length with a counter, which, if you want strings longer than 255 characters, will have to be multibyte. With null-terminated strings (NTS) you don’t need the counter or the multibyte comparison, strings can be indefinitely long, and you only need to check whether the byte you just looked at is zero, which most CPUs do for free, so you just use a branch-if-[not-]zero instruction.
The terminating null also gives you a fairly obvious visual clue where the end of the string is when you’re debugging with a memory dump. Can you tell where the end of this string is: “ABCDEFGH”? What about now: “ABCD\0EFGH”?
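To make the two loop shapes concrete, here is a small sketch in C with hypothetical function names. It only shows what each loop has to test per iteration; whether one is measurably faster depends on the compiler and CPU, and this sketch makes no claim either way.

```c
#include <assert.h>
#include <stddef.h>

/* NUL-terminated: the only loop condition is "is the current byte zero?". */
static size_t count_char_nts(const char *s, char c) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c) ++n;
    return n;
}

/* Length-counted: the loop compares the cursor against an end pointer,
 * so zero bytes inside the data are treated like any other byte. */
static size_t count_char_len(const char *s, size_t len, char c) {
    assert(s != NULL);
    size_t n = 0;
    for (const char *end = s + len; s != end; ++s)
        if (*s == c) ++n;
    return n;
}
```

On the “ABCD\0EFGH” example, `count_char_nts` stops at the embedded zero and never sees the `E`, while `count_char_len` called with a length of 9 scans all nine bytes.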
Fixed length strings. You can only ever have strings of a particular length, no more, no less. No need to store the string length nor terminating characters. 🤓
How would the machine know where the string would stop, since a string could contain literally any character?
But yeah… a .text section would be an alternative.
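On the fixed-length idea, a minimal C sketch of how the machine "knows" where the string stops: the length is part of the type, so nothing is stored at run time and every byte value, including zero, is ordinary data. `FIXED_LEN`, `fixed_str`, `fixed_from` and `fixed_eq` are hypothetical names, and the width of 8 is an arbitrary assumption.

```c
#include <assert.h>
#include <string.h>

#define FIXED_LEN 8  /* assumed width; every string is exactly this many bytes */

/* Hypothetical fixed-length string: no length field and no terminator. */
typedef struct {
    char bytes[FIXED_LEN];
} fixed_str;

/* Copy exactly FIXED_LEN bytes; the caller must supply at least that many. */
static fixed_str fixed_from(const char *src) {
    assert(src != NULL);
    fixed_str f;
    memcpy(f.bytes, src, FIXED_LEN);
    return f;
}

/* Comparison always covers the full width, zero bytes included. */
static int fixed_eq(fixed_str a, fixed_str b) {
    return memcmp(a.bytes, b.bytes, FIXED_LEN) == 0;
}
```

The obvious cost is that shorter text has to be padded and longer text does not fit at all.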
Another alternative I’ve seen is strings that are not null terminated but where the allocated memory actually begins at ptr[-1] and contains the length of the string. The benefit is that you still get a char array starting at ptr[0].
But wouldn’t this be potentially unsafe? What programming language has this type of implementation, by the way?
Hmm I think I saw it in a C library
Edit: Might have been this one https://github.com/msteinert/bstring
Edit: actually seems it’s this one. Look at what happens to ystr_header_t https://github.com/Amaury/Ylib/blob/master/src/ystr.c
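For reference, here is a rough sketch of the length-before-the-pointer layout in C. It is not taken from either of the linked libraries; `hstr_header`, `hstr_new`, `hstr_hdr`, `hstr_len` and `hstr_free` are hypothetical names, and it assumes a `size_t` header rather than a single byte at ptr[-1].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the character data. */
typedef struct {
    size_t len;     /* number of bytes in data */
    char   data[];  /* flexible array member (C99) */
} hstr_header;

/* Allocate header + bytes, return a pointer to the bytes only (NULL on failure). */
static char *hstr_new(const char *src, size_t len) {
    hstr_header *h = malloc(sizeof *h + len);
    if (h == NULL) return NULL;
    h->len = len;
    if (len > 0) memcpy(h->data, src, len);
    return h->data;
}

/* Step back from the data pointer to the header.
 * Only valid for pointers that came from hstr_new. */
static hstr_header *hstr_hdr(const char *s) {
    return (hstr_header *)(s - offsetof(hstr_header, data));
}

/* O(1) length lookup: read it from the header instead of scanning. */
static size_t hstr_len(const char *s) {
    assert(s != NULL);
    return hstr_hdr(s)->len;
}

/* Free through the header, since that is the address malloc returned. */
static void hstr_free(char *s) {
    if (s != NULL) free(hstr_hdr(s));
}
```

On the safety question: the caller still gets a plain `char *`, so nothing stops someone from passing a pointer that did not come from `hstr_new` to `hstr_len`, and that reads memory before the buffer. This sketch also does not add a terminator, so the result should not be handed to functions like `strlen`.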