by ctaglia on 12/11/13, 2:39 PM with 59 comments
by nly on 12/11/13, 3:39 PM
One interesting corollary is that moving short strings in an implementation that does this could actually be ever so slightly (negligibly) slower than moving long ones (since byte copies are slower than word copies). But generally, this is a free lunch optimisation and can save you hundreds of megs of memory when writing programs dealing with millions of short strings.
[1] http://llvm.org/svn/llvm-project/libcxx/trunk/include/string - search for "union"
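For anyone who hasn't seen the trick, here is a minimal sketch in C of the kind of layout being discussed. It is not the libc++ code: `sso_string`, `SSO_SMALL_MAX`, and the helper functions are hypothetical names, and the separate `is_small` flag is a simplification (libc++ packs its flag into the existing bytes rather than adding a field).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical small-string type; layout and names are illustrative only. */
typedef struct {
    union {
        struct {
            size_t len;
            size_t cap;
            char  *ptr;
        } heap;                                          /* long strings: buffer on the heap */
        char small[sizeof(size_t) * 2 + sizeof(char *)]; /* short strings: stored inline */
    } as;
    unsigned char is_small;  /* simplification: kept as a separate field in this sketch */
} sso_string;

/* Largest length (excluding the NUL) that fits inline. */
#define SSO_SMALL_MAX (sizeof(((sso_string *)0)->as.small) - 1)

/* Initialise s from a C string; returns 0 on success, -1 if malloc fails. */
static int sso_init(sso_string *s, const char *src)
{
    assert(s != NULL && src != NULL);
    size_t n = strlen(src);
    if (n <= SSO_SMALL_MAX) {
        memcpy(s->as.small, src, n + 1);  /* byte copy, no allocation */
        s->is_small = 1;
        return 0;
    }
    char *buf = malloc(n + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, n + 1);
    s->as.heap.len = n;
    s->as.heap.cap = n;
    s->as.heap.ptr = buf;
    s->is_small = 0;
    return 0;
}

/* Return the characters regardless of where they are stored. */
static const char *sso_cstr(const sso_string *s)
{
    return s->is_small ? s->as.small : s->as.heap.ptr;
}

/* Release the heap buffer, if there is one. */
static void sso_free(sso_string *s)
{
    if (!s->is_small)
        free(s->as.heap.ptr);
}
```

Moving a long string in this sketch would only copy the three words in `heap`, while moving a short one copies the inline bytes, which is where the corollary above comes from.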
by Someone on 12/11/13, 3:15 PM
Also (pedantic):
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))
sizeof(char) is always 1, so that division is superfluous.
by danielweber on 12/11/13, 3:21 PM
by ben0x539 on 12/11/13, 3:21 PM
by ra88it on 12/11/13, 4:24 PM
Conclusion: "Don’t worry! I don’t think you should refactor all your code to be sure you have strings of length 23 or less."
by spoiler on 12/11/13, 3:40 PM
by anon4 on 12/11/13, 5:12 PM
struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            char *ptr;
            union {
                long capa;
                VALUE shared;
            } aux;
        } heap;
        char ary[];
    } as;
};
/* apologies if I messed up the syntax here */
#define RSTRING_EMBED_LEN_MAX (sizeof(((struct RString*)(0))->as) - 1)
Then you can even use the padding the compiler added, if any, plus you can add more things to heap and the embed length will grow automatically.
by markburns on 12/11/13, 3:21 PM
by gaius on 12/11/13, 3:47 PM
by pedrocr on 12/11/13, 3:15 PM
by grosbisou on 12/11/13, 3:28 PM
VALUE seems to be an unsigned integer, defined via "typedef uintptr_t VALUE;" and "typedef unsigned __int64 uintptr_t;"
But I don't get why it is calculated like that. Can anyone explain?
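One way to read the macro, sketched below in C: the embedded buffer shares a union with the heap fields (len, ptr, and the aux union), which are three word-sized slots, so 3 * sizeof(VALUE) bytes are available and one is kept for the terminating NUL. The `struct heap_part` and `embed_fits_in_heap_part` names are made up for this illustration, and the layout is a simplified stand-in rather than the real Ruby header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t VALUE;  /* pointer-sized unsigned integer, as quoted above */

/* Simplified stand-in for the heap half of the union: three word-sized fields. */
struct heap_part {
    long  len;
    char *ptr;
    union {
        long  capa;
        VALUE shared;
    } aux;
};

/* The macro as quoted in this thread. */
#define RSTRING_EMBED_LEN_MAX ((int)((sizeof(VALUE)*3)/sizeof(char)-1))

/* Nonzero if an embedded string of maximum length, plus its NUL,
 * fits in the space the heap fields would otherwise occupy. */
static int embed_fits_in_heap_part(void)
{
    return (size_t)RSTRING_EMBED_LEN_MAX + 1 <= sizeof(struct heap_part);
}
```

With an 8-byte VALUE the macro works out to 3 * 8 - 1 = 23, which matches the length in the article's title.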
by gesman on 12/11/13, 4:11 PM
When programmers don't know in advance how long a name/email/input/whatever field is going to be, they just use the magic "power of two" length :)
So 32 (or 33) in this case would be more reasonable.
by badman_ting on 12/11/13, 3:18 PM
by throwaway0094 on 12/11/13, 3:48 PM
by jokoon on 12/11/13, 3:48 PM
by drakaal on 12/11/13, 4:28 PM
by corresation on 12/11/13, 4:11 PM
If this is intended to sit on the stack, then maybe. I find that highly unlikely, though: the timings look like the delta between one malloc and two, and the difference would be much more significant if it were a stack allocation versus a heap allocation. This is not comparable to small string optimizations for the stack in C++. Otherwise it seems like a poorly considered hack.
The string type could as easily have been dynamically allocated based upon the length of the string, where the ptr by default points inside that same allocated block. If the string is expanded it can then be realloced and the string alloced somewhere else. No waste, a single allocation, etc.
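A rough sketch of that idea in C, under one reading of it: the header and the characters share one malloc block, and on growth the characters move to a separate allocation so the header keeps its address. `struct inline_string` and its functions are hypothetical names for illustration, not anything from the Ruby source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the characters follow it in the same malloc block. */
struct inline_string {
    size_t len;
    char  *ptr;     /* initially points at data[] below */
    char   data[];  /* C99 flexible array member */
};

/* One allocation sized to the string: header plus characters plus NUL. */
static struct inline_string *inline_string_new(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    struct inline_string *s = malloc(sizeof *s + n + 1);
    if (s == NULL)
        return NULL;
    s->len = n;
    s->ptr = s->data;
    memcpy(s->data, src, n + 1);
    return s;
}

/* Append extra; returns 0 on success, -1 if allocation fails. */
static int inline_string_append(struct inline_string *s, const char *extra)
{
    size_t m = strlen(extra);
    char *buf;
    if (s->ptr == s->data) {
        /* First growth: move the characters out of the header block. */
        buf = malloc(s->len + m + 1);
        if (buf == NULL)
            return -1;
        memcpy(buf, s->data, s->len);
    } else {
        /* Already external: grow the separate buffer in place if possible. */
        buf = realloc(s->ptr, s->len + m + 1);
        if (buf == NULL)
            return -1;
    }
    memcpy(buf + s->len, extra, m + 1);
    s->ptr = buf;
    s->len += m;
    return 0;
}

/* Free the external buffer (if any) and then the header block. */
static void inline_string_free(struct inline_string *s)
{
    if (s == NULL)
        return;
    if (s->ptr != s->data)
        free(s->ptr);
    free(s);
}
```

In this sketch the inline bytes stay allocated but unused after the first growth, so the "no waste" property only holds for strings that are never expanded.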