Memory layout of VB6 strings
To get some background, let's see how strings are stored in
RAM. VB6 stores strings in Unicode format. In COM terminology, a
VB String is a BSTR. A String requires six overhead bytes
plus 2 bytes for each character. Thus, you spend
6 + Len(string)*2 bytes for each string.
The string starts with a 4-byte length prefix for the size of
the string. It's not the character length, though. This
32-bit integer namely counts the number of bytes in the
string (not counting the terminating 2 zero bytes). After the
length prefix comes the actual text data, 2 bytes for each
character. The last 2 bytes are zeros, denoting a NULL
terminator (a Unicode null character).
NULL character note. It is perfectly valid
to store a NULL character in the string. As the string length is
stored in the first 4 bytes, you can store NULLs in the string
data. They will not be treated as the terminating NULL as in
C/C++.
Performance of simple string functions
We are now going to measure how fast the various built-in
string functions of VB6 really are. For this purpose we compiled
a little .exe, which called each function 100 million times. The
test was run on a typical Pentium 4 processor (2.8 GHz). We
found out that Len and LenB are the fastest functions. So we
compared the rest of the functions to Len/LenB and put the
results in the chart and table below.

Relative performance of VB6 string functions
| Function |
Performance |
Description |
| Len |
1 |
Len(S) returns the number of characters in
string S.
(Note: works differently with UDTs) |
| LenB |
1 |
LenB(S) returns the number of bytes in S. |
| AscW |
1 |
AscW(S) returns the Unicode value of the
first character in S. |
| AscB |
1 |
AscB(S) returns the first byte in S. |
| Asc |
7 |
Asc(S) returns the ANSI value of the first
character in S. |
| ChrW$ |
17 |
ChrW$(U) returns a string containing the Unicode
character U. |
| ChrB$ |
18 |
ChrB$(B) returns a byte string containing
the character B. |
| Chr$ |
24 |
Chr$(A) returns a string containing the ANSI
character A. |
| Left$ |
18 |
Left$(S,x) returns x characters from the start
of S. |
| Right$ |
21 |
Right$(S,x) returns x characters from the end
of S. |
| Mid$ |
24 |
Mid$(S,x,y) returns y characters from S, starting at
the x'th character. |
| CStr |
36 |
CStr(x) returns the string representation of x
(localized). |
| Str$ |
111 |
Str$(x) returns the string representation of x (not
localized). |
How to read the table: Calling Len takes 1 unit of time,
while Asc takes 7 units. You can call Len 7 times in the same
time you call Asc once.
Len and LenB. The fastest functions are Len
and LenB. These are lightning fast functions that simply read
the 2 length bytes at the start of the string area. Len is
implemented in 6 assembly instructions in the VB runtime. LenB
is even shorter: it runs just 5 instructions. In principle, LenB
should run faster. In practice, this is not the case. Their
performance is equal on today's processors.
AscW, AscB and Asc. This group of functions
is very fast as well. Note how Asc takes 7 times the time of
AscW and AscB. This is because AscW and AscB simply return the
first byte(s) of the string. Asc needs to convert the value to
an ANSI character code. You can squeeze more performance out of
your program by replacing Asc with AscW. Since Asc and AscW
return different values for many characters (other than ASCII
0-127), you need to know what you are doing before choosing the
other function.
ChrW$, ChrB$ and Chr$. Performance degrades
as we move to functions that create strings. This group of
functions creates a string out of a numeric character code. Note
how Chr$ takes about 40% more time to run than ChrW$. This is
because Chr$ converts from ANSI to Unicode while ChrW$ works
with pure Unicode values. You can squeeze more performance out
of your program by replacing Chr$ with ChrW$. Since they return
different characters for many values (other than 0-127), you
need to know what you are doing before choosing the other
function.
Left$, Right$ and Mid$. Performance keeps at
the degraded level with this group of functions. These functions
create new strings by copying some characters in the input
string. These are the only functions that can access the
individual characters in a string. As you can see, Mid$ is
slower than Left$ or Right$. This means you should use Left$ and
Right$ when possible and only resort to Mid$ when you you really
need to access characters in the middle.
Tip. To access the first character in a string, call AscW(S)
instead of Left$(S,1). This will save you 95% of the time of
running Left$. There is a caveat, though. S must not be
empty. If S is empty, you will trigger a run-time error.
Therefore, you need to make sure S is not empty by calling Len(S)
or LenB(S) first. This is the proper way to do it: If
LenB(S) Then x = AscW(S)
CStr and Str$. These slow functions are used
to convert other data types to a string. You typically use them
to convert a numeric value into a string. CStr is much faster
than Str$. (Tested for integer input value 32.)
You can save time by replacing calls to Str$ with CStr. This
is not a straightforward task, though, because CStr and Str$
return different values. CStr returns a localized string, while
Str$ returns a non-localized one. What is more, Str$ prefixes
positive values with a space. As an example, CStr(1.2) returns
"1,2" in several European locales. Str$(1.2) always returns "
1.2". Thus, you can trust that Str$ always works the same way,
while CStr works differently in different locales. If you simply
replace calls to Str$ with CStr, your program may fail later if
it fails to interpret the resulting localized string. The
following table compares Str and CStr in the Finnish locale. The
results will look similar in several non-English locales.
Str$ and CStr in Finnish locale
| x |
Str$(x) |
CStr(x) |
| 1.2 |
" 1.2" |
"1,2" |
| .2 |
" .2" |
"0,2" |
| -.2 |
"-.2" |
"-0,2" |
Performance of string functions, group 2
The next group consists of functions whose performance vary
based on what you feed them as input. Generally speaking, the
longer the input string, the slower the call. Performance drops
considerably with longer strings.
The following chart reports timings for two input strings S
defined as follows:
- Short input
- S consists of 21 characters: 10 spaces, "1" and 10
spaces.
S = " 1 "
- Long input
- S consists of 201 characters: 100 spaces, "1" and 100
spaces.

So, for the call Replace(text), the performance was 513 using
the short input string and 968 using the long input. The table
below presents the numeric performance values for the short
input.
Relative performance of VB6 string functions
| Function |
Performance |
Description |
| LTrim$ |
20 |
LTrim$(S) returns a copy of S without leading
spaces. |
| RTrim$ |
21 |
RTrim$(S) returns a copy of S without trailing
spaces. |
| Trim$ |
29 |
Trim$(S) returns a copy of S without leading or
trailing spaces. |
| Val |
89 |
Val(S) returns the numeric value contained in S
(non-localized). |
| CInt |
81 |
CInt(S) returns the integer value contained in S
(localized, rounded). |
| CDbl |
103 |
CDbl(S) returns the double floating point value
contained in S (localized). |
| LCase$ |
53 |
LCase$(S) returns a copy of S with lower case
letters. |
| UCase$ |
53 |
UCase$(S) returns a copy of S with upper case
letters. |
| StrConv(vbLowerCase) |
327 |
StrConv(S,vbLowerCase) returns a copy of S with
lower case letters. |
| StrConv(vbUpperCase) |
333 |
StrConv(S,vbUpperCase) returns a copy of S with
upper case letters. |
| StrComp(vbBinaryCompare) |
47 |
StrComp(S1,S2,vbBinaryCompare) compares string S1
and S2 based on their Unicode values. |
| StrComp(vbTextCompare) |
65 |
StrComp(S1,S2,vbTextCompare) compares string S1 and
S2 in a locale-dependent, case-insensitive way. |
| InStr(vbBinaryCompare) |
14 |
InStr(x,S1,S2,vbBinaryCompare) looks for S2 in S1,
starting at position x, in a locale-independent,
case-sensitive way. |
| InStr(vbTextCompare) |
156 |
InStr(x,S1,S2,vbTextCompare) looks for S2 in S1,
starting at position x, using text comparison. |
| InStrRev(vbBinaryCompare) |
69 |
InStrRev(S1,S2,x,vbBinaryCompare) looks for S2 in
S1, from position x back to 1, in a
locale-independent, case-sensitive way. |
| InStrRev(vbTextCompare) |
193 |
InStrRev(S1,S2,x,vbTextCompare) looks for S2 in S1,
from position x back to 1, using text comparison. |
| Replace(vbBinaryCompare) |
375 |
Replace(S1,S2,S3,x,y,vbBinaryCompare) replaces S2
with S3 in S1, starting at position x, max y times, in a
locale-independent, case-sensitive way. |
| Replace(vbTextCompare) |
513 |
Replace(S1,S2,S3,x,y,vbBinaryCompare) replaces S2
with S3 in S1, starting at position x, max y times,
using text comparison. |
LTrim$, RTrim$ and Trim$. This useful group
of functions removes spaces from the start or the end of the
input string, or both. The more spaces there are, the slower
they run. These functions return a copy of the input string.
Tip 1. Never nest like LTrim$(RTrim$(S)). Simply call Trim$(S)
instead.
Tip 2. LTrim$ is the fastest alternative to test whether
string S contains any non-space characters. Use this syntax:
Len(LTrim$(S))<>0
Val, CInt, CDbl. This is a group of
relatively slow conversion functions. They look for a number
inside a string. Val is a non-localized version that is best
used together with Str$. CInt and CDbl are localized versions
that are compatible with CStr. Val is a safe function to call,
while CInt and CDbl raise error 13 when conversion fails. — It
is best to avoid costly conversion where possible. Always pass
numeric values in numeric data types, not as strings. By
converting numbers to strings and strings to numbers you spend
extra CPU cycles and also risk error 13 if your code is not
designed the right way.
LCase$, UCase$, StrConv(vbLowerCase),
StrConv(vbUpperCase). This group of functions performs
case conversion. LCase$ is the faster alternative to
StrConv(vbLowerCase). UCase$ is the faster alternative to
StrConv(vbUpperCase).
Keep in mind that LCase$ and UCase$ have full Unicode
support, whereas StrConv(vbLowerCase) and StrConv(vbUpperCase)
don't support all Unicode characters. In fact, calls to
StrConv(vbLowerCase) and StrConv(vbUpperCase) can remove
diacritic marks from Latin characters and convert unsupported
characters to "?". As an example, StrConv(vbLowerCase) converts
all Greek and Cyrillic characters to garbage on a Western
system.
Tip. Avoid StrConv(vbLowerCase) and StrConv(vbUpperCase).
There is no reason to use these slow and limited calls. If you
see them used, replace them with LCase$ or UCase$. If there is a
chance of a Null value in the input, use LCase or UCase, because
the dollar versions LCase$ and UCase$ fail if the input is Null.
StrComp. The StrComp function
compares two strings. The vbBinaryCompare option is
faster than vbTextCompare.
Tip. vbTextCompare should only be used for
sorting. vbTextCompare is often used to test for a
case-insensitive match (test if "ABC" is equal to "abc"), but it
is not suitable for that. This is because the result depends on
the current locale. Certain character combinations can produce
false matches. For example, StrComp("ss", "ß",
vbTextCompare) returns 0. This means "ss" and "ß" are a
match, which may not be what you intended to do. For a real case
insensitive test (without extra effects), use something like
LCase$(S1) = LCase$(S2).
StrComp(vbTextCompare) can produce other surprises as well.
In the following table you can see how StrComp sorts certain
characters in the Finnish locale (as an example). This serves as
a proof of why vbTextCompare is not simply the case
insensitive counterpart of vbBinaryCompare. It
affects much more than just the case.
StrComp examples
vbBinaryCompare
(all locales) |
vbTextCompare
(Finnish locale) |
| a < ae < z < ä < æ |
a < ae = æ < z < ä |
| va < vb < wa |
va < wa < vb |
| AE < ae < Æ < æ |
AE = ae = Æ = æ |
| OE < oe < Œ < œ |
OE = oe = Œ = œ |
| ss < ß |
ss = ß |
| TH < th < Þ < þ |
TH = th = Þ = þ |
| d < e < Ð < ð |
d < Ð = ð < e |
Note that Option Compare Text is the equivalent
of vbTextCompare. Option Compare Binary
is the equivalent of vbBinaryCompare. It is also
the default setting.
InStr. InStr(,,vbBinaryCompare)
is a quick function. Its counterpart InStr(,,vbTextCompare) is
very slow, on the other hand. This is where it really pays off
to use a binary search.
As with StrComp, vbTextCompare with InStr is
full of surprises. To name an example, InStr(1,"Straße","ss",vbTextCompare)
locates "ß" at position 5. You searched for 2 characters, but
InStr found just one! What is this? The character "ß" stands for
"ss" in the German language. But, if your program was expecting
the 2 characters "ss" or "SS", it can fail now. — Similarly,
InStr(1,"Þingvellir","th",vbTextCompare) locates
"Þ". This happens because the character "Þ" stands for "th" in
Icelandic. Again, you searched for 2 characters, but InStr found
just one. These are not the only examples. You get the same
behavior for æ and œ. Don't use vbTextCompare
unless this is what you really want.
There is a separate article on the InStr function with
detailed information about text comparison.
Tip. Quick uses for InStr
| Locate a substring |
InStr(S, x) |
Locates x in S |
| See if string contains a substring |
InStr(S, x) <> 0 |
Tells us if S contains at least one x |
| See if character is one of alternatives |
InStr("ABC", S) <> 0 |
Tells us if S is one of "A", "B" or "C"
(Len(S) must be 1) |
| See if string is one of alternatives |
InStr("-AB-CD-EF-", "-" & S & "-") <> 0 |
Tells us if S is one of "AB", "CD" or "EF"
(S must not contain "-") |
InStrRev. Also InStrRev has
much better performance with vbBinaryCompare than with
vbTextCompare.
InStrRev is a lot slower than a regular InStr. This
difference is especially big with the vbBinaryCompare option.
Searching for the middle character in our 21-character short
input string, InStrRev spent 5 times the amount InStr spent.
InStrRev's performance also drops considerably when the input
string is long, that is, when there is a lot to search in. It
seems that InStrRev has a bad implementation.
Tip. Avoid InStrRev because of the bad performance. It may
make sense to replace calls to InStrRev with InStr at times.
Example: If the input is known to contain two TABs and you wish
to find the latter one, it may be faster to call InStr twice
rather than InStrRev once.
Replace is a slow function. As with the
other text functions, vbBinaryCompare has better performance
than vbTextCompare.
If a replace is unlikely to occur, you can add performance by
not calling Replace unnecessarily. First test with InStr whether
a replace is required, and then only call Replace if a
replacement is about to happen. Thus, if you need to replace "£"
with "€", first test with InStr if there really is a "£" in the
input string. Only call Replace if there was.
Empty string vs. null string
As you probably know having read
this far, there are two ways to represent a zero-length string:
- the null string:
vbNullString
- the empty string:
""
As far as pure VB6 programming is concerned, they work quite
exactly the same way. The only differences are that
vbNullString is faster and it saves 6 bytes of memory.
What exactly makes vbNullString different from
""? It's the way they are stored. In the table
below, you can see how the empty string uses 6 overhead bytes,
but no data bytes. The Datastring field between the Length
prefix and Terminator fields is hidden, because its length is
zero.
Memory layout of "", the empty string
| Field |
Length prefix |
Terminator |
| Byte |
|
|
|
|
|
|
| Number of bytes |
4 |
2 |
| Value |
0 |
NULL |
What is the memory layout of vbNullString? The
answer: Nothing! There is nothing to store, because
vbNullString is simply a NULL pointer. Besides a zero
pointer, there is nothing else to store. For the empty string,
there is a non-zero pointer and a real 6-byte string allocated
in RAM.
Now let's see what happens when you store a null string and
an empty string in variable S. The null string is
stored as a null pointer. The only thing to store in S
is a zero. To store the empty string "", VB needs
to do more. VB will allocate an empty string in the memory and
set S to point to it. Thus, we consume both the
pointer and the data area. You can see the difference in the
table below.
Null string vs. empty string
| |
Pointer |
|
Data |
StrPtr(S) |
VarPtr(S) |
| S = vbNullString |
0 |
|
(no data) |
0 |
1308564 |
| S = "" |
1568888 |
-> |
6 bytes |
1568888 |
1308564 |
| S = "a" |
1568888 |
-> |
8 bytes |
1568888 |
1308564 |
| Memory address |
1308564 |
|
1568888 |
|
|
In the table, variable S is stored as a pointer at
memory address 1308564.
- When you let
S = vbNullString, the pointer
value is set to zero. Nothing else is stored.
- When you let
S = "", the value 1568888 is
stored at address 1308564. Now S points to the
data area at 1568888. As it happens, this area contains 6
overhead bytes for the empty string "".
- Compare the empty string to a regular non-empty string.
When you let
S = "a", it is stored exactly the
same way as "". Again, S points to the data area
at 1568888. In this area you can find the actual string "a",
which takes 8 bytes.
There is a difference in using "" vs.
vbNullString. The difference is with API calls. Some API calls
may accept only the other. Check out the API documentation
before switching to vbNullString. VB itself doesn't make this
difference.
Getting string pointers
VB6 supports two functions for getting pointers to strings.
VarPtr(S) returns a pointer to the
variable S. That memory location contains a pointer (BSTR)
to the actual string data. This means VarPtr essentially returns
a pointer to a pointer.
StrPtr(S) returns a pointer to the
actual string data currently stored in S. This is
what you need when passing the string to Unicode API calls. The
pointer you get points to the Datastring field, not the Length
prefix field. In COM terminology, StrPtr returns
the value of the BSTR pointer.
API calls with Unicode strings
A lot of Windows API procedures come in two versions: ANSI
and Unicode. The Ansi versions end in 'A' while the Unicode
versions end in 'W'.
VB6 supports the ANSI 'A' versions. When calling a
Declare Sub or Declare Function, VB
automatically converts String parameters from Unicode to ANSI.
This means you safely use "As String" parameters in your Declare
statements as long as you work with ANSI Declares.
Declare Sub MySub Lib "x" Alias "MySubA" (Text As String)
Unfortunately VB6 doesn't directly support the Unicode 'W'
Declares. You can pass a Unicode string quite easily, though.
This is how you declare the Unicode version:
Declare Sub MySub Lib "x" Alias "MySubW" (ByVal Text As Long)
This is how you pass the Unicode string:
MySub StrPtr(Text)
You simply wrap the String parameters in StrPtr. That's how
simple it really is! The API procedure gets a pointer to a
Unicode string, not a copy of the string. By passing your
strings as a pointer, you avoid the costly automated conversion
to ANSI. What is more, your code is not restricted to the
current ANSI character set, but can use the full range of
Unicode characters. This is important with truly international
applications.
Building huge strings
VB6 lacks a StringBuilder class for the creation of large
strings. Consider building a big string in a loop. An example is
when you load a text file line by line:
Do Until EOF(FileNr)
Line Input #FileNr, Line$
Big$ = Big$ & Line$ & vbNewline
Loop
You repeatedly copy stuff to the end of the string with the
concatenation operator &. The bad news is, VB is
not optimized for this. As you grow the string, VB repeatedly
copies Big$ over and over again. This really degrades
performance when repeated. VB constantly allocates new space and
performs a copy.
The problem gets worse as the string exceeds 64K in size.
Small strings are stored in a 64K string cache. As the string
becomes larger than the cache, performance drops considerably.
Order of concatenation
Big$ = Big$ & "abc" & "def"
Big$ = Big$ & ("abc" & "def")
On line 1 above, VB will first join Big$ and "abc". After
this, it will perform another copy to add "def" to the
end.
If Big$ is big, the same is better rewritten with parentheses
around the shorter strings. The trick here is that you avoid
copying the contents of Big$ twice. On line 2, VB will combine "abc"
& "def" first before making a copy of Big$.
Avoiding concatenation
You can avoid the slow run-time concatenation by using
constants:
Const TWO_NEWLINES = vbNewline & vbNewline
String constants are stored in the executable in their
entirety. There will be no concatenation when your program
executes.
Sometimes you cannot use a constant. You can still initialize
a variable when your program starts and use the value where
required:
Public EscapeSequence As String
EscapeSequence = ChrW$(27) & vbCr
Building large strings with CString
A good approach to building large strings is provided in a
class called CString, originally published in Francesco Balena's
article in Visual Basic Programmer's Journal. Designed as a
"string builder" class, CString is a nice replacement for big VB
strings. It is easy to use and its performance is great.
CString stores the string in a byte array. As you add text to
the string, CString allocates more space from time to time.
Because it allocates far less often than VB, CString performs
much better than a regular String.
The bad news is, CString works in ANSI. It stores the string
in a byte array, one byte per character. While this saves
memory, it doesn't work well with all international characters.
Being ANSI means CString will silently convert your characters
to something else. This isn't what you want if your strings are
going to contain any characters outside of the current codepage.
This means, CString doesn't handle international text.
Fortunately, it's possible to make CString work in Unicode.
You need to change the byte array storage to an integer array,
meaning two bytes per character. While this requires twice the
memory, it ensures CString will work with all possible
characters and not cause any nasty effects.
Building a huge string in a file
As your strings grow really, really huge, CString will not be
enough. Memory allocation will eventually fail, even if there is
free RAM available. The limit varies but there is a point where
you get Error #7: Out of memory and your application is
likely to crash. Maybe it's 50 MB or 300 MB, but Out of
memory can hit your application as well. This can happen if
your program produces large reports, logs, web pages, XML or
other textual data files by building the data in a String before
writing it to a text file. What can you do to keep your program
running?
Save directly to a text file. Stop keeping
the string in RAM. Write your string data to a file instead. If
you are creating a report or an HTML page, for example, it's
better to write the text directly to a file rather than building
the text in a String and writing to disk afterwards.
Temporary text file. Writing text directly
to a file isn't always an option, especially when existing code
should be rewritten and you want to avoid a rewrite like the
plague. This is where you can build a helper class. Let's call
the class FileString. The class will use CString (or any other
similar solution) as the default storage, the regular RAM
string. As you append text to FileString, the class will
store the text in the RAM string. If the string grows larger
than a certain limit (say 1 MB), the FileString class creates a
temporary file and dumps the RAM string in it. It then
clears the RAM string to accept more stuff. You go on appending
stuff to the RAM string. Every now and then FileString will
store the RAM string into the temporary file.
Once you're ready, you can have FileString save the resulting
huge string to a file. This is as simple as copying the
temporary file into the final file. If there is any text left in
the RAM string, that text is simply appended to the end of the
final file.
This way you can build virtually unlimited strings, even
exceeding the amount of available RAM.
The FileString solution is especially nice when you can't
tell if the string will be small or large. When the string is
small, it will be built in RAM. When the string is large, it
will go onto the disk. There is no limit to the string data. You
can be certain your code will run as long as there's free space
on the disk.