summary refs log tree commit diff
path: root/doc/strings.md
diff options
context:
space:
mode:
authorWlodekM <[email protected]>2024-06-16 10:35:45 +0300
committerWlodekM <[email protected]>2024-06-16 10:35:45 +0300
commitabef6da56913f1c55528103e60a50451a39628b1 (patch)
treeb3c8092471ecbb73e568cd0d336efa0e7871ee8d /doc/strings.md
initial commit
Diffstat (limited to 'doc/strings.md')
-rw-r--r--doc/strings.md329
1 files changed, 329 insertions, 0 deletions
diff --git a/doc/strings.md b/doc/strings.md
new file mode 100644
index 0000000..9fa905a
--- /dev/null
+++ b/doc/strings.md
@@ -0,0 +1,329 @@
+## Introduction
+
+ClassiCube uses a custom string type rather than the standard C `char*` string in most places
+
+ClassiCube strings (`cc_string`) are a struct with the following fields:
+- `buffer` -> Pointer to 8 bit characters (unsigned [code page 437 indices](https://en.wikipedia.org/wiki/Code_page_437#Character_set))
+- `length` -> Number of characters currently used
+- `capacity` -> Maximum number of characters (i.e buffer size)
+
+Note: This means **STRINGS MAY NOT BE NULL TERMINATED** (and are not in most cases)
+
+You should also read the **Strings** section in the [style guide](/doc/style.md)
+
+## Memory management
+Some general guidelines to keep in mind when it comes to `cc_string` strings:
+- String buffers can be allocated on either the stack or heap<br>
+(i.e. make sure you don't return strings that are using stack allocated buffers)
+- Strings are fixed capacity (strings do not grow when length reaches capcity)<br>
+(i.e. make sure you allocate a large enough buffer upfront)
+- Strings are not garbage collected or reference counted<br>
+(i.e. you are responsible for managing the lifetime of strings)
+
+## Usage examples
+
+Initialisating a string from readonly text:
+```C
+cc_string str = String_FromConst("ABC");
+```
+
+Initialising a string from temporary memory on the stack:
+```C
+// str will be able to store at most 200 characters in it
+char strBuffer[200];
+cc_string str = String_FromArray(strBuffer);
+```
+
+Initialising a string from persistent memory on the heap:
+```C
+// str will be able to store at most 200 characters in it
+char* str = Mem_Alloc(1, 200, "String buffer");
+cc_string str = String_Init(str, 0, 200);
+```
+
+# Converting to/from other string representations
+
+## C String conversion
+
+### C string -> cc_string
+
+Creating a `cc_string` string from a C string is straightforward:
+
+#### From a constant C string
+```C
+void Example(void) {
+    cc_string str = String_FromConst("test");
+}
+```
+
+#### From a C string
+```C
+void Example(const char* c_str) {
+    cc_string str = String_FromReadonly(c_str);
+}
+```
+Note: `String_FromReadonly` can also be used with constant C strings, it's just a bit slower
+
+#### From a C fixed size string
+```C
+struct Something { int value; char name[50]; };
+
+void Example(struct Something* some) {
+    cc_string str = String_FromRawArray(some->name);
+}
+```
+
+### cc_string -> C string
+
+The `buffer` field **should not** be treated as a C string, because `cc_string` strings **MAY NOT BE NULL TERMINATED** 
+
+The general way to achieve this is to
+1. Initialise `capacity` with 1 less than actual buffer size (e.g. use `String_InitArray_NT` instead of `String_InitArray`)
+2. Perform various operations on the `cc_string` string
+3. Add null terminator to end (i.e. `buffer[length]` = '\0';
+4. Use `buffer` as a C string now
+
+For example:
+```C
+void PrintInt(int value) {
+    cc_string str; char strBuffer[128];
+    String_InitArray_NT(str, strBuffer);
+    String_AppendInt(&str, value);
+    str.buffer[str.length] = '\0';
+    puts(str.buffer);
+}
+```
+
+## OS String conversion
+
+`cc_string` strings cannot be directly used as arguments for operating system functions and must be converted first.
+
+The following functions are provided to convert `cc_string` strings into operating system specific encoded strings:
+
+### cc_string -> Windows string
+
+`Platform_EncodeString` converts a `cc_string` into a null terminated `WCHAR` and `CHAR` string
+
+#### Example
+```C
+void SetWorkingDir(cc_string* title) {
+    cc_winstring str;
+    Platform_EncodeUtf16(&str, title);
+    SetCurrentDirectoryW(str.uni);
+	
+    // it's recommended that you DON'T use the ansi format whenever possible
+    //SetCurrentDirectoryA(str.ansi); 
+}
+```
+
+### cc_string -> UTF8 string
+
+`String_EncodeUtf8` converts a `cc_string` into a null terminated UTF8-encoded `char*` string
+
+#### Example
+```C
+void SetWorkingDir(cc_string* title) {
+    char buffer[NATIVE_STR_LEN];
+    String_EncodeUtf8(buffer, title);
+    chdir(buffer);
+}
+```
+
+# API
+
+I'm lazy so I will just link to [String.h](/src/String.h)
+
+If you'd rather I provided a more detailed reference here, please let me know.
+
+TODO
+
+# Comparisons to other string implementations
+
+## C comparison
+
+A rough mapping of C string API to ClassiCube's string API:
+```
+atof    -> Convert_ParseFloat
+strtof  -> Convert_ParseFloat
+atoi    -> Convert_ParseInt
+strtoi  -> Convert_ParseInt
+
+strcat  -> String_AppendConst/String_AppendString
+strcpy  -> String_Copy
+strtok  -> String_UNSAFE_Split
+
+strlen  -> str.length
+strcmp  -> String_Equals/String_Compare
+strchr  -> String_IndexOf
+strrchr -> String_LastIndexOf
+strstr  -> String_IndexOfConst
+
+sprintf -> String_Format1/2/3/4
+    %d  -> %i
+  %04d -> %p4
+    %i  -> %i
+    %c  -> %r
+  %.4f  -> %f4
+    %s  -> %s (cc_string)
+    %s  -> %c (char*)
+    %x  -> %h
+```
+
+## C# comparison
+
+A rough mapping of C# string API to ClassiCube's string API:
+```
+byte.Parse   -> Convert_ParseUInt8
+ushort.Parse -> Convert_ParseUInt16
+float.Parse  -> Convert_ParseFloat
+int.Parse    -> Convert_ParseInt
+ulong.Parse  -> Convert_ParseUInt64
+bool.Parse   -> Convert_ParseBool
+
+a += "X";     -> String_AppendString
+b = a;        -> String_Copy
+string.Insert -> String_InsertAt
+string.Remove -> String_DeleteAt
+
+string.Substring -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt
+string.Split     -> String_UNSAFE_Split/String_UNSAFE_SplitBy
+string.TrimStart -> String_UNSAFE_TrimStart
+string.TrimEnd   -> String_UNSAFE_TrimEnd
+
+a.Length           -> str.length
+a == b             -> String_Equals
+string.Equals      -> String_CaslessEquals (StringComparison.OrdinalIgnoreCase)
+string.IndexOf     -> String_IndexOf/String_IndexOfConst
+string.LastIndexOf -> String_LastIndexOf
+string.StartsWith  -> String_CaselessStarts (StringComparison.OrdinalIgnoreCase)
+string.EndsWith    -> String_CaselessEnds   (StringComparison.OrdinalIgnoreCase)
+string.CompareTo   -> String_Compare
+
+string.Format -> String_Format1/2/3/4
+```
+*Note: I modelled cc_string after C# strings, hence the similar function names*
+
+## C++ comparison
+
+A rough mapping of C++ std::string API to ClassiCube's string API:
+```
+std::stof  -> Convert_ParseFloat
+std::stoi  -> Convert_ParseInt
+std::stoul -> Convert_ParseUInt64
+
+string::append -> String_AppendString/String_AppendConst
+b = a;         -> String_Copy
+string::insert -> String_InsertAt
+string::erase  -> String_DeleteAt
+
+string::substr -> String_UNSAFE_Substring/String_UNSAFE_SubstringAt
+
+string::length  -> str.length
+a == b          -> String_Equals
+string::find    -> String_IndexOf/String_IndexOfConst
+string::rfind   -> String_LastIndexOf
+string::compare -> String_Compare
+
+std::sprintf -> String_Format1/2/3/4
+```
+
+# Detailed lifetime examples
+
+Managing the lifetime of strings is important, as not properly managing them can cause issues.
+
+For example, consider the following function:
+```C
+const cc_string* GetString(void);
+
+void PrintSomething(void) {
+	cc_string* str = GetString();
+	// .. other code ..
+	Chat_Add(str);
+}
+```
+
+Without knowing the lifetime of the string returned from `GetString`, using it might either:
+* Work just fine
+* Sometimes work fine
+* Cause a subtle issue
+* Cause a major problem
+ptodo rearrange
+
+### Constant string return example
+```C
+const cc_string* GetString(void) {
+	static cc_string str = String_FromConst("ABC");
+	return &str;
+}
+```
+
+This will work fine - as long as the caller does not modify the returned string at all
+
+### Stack allocated string return example
+
+```C
+const cc_string* GetString(void) {
+	char strBuffer[1024];
+	cc_string str = String_FromArray(strBuffer);
+	
+	String_AppendConst(&str, "ABC");
+	return &str;
+}
+```
+
+This will **almost certainly cause problems** - after `GetString` returns, the contents of both `str` and `strBuffer` may be changed to arbitary values (as once `GetString` returns, their contents are then eligible to be overwritten by other stack allocated variables)
+
+As a general rule, you should **NEVER** return a string allocated on the stack
+
+### Dynamically allocated string return example
+
+```C
+const cc_string* GetString(void) {
+	char* buffer   = Mem_Alloc(1024, 1, "string buffer");
+	cc_string* str = Mem_Alloc(1, sizeof(cc_string), "string"); 
+	
+	*str = String_Init(buffer, 0, 1024);
+	String_AppendConst(str, "ABC");
+	return str;
+}
+```
+
+This will work fine - however, now you also need to remember to `Mem_Free` both the string and its buffer to avoid a memory leak
+
+As a general rule, you should avoid returning a dynamically allocated string
+
+### UNSAFE mutable string return example
+
+```C
+char global_buffer[1024];
+cc_string global_str = String_FromArray(global_buffer);
+
+const cc_string* GetString(void) {
+	return &global_str;
+}
+```
+
+Depending on what functions are called in-between `GetString` and `Chat_Add`, `global_str` or its contents may be modified - which can result in an unexpected value being displayed in chat
+
+This potential issue is not just theoretical - it has actually resulted in several real bugs in ClassiCube itself
+
+As a general rule, for unsafe functions returning a string that may be mutated behind your back, you should try to maintain a reference to the string for as short of time as possible
+
+### Reducing string lifetime issues
+
+In general, for functions that produce strings, you should try to leave the responsibility of managing the string's lifetime up to the calling function to avoid these pitfalls
+
+The example from before could instead be rewritten like so:
+
+```C
+void GetString(cc_string* str);
+
+void PrintSomething(void) {
+	char strBuffer[256];
+	cc_string str = String_InitArray(strBuffer);
+	GetString(&str);
+	
+	// .. other code ..
+	Chat_Add(&str);
+}
+```
\ No newline at end of file