Introduction
I recently had to find a memory leak in a huge C/C++ application running on Linux. Unfortunately, I was not able to use Valgrind, because the application was too slow when running inside Valgrind’s virtual machine. So I decided to write my own memory debugging library. The key idea is to replace the original implementations of malloc, free, new and delete. There are several ways to achieve that, one of which I will explain in this howto. You can either wrap the original functions by using the --wrap=<symbol> linker option, or you can make use of glibc’s built-in hook mechanism. We’ll address the latter, since it’s a little more elegant.
Furthermore, you can either link your own malloc and free implementations directly into your build by using appropriate linker options, or you can inject them by using the LD_PRELOAD environment variable. The latter tells the dynamic linker to load the listed libraries before any other library. This is where our injection will take place. Let’s have a look…
The leak
Assume the following C program leak.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    void leak(int num)
    {
        fprintf(stderr, "%s allocating %d bytes...\n", __PRETTY_FUNCTION__, num);
        malloc(num); /* the result is never freed: this is the leak */
    }

    int main(int argc, char** argv)
    {
        srand((unsigned)time(NULL));
        fprintf(stderr, "%s: PID: %d\n", __PRETTY_FUNCTION__, (int)getpid());
        while (1) {
            leak(rand() % 1024);
            sleep(rand() % 10);
        }
        return 0;
    }
As you can see, the program calls the leak() function endlessly and sleeps for a random number of seconds (0 to 9) after each call. Within leak(), there’s a call to malloc() whose result is never passed to free(), so every call leaks up to 1023 bytes. This is how we simulate a memory leak.
Compile the program as follows:

    gcc -Wall -g -o leakc leak.c
Off the hook
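Now it’s time to trace where all that memory is going. We’ll use the hooks provided by glibc, more precisely __malloc_hook, __free_hook and __malloc_initialize_hook. Note that these hooks are deprecated: __malloc_initialize_hook was removed from the API in glibc 2.24 and the remaining hooks in glibc 2.34, so this approach requires an older glibc.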
Create another C file btm.c with the following content:

    #include <stdlib.h>
    #include <stdio.h>
    #include <malloc.h>

    /* Prototypes for our hooks. */
    static void my_init_hook(void);
    static void *my_malloc_hook(size_t, const void*);
    static void my_free_hook(void*, const void*);

    /* Variables to save original hooks. */
    static void *(*old_malloc_hook)(size_t, const void *);
    static void (*old_free_hook)(void*, const void *);

    /* Override initializing hook from the C library. */
    void (*__malloc_initialize_hook) (void) = my_init_hook;

    static void my_init_hook(void)
    {
        fprintf(stderr, "%s: setting up hooks...\n", __PRETTY_FUNCTION__);
        old_malloc_hook = __malloc_hook;
        old_free_hook = __free_hook;
        __malloc_hook = my_malloc_hook;
        __free_hook = my_free_hook;
    }

    static void restoreOldHooks(void)
    {
        __malloc_hook = old_malloc_hook;
        __free_hook = old_free_hook;
    }

    static void restoreMyHooks(void)
    {
        __malloc_hook = my_malloc_hook;
        __free_hook = my_free_hook;
    }

    static void saveOldHooks(void)
    {
        old_malloc_hook = __malloc_hook;
        old_free_hook = __free_hook;
    }

    static void* my_malloc_hook(size_t size, const void* caller)
    {
        void* res;

        // Restore all old hooks, so the malloc() below does not re-enter this hook
        restoreOldHooks();
        // Call the real malloc()
        res = malloc(size);
        // Save underlying hooks (they may have changed during the call)
        saveOldHooks();
        // Do your memory statistics here...
        fprintf(stderr, "malloc (%zu) returned @%p\n", size, res);
        // Restore our own hooks
        restoreMyHooks();
        return res;
    }

    static void my_free_hook(void* ptr, const void* caller)
    {
        // Restore all old hooks, so the free() below does not re-enter this hook
        restoreOldHooks();
        // Call the real free()
        free(ptr);
        // Save underlying hooks (they may have changed during the call)
        saveOldHooks();
        // Do your memory statistics here...
        fprintf(stderr, "freed pointer @%p\n", ptr);
        // Restore our own hooks
        restoreMyHooks();
    }
Compile with

    gcc -Wall -fPIC -c btm.c -o btm.o
Create a dynamic library with

    gcc -shared -o libmt.so btm.o
As you can see, there are two static function pointers, old_malloc_hook and old_free_hook, which are used to save and restore the hook state. Within the my_init_hook() function, we simply replace the predefined hooks with our implementations. Within the hook implementations, you have to call the actual malloc() and/or free() functions, which is why our hooks are uninstalled first: otherwise the call would recurse into the hook forever. In return, you get the opportunity to capture the pointer that has been returned by malloc().
Within your hook implementations, you can perform whatever debugging you need. I ended up implementing a simple statistics mechanism that keeps track of all pointers being allocated and freed. Each time malloc() is called, the returned pointer is stored in a collection/array. Each time free() is called, the corresponding pointer is removed again. As soon as your program has finished, print the pointers that are left over: these are the leaks. I’ve combined this method with the results coming from backtrace(), so I could investigate the stack afterwards.
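The pointer bookkeeping described above could look roughly like the following. This is only a minimal sketch and not the code I actually used: the names track_alloc(), track_free(), live_count(), live_bytes() and report_leaks() as well as the limits MAX_TRACKED and MAX_FRAMES are made up for illustration, and the table is a fixed-size array with a linear search. It assumes glibc, because backtrace() and backtrace_symbols_fd() come from <execinfo.h>.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_TRACKED 4096   /* hypothetical capacity of the table */
#define MAX_FRAMES  8      /* hypothetical number of stack frames to keep */

struct alloc_record {
    void  *ptr;                  /* NULL marks an unused slot */
    size_t size;                 /* size that was requested from malloc() */
    void  *frames[MAX_FRAMES];   /* return addresses at allocation time */
    int    nframes;
};

static struct alloc_record records[MAX_TRACKED];

/* Remember a pointer returned by malloc(). Returns -1 if the table is full. */
int track_alloc(void *ptr, size_t size)
{
    if (ptr == NULL)
        return 0;                /* failed allocation, nothing to track */
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (records[i].ptr == NULL) {
            records[i].ptr = ptr;
            records[i].size = size;
            records[i].nframes = backtrace(records[i].frames, MAX_FRAMES);
            return 0;
        }
    }
    return -1;
}

/* Forget a pointer passed to free(). Returns -1 if it was never tracked. */
int track_free(void *ptr)
{
    if (ptr == NULL)
        return 0;                /* free(NULL) is a valid no-op */
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (records[i].ptr == ptr) {
            records[i].ptr = NULL;
            return 0;
        }
    }
    return -1;
}

/* Number of tracked pointers that have not been freed yet. */
size_t live_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (records[i].ptr != NULL)
            n++;
    return n;
}

/* Sum of the requested sizes of all pointers that are still tracked. */
size_t live_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++)
        if (records[i].ptr != NULL)
            total += records[i].size;
    return total;
}

/* Print every pointer that is left over, with its recorded stack. */
void report_leaks(void)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (records[i].ptr == NULL)
            continue;
        fprintf(stderr, "leaked %zu bytes @%p, allocated from:\n",
                records[i].size, records[i].ptr);
        backtrace_symbols_fd(records[i].frames, records[i].nframes,
                             STDERR_FILENO);
    }
}
```

In the hooks, track_alloc(res, size) and track_free(ptr) would go where the "Do your memory statistics here..." comments are, that is, while the original hooks are still installed, so that any allocation done by the bookkeeping itself is not traced. The linear search keeps the sketch short; for a huge application a hash table is probably the better choice.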
Putting it together
Now that you have your leaking program leakc and your memory tracing library libmt.so, you can start your program as follows:

    LD_PRELOAD=./libmt.so ./leakc

The dynamic linker will then load your library first, and your hooks will be used. Now you’re able to use your tracing library for actual buggy programs.
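Since the statistics should be printed as soon as the program has finished, the preloaded library needs some code that runs at exit. One possible way, sketched below under a few assumptions, is a function marked with the GCC/Clang extension __attribute__((destructor)). The counters and the names count_malloc(), count_free(), outstanding_allocations() and print_summary() are hypothetical and only serve as an illustration.

```c
#include <stddef.h>
#include <stdio.h>

static size_t malloc_calls;   /* successful allocations seen so far */
static size_t free_calls;     /* non-NULL pointers freed so far */

/* Call from the malloc hook with the pointer that malloc() returned. */
void count_malloc(void *ptr)
{
    if (ptr != NULL)
        malloc_calls++;
}

/* Call from the free hook with the pointer that is being freed. */
void count_free(void *ptr)
{
    if (ptr != NULL)
        free_calls++;
}

/* Allocations without a matching free. A negative value would mean that
   blocks were freed which this library never saw being allocated. */
long outstanding_allocations(void)
{
    return (long)malloc_calls - (long)free_calls;
}

/* Runs when the library is unloaded at normal program termination. */
__attribute__((destructor))
static void print_summary(void)
{
    fprintf(stderr, "malloc calls: %zu, free calls: %zu, outstanding: %ld\n",
            malloc_calls, free_calls, outstanding_allocations());
}
```

Keep in mind that destructors only run on normal termination, that is, when main() returns or exit() is called. A program that is killed by a signal, like our endlessly looping leakc after pressing Ctrl-C, will not print the summary, so for such programs you would have to print the statistics from within the hooks from time to time.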
Adding C++ to the game
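The things we’ve addressed above only apply to C, not to C++. The operators new and delete usually call malloc and free under the hood, but you cannot rely on the hooks catching every one of these library-internal calls. That’s not a big deal, though, because C++ allows us to replace the global new and delete operators with our own versions. Assume btm.cpp to be the following:

    #include "backtraceMalloc.h" // project header, not shown in this howto
    #include <cstdlib>
    #include <cstdio>
    #include <new>

    // operator new must never return a null pointer, so throw on failure.
    static void* checkedMalloc(std::size_t size)
    {
        void* ret = std::malloc(size ? size : 1); // malloc(0) may return NULL
        if (!ret)
            throw std::bad_alloc();
        return ret;
    }

    void* operator new(std::size_t size)
    {
        return checkedMalloc(size);
    }

    void* operator new[](std::size_t size)
    {
        return checkedMalloc(size);
    }

    void operator delete(void* ptr) noexcept
    {
        std::free(ptr);
    }

    void operator delete[](void* ptr) noexcept
    {
        std::free(ptr);
    }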
Compile with

    g++ -fPIC -c btm.cpp -o btm_cxx.o

You can now link the C parts together with the C++ parts as follows:

    g++ -shared -o libmtrc.so btm.o btm_cxx.o
Note that you’ll have to implement whatever tracing functionality you need within those replaced operators.