Background

Take a look at some test code we have written for our project. The code creates a queue and array. It then adds an array to the queue. Both the data structures are created on the heap. At the end of the method, the queue is destroyed and freed, but the array is not freed. Consequently, the following code would give you a memory error.

//FROM FILE MAIN.C
#include <stdlib.h>
#include <stdio.h>
#include "creator_methods.h"
void* malloc_method();
int main(int argc, char** argv) {
   queue* q1 = (queue*) make_queue_queueMod(10);
   void ** a1 = make_array_arrayMod(5); 
   add_queue_queueMod(q1,a1); //Note that as per implementation, you cannot have spaces between argument 
   destroy_queue_queueMod(q1);
   //make_array_arrayMod allocates memory on the heap, but was not destroyed --> memory error
}

Running Valgrind on this code would give you the exact line of the memory leak. Specifically, it will tell you that the memory leak is on this line:

//FROM FILE MAIN.C
void ** a1 = make_array_arrayMod(5); 

However, sometimes even with this information, in more complex programs, it can still be a little difficult to trace the memory leak. Our goal with this project was to create a visualization of the leak to help programmers more easily identify the source of the leak.

Data Structure Identification in C:

The first step in doing this was to create a visualization of the leaky code, displaying all the data structures that are created. For a baseline implementation, we wanted to display any queues, stacks, vectors, or arrays that the user created.

If we were to do this in a relatively high-level language, such as C++, creating the visualizations would be very simple -- we could parse the leaky code and use the Abstract Syntax Tree (AST) to categorize keywords in the user’s code into various data structures, which we could then display.

However, this proved to be challenging in C, since C has very few standardized data structures. For example, unlike other languages that have standardized classes for the “queue” data structure, in C, users must implement their own queue.

When determining how to categorize user-defined data structures in C into generic data structures -- namely queue, stack, vector, and array -- we thought of several different approaches.

The first would be to analyze the user’s struct’s and try to classify them by matching their traits to a list of the generic data structures. We researched about intelligent data matching and looked at several machine learning algorithms that could be used to accomplish this task. Although we came by some interesting findings, we decided to stick to a simpler approach, prioritizing creating a base-line implementation of our project before implementing more advanced features.

We decided on allowing the user to specify the data structures as opposed to writing code to perform identification.

//FROM FILE MAIN.C
while ((option = getopt(argc, argv, "q:s:v:a:n:")) != -1) {
       switch (option) {
           case 'q':
               queue_names[q_idx] = optarg;
               q_idx++;
               break;
           case 's':
              stack_names[s_idx] = optarg;
               s_idx++;
               break;
           case 'v':
               vector_names[v_idx] = optarg;
               v_idx++;
               break;
           case 'a':
              array_names[a_idx] = optarg;
               a_idx++;
               break;
           case 'n':
               break;
       }
   }

In the future, I hope to make the identification more robust by going back and implementing an intelligent data matching algorithm.

Data Structure Interactions and Interface:

The next challenge was to determine the relationship between various data structures. For example, suppose that a user created a queue of integers. Then, the user added an array of integers to the first index of the queue. If we were to look at the code in person, we would easily be able to identify the relationship between the queue and the array. However, how would we identify this relationship by parsing the user’s code?

To better the interaction between user-defined data-structures, we decided to create an “interface” for the user to implement. This way, when parsing the user’s code, we could understand the implementation of their methods. I say “interface” in quotations because we ended up creating an informal interface. The concept of interfaces is quite abstract in C, so we defined an artificial interface by specifying the following guidelines in the creator_methods.h file:

//FROM FILE creator_methods.h

/*Users must standardize methods for their data structures. This is done by calling methods to user-defined data structures. Standardized methods must be named in the following format:

methodType_dataStructureType_constructingStruct

methodType:
"make" - creates the data structure on the *heap*
"destroy" - removes the data structure on the *heap*
"add" - adds an element to the data structure
"remove" - removes an element from the data structure

dataStructureType:
The following data structures are currently supported for visualization: "queue", "stack", "vector", "array".
This list may be appended in the future to expand functionality.

constructingStruct:
This refers to the specific name of the datastructure. For example, if a user creates a queue and names the struct
as "queueMod", constructingStruct will be "queueMod". This is necessary for cases in which the user defines more than
one implementation of a specific data structure.
*/

As specified in the instructions above, a user must define a method in the following format for any struct that they define that is to emulate a generic data structure.

Here is an example of an implementation of the defined “interface” for a user-defined queue specified in the queue_mod file:

//FROM FILE creator_methods.c

//make methods
void * make_queue_queueMod(size_t maxSize) {
   queue * q = queue_create(maxSize);
   return (void*)q;
}
void * make_stack_stackMod(size_t maxSize) {
   (void) maxSize;
   struct stack* s = create_stack();
   return (void*)s;
}
void * make_vector_vectorMod(size_t maxSize) {
   vector_t * v = NULL;
   vector_init(v);
   return (void*)v;
}
void * make_array_arrayMod(size_t maxSize) {
   void * a = malloc(maxSize);
   return (void*)a;
}
void * make_array_dictionaryMod(size_t maxSize) {
   dictionary_t * d = NULL;
   dictionary_init(d);
   return (void*)d;
}

Visualization Process and Results:

After categorizing user datastructures as generic datastructures and creating an interface to interact with user-defined data structures, we were able to draw the relationship between various data structures. We first, parsed the user’s code to generate a series of information strings, which we then passed to another file, visualizer.c to generate graphics (via HTML SVG). An example of this process is listed below:

Step 1: Prase User defined code into “informative strings”

#include <stdlib.h>
#include <stdio.h>
#include "creator_methods.h"


void* malloc_method();
int main(int argc, char** argv) {
   queue* q1 = (queue*) make_queue_queueMod(10);
   void ** a1 = make_array_arrayMod(5); 
   add_queue_queueMod(q1,a1); //Note that as per implementation, you cannot have spaces between argument 
   destroy_queue_queueMod(q1);
   //make_array_arrayMod allocates memory on the heap, but was not destroyed --> memory error
}

The following is an information string. It contains information about how the data structures interact with one another, what the names of the various data structures are, and the size of the structures:

[DataStructure1{"queue", "queueMod", "queue1", DataStructure2, 10, false}, DataStructure2{"array", "arrayMod", "a1", NULL, 5, true}]

Step 2: Convert informative strings into visualizations using SVG

The picture below shows how we imagined our visualization would look. However, after actually working with SVG, we realized that this process was more complicated than we first anticipated. As such, our visualizations did not appear to look as appealing as we had planned. Moreover, at this point in time, the visualization is not robust enough to handle all information strings.

Ideal visualization generated from string:

Built With

Share this project:

Updates