Dynamic Memory Allocation in Unix Systems

It is not always possible, at compile time, to know how big to make all of our data structures. When we send an SQL query to the database, it may return twenty million rows, or it may return one.


The mechanism by which we persuade the operating system to give us memory on the fly, is called dynamically allocated memory. This memory is outside of the memory allocated to the process, in an area known as the ‘heap’, and our doorway into it, is a pointer to the first byte, returned by a function called malloc().

When we see code containing calls to malloc(), it may be difficult to see what it all means, because of the way it has been written, so it may be advantageous to assemble this code, piece by piece.

The basic function, takes one argument, the number of bytes of memory required, and returns a pointer to the first byte of this, like this:

char *pointer;

int size = 1000000;

pointer = malloc(size);

Originally, malloc() used to return a pointer to char, since this pointed to one byte, as well as anything could, but this was too simple. These days, malloc() returns a pointer to ‘void’, which is exactly the same as a pointer to char, but the compiler won’t let you use it, without a cast to your favorite data type.

Therefore, if we need a character array, in the midst of our computation, we would need to rewrite the call, to say:

pointer = (char *)malloc(size);

If malloc() fails, it returns a NULL pointer, which we are duty bound to check, so we code it as:

if((pointer = (char *)malloc(size)) == NULL){

printf(“Memory allocation failedn”);

}

Now, it’s starting to look ugly, and can be made downright hideous, by allocating an array of structures:

struct this{

int one;

int two;

int three;

};

struct this *pointer;

int size = 1000000;

if((pointer = (struct this *)

malloc(size * sizeof(struct this))) == NULL){

printf(“Memory allocation

failedn”);

}

Occasionally, it isn’t possible to know in advance, exactly how much memory we need. We may be collecting data from several different sources, to place in one array, and only know how much each source will provide, when we access it.

There is another function, which permits us to alter the amount of memory which we previously allocated with malloc(), called realloc().

The realloc() function takes a pointer to a dynamically allocated block of memory, and a new size value, and returns a new pointer, to the extended memory:

char *pointer;

int newsize = 2000000;

temp = (char *)realloc(pointer, size);

or, to be pedantic,

if((temp = (char *)realloc(pointer, size)) == NULL){

printf(“Memory reallocation failedn”);

}

If we need to use our original pointer, for cosmetic, or aesthetic reasons, to point to the new memory, we simply reassign it:

pointer = temp;

Very brave programmers, who have faith in the order in which operations are performed, can save the cost of a pointer, by recycling the original pointer:

                           if((pointer = (char *)realloc(pointer,

size)) == NULL){

printf(“Memory reallocation

failedn”);

}

Don’t do this because, down this road lies madness, and a few core dumps. All of that was quite easy, really but, occasionally, we need an array of pointers to things which, themselves, are of variable size. For instance, we may be rifling the bank’s database, looking for the loan payment records of all of its hapless customers. We don’t know, in advance, how many customers there will be, or how many payments they made. We start with the declaration of the two dimensional pointer:

char **pointer;

Some programmers declare this kind of pointer as ‘char *pointer[]’, since this looks like a pointer to an array, but it may be more intuitive to think of this as a pointer to a pointer.

Our first task, is to make the pointer to a pointer point to more than one pointer, In other words, we need an array of pointers, of the correct length. At the moment, all we have, is eight bytes of memory, containing garbage. Those eight bytes need to contain the first address, of an array of addresses. We do this with malloc():

Linked Lists

When we are collecting data, the obvious, and simplest way of doing so, is to declare a structure, then declare a pointer to its type, and malloc an instance. As we acquire more data, we simply realloc our array of structures, and tack the data on to the end.

For getting rows of data out of a database cursor, this is great, and you shouldn’t consider any other approach. However, what happens if you want to remove the 154th data element from the array? Or, perhaps, insert the 154th element?

What if, you are storing data from several sources, like the roads on a map, which you need to attach to specific elements of your array, like the road junctions?

Not so simple.

Despite the mental picture conjured up by the word ‘list’, a linked list can be one dimensional, two dimensional or multi-dimensional. Apart from the street map mentioned above, another well-known application is an electronic circuit diagram, where there are components, connected by wires which, together, form a two-dimensional figure. Add to that, airline routes, railway systems, and the dynamically changing positions of pieces on a chessboard, and you get an idea of the usefulness of linked lists.

The Unix file system uses a linked list to map the blocks allocated to all of the files on a disk. As files are added, deleted, increase or decrease in size, the linked list is appropriately manipulated to reflect the current position.

Okay, so what, exactly, is a linked list?

One of my lecturers described linked lists as ‘a hundred blind men, holding hands in the dark’.

To stretch the analogy a little further, we can add that two of the men have a little red light attached to their heads, so you can see them.

Basically, a linked list is a series of data structures, with a special data structure at the head, and another special data structure at the tail of the list.

Let’s begin with a definition of the data structure.

struct queue {

struct queue *fwd;

struct queue *rev;

char data[1024];

};

Ignoring the embedded data array, notice that there are two pointers, each to a type ‘struct queue’ within the data structure. One is a forward pointer (*fwd), and the other, a reverse pointer (*rev).

It is these pointers, which link the linked list. Since we are using a forward and a reverse pointer, this will be a doubly linked list, but for some applications, we can omit either pointer, and just create a singly linked list.

We’ll only consider the doubly linked list, as the amount of extra effort to do so is minimal.

First, we need to define the special structures for the head and tail.

struct queue *head;

struct queue *tail;

Since these are currently pointers to nothing, let’s initialize them to some real memory:

if((head = (struct queue *)malloc(sizeof(struct queue)))

== NULL){

printf(“Can’t allocate memory for headn”);

return(-1);

}

if((tail = (struct queue *)malloc(sizeof(struct queue)))

== NULL){

printf(“Can’t allocate memory for tailn”);

return(-1);

}

Now we have two blind men with lights on their heads, so we can see them, but they still can’t see each other.

Let’s fix that. We take the fwd pointer of the head, and attach it to the tail, and the rev pointer of the tail, and attach it to the head.

head->fwd = tail;

tail->rev = head;

To identify the head and tail, we need to set the rev pointer of the head to NULL, and to do the same with the

fwd pointer of the tail.

head->rev = NULL;

tail->fwd = NULL;

Now the two blind men have placed their free hand onto a wall, which gives a clue as to how we know we’ve reached either end, when we’re searching the list.

Now, we have to add an element to our list, which we can do at the head or at the tail. This is usually done within a subroutine, imaginatively called add_elmnt() or something, since we don’t want to repeat the code a few hundred times in our program.

First, we create an element

struct queue *elmnt;

if((elmnt = (struct queue *)malloc(sizeof(struct queue)))

== NULL){

printf(“Can’t allocate memory for elmntn”);

return(-1);

}

Then, to add this at the head, we do the following, in the following order. Changing the order may lead to attempts to attach to undefined pointers:

• We first take our rev pointer, and point it to the head, whose address we know.

elmnt->rev = head;

• Then, we point our fwd pointer to the address pointed to by the head’s fwd pointer.

elmnt->fwd = head->fwd;

• Next, we take the rev pointer of the structure pointed to by the fwd pointer of the head, and point it to ourselves.

elmnt-fwd->rev = elmnt;

• At this point, we are attached to both head and tail, and can safely detach the head’s fwd pointer from the tail, and attach it to ourselves.

head->fwd =elmnt;

Why did we do the acrobatics in the third step? Why not, instead, just say

tail->rev = elmnt; ?

The answer is, that we only know the position of the tail before we add the first element. However, we always know that the rev pointer of the structure following the head points back to the head.

If we’re adding our elements to the end of the list, we follow the same method, except that we only know the address of the tail:

ESG Labs: TrueNAS Technical Report
Download Enterprise Storage Guide Button
iXsystems values privacy for all visitors. Learn more about how we use cookies and how you can control them by reading our Privacy Policy.