Journey into the Machine: An Optimistic Killer

3099 VIEWS

This post is part of a content series coming out of a partnership between Sweetcode.io and Holberton School.

A Killer

It happens now and then. I start very confident in my code, and then:

\$./a.out
Killed

What happened exactly?

I started looking into it as I was testing a program. As a student at Holberton School, my teachers are pretty adamant about writing good code. For the beginner that I am, that means accounting for all kind of edge cases without failing. Since we are not given those cases, the hunt is then on to find where my code can fail and fix it before submitting it.

As it happens, at one point I learned about malloc, and with it was its coronary:

something = malloc( ... );
if (something == NULL)
return (NULL);

And there was an exercise we had to do: Write a function that returns a pointer to a newly created two-dimensional integer grid whose members have all been set to 0 . And I was missing a case. I thought: “I never really test this NULL condition—Maybe there is something wrong there?”

I set up to trigger this if statement. I created a main function that would make bigger and bigger grids, until it would be too big for memory and I would see a NULL, or so I thought. Here is the code:

int main(void)
{
int row;
int **g;
row = 1;
while (row < 50000)
{
printf("grid[%i][%i]\n", row, row);
g = alloc_grid(row, row);
if (g == NULL)
{
printf("NULL clause\n");
return (0);
}
row++;
}
return (0);
}

I compiled and ran my code.

\$./a.out
Killed
\$\$?
137: command not found

\$? returns the exit status of the last command. Here it returns the exit status of the process that was launched when I ran my code. What is 137? It is nowhere in my code. Looking at the documentation, 137 corresponds to 128 + n, meaning fatal error signal n. In my case, n = 9. It corresponds to a signal otherwise named SIGKILL or kill signal. This signal causes the immediate termination of the process by the Linux kernel.

What happened? With this code, I was using more and more memory, and never stopped. At one point I reached the Out Of Memory condition. To deal with it, linux has an Out of Memory Manager (OOM), and its weapon, the OOM Killer. When I am running processes taking too much memory, the OOM makes its counts, and chooses the “best” choice to kill. In my case, it was obvious. It was me making bigger and bigger grids, and never freeing anything.

An Optimist

I was testing my code and things were not turning out as I wanted. I thought maybe it was due to the way I’d created my grid. Basically, there are two ways to go about it. They’re illustrated below:

Create an array of pointers to hold rows and create each row in a loop.

Create a unique space in memory, and then code a way to use [][] notation.

I’ll call them alloc_rows and alloc_singlespace. I tested my code for both, printing the size of the grid I was trying to create. This is where it stopped in the first and second case.

grid[625][625]
Killed
\$

grid[633][633]
Killed
\$

The process is killed earlier when I allocate rows separately, which makes sense as it uses more memory.

Then I tried to do the same thing without initializing the grid to 0. I try to build a grid with up to 49,999 rows and columns.

grid[618][618]
Killed
\$
grid[6418][6418]
NULL clause
\$

The discrepancy in results here is due to the Opportunistic Memory Allocation Strategy. It is explained succinctly in the malloc man page, which says:

This means that when malloc() returns non-NULL there is no guarantee that the memory really is available. In case it turns out that the system is out of memory, one or more processes will be killed by the OOM killer.

In my case with the first strategy, to explain the process, first I use malloc to create an array of pointers, and then I access that memory when I initialize all those pointers. So at one point, the process gets killed. It works the same way as when I initialized all the values to 0. However, in the second case, I do not ever try to access the memory allocated by malloc, and I can finally trigger my NULL condition.

So, my Linux is like this crooked trader. “You want memory? I have memory. Loads of it, hardly ever runs out!” Then, after it kills my process, it responds with, “You never said you wanted to use it.” There’s another way to look at it—“There are so many bad coders that this is a safety measure against malloc enthusiasts.”

I am learning to code in C, and I find it so interesting to see how intertwined the OS and C are. I tend to always looked down on the hardware as a limitation to my creativity. But now, as I see where and get a little bit of the why it does not work, I marvel at the complexity and overall efficiency of the work the OS is doing.

Note: I obtained those results running in Ubuntu 14.04.5 LTS inside Vagrant 1.8.6 . I used gcc (Ubuntu 5.4.1–2ubuntu1~14.04) 5.4.1 20160904 to compile. Those results are machine-dependent, so if you try this by yourself, you might get something different depending on your OS, its configuration, and the version of malloc you use. Something to learn more about...

Anne comes from an art background where she was specialized in Persian illuminations. She is now a student at Holberton School where she chose to specialize in low-level programming, Linux and algorithms. Feel free to contact her on her twitter @1million40.

Discussion

Click on a tab to select how you'd like to leave your comment
Skip to toolbar