C++ Assignment Operator or: How I Learned to Stop Worrying and not Check for Self-Assignment

If you’re already here, I’m going to assume you know a lot of the intricacies of writing an assignment operator in C++. Getting all the ins and outs of writing assignment operators is tricky – if you want a good discussion of all the bits and pieces, I’d recommend taking a look at The Anatomy of the Assignment Operator. But that’s not why I’m here, so at the risk of being blasphemous, I’ll just get my main point out of the way.

If you are writing
if (this != &rhs)
in your assignment operator, you are probably doing it wrong, or at the very least, not as right as it should be.

That’s it. Take a deep breath, sit back, relax and just let it settle. I’ve been told to do it this way for years, and you probably have too, so it might be a bit of a shock. But please, please stop telling people to write it this way. It’s important to understand why people do it, but it’s just as important to understand why it’s wrong.

Self-assignment

Let’s look at why people recommend writing this check. Imagine we have the following class:

class Foo
{
public:

    ...

private:
    Bar* myBar;
};

A first attempt at an assignment operator might look like the following:

Foo& operator=(const Foo& rhs)
{
    delete myBar;
    myBar = new Bar(*rhs.myBar);
    return *this;
}

The first thing we do is delete the old copy of myBar so that no memory is leaked. Then, we make a copy of the right-hand side’s myBar. This is where our problems start. What would happen in the above assignment operator if I tried to assign an object to itself? For example, if I wrote the following:

Foo f;
f = f;

In this case, the parameter rhs refers to the current object. So when we run line 3 of the assignment operator:

delete myBar;

the rhs’s myBar is also being deleted. So what happens when we try to run the next line?

myBar = new Bar(*rhs.myBar);

We’re trying to access and copy memory that has already been deleted. At this point, any number of things could happen. Some of the time, that memory will still belong to our process and will be unchanged, and our program will happily continue. But once in a while, the program will mysteriously crash. Trying to debug this issue when it works 99 out of 100 times is difficult and irritating.

So what can we do instead? The commonly suggested solution is to check for self-assignment and skip the allocation and deletion in that case. This does solve the problem, but let’s take a step back and look at our code again. The problem with self-assignment in the above assignment operator is that by the time we are ready to copy the Bar object, we’ve already deleted it. But what if we made the copy first, swapped that copy with our member variable, and then deleted what used to be our member variable?

Here’s a new code snippet that does that instead:

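Foo& operator=(const Foo& rhs)
{
    Bar* temp = new Bar(*rhs.myBar); // copy first; *this is untouched if this throws
    std::swap(temp, myBar);          // from <utility>; temp now holds the old Bar
    delete temp;                     // delete the old Bar, not the new copy
    return *this;
}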

What happens now if we accidentally do self-assignment? Well, line 3 makes a copy of the member variable. Line 4 swaps the copy with our member variable, so temp now points to the old Bar. Finally, line 5 deletes temp, which is what used to be our member variable. No memory is leaked, no deleted memory is accessed, and at worst, some cycles are wasted making a copy of some memory and deleting it. We fixed the problem without ever needing a check for self-assignment!
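For readers who want to try this, here is a minimal, self-contained sketch of the copy-first version. The Bar struct with its int payload, Foo’s constructor, copy constructor, destructor and value() accessor are assumptions added so the class compiles on its own; only operator= follows the article. Note that the statement after the swap deletes temp, because that is where the old Bar ends up.

```cpp
#include <cassert>
#include <utility>  // std::swap

// Hypothetical stand-in for the article's Bar: any copyable type would do.
struct Bar {
    int value;
    explicit Bar(int v) : value(v) {}
};

class Foo {
public:
    explicit Foo(int v) : myBar(new Bar(v)) {}
    Foo(const Foo& other) : myBar(new Bar(*other.myBar)) {}
    ~Foo() { delete myBar; }

    Foo& operator=(const Foo& rhs) {
        Bar* temp = new Bar(*rhs.myBar);  // copy first; may throw, *this untouched
        std::swap(temp, myBar);           // install the copy; temp holds the old Bar
        delete temp;                      // free the old Bar
        return *this;
    }

    int value() const { return myBar->value; }

private:
    Bar* myBar;
};
```

With this version, f = f copies the Bar, swaps the copy in and deletes the original, so the object keeps its value and no freed memory is ever read.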

But just avoiding a special case doesn’t necessarily make this version better, does it? Well…

Exception guarantees

In C++, there are several guidelines you should try to follow whenever you’re writing a function. These are known as exception guarantees. In some cases, it is impossible to implement the stronger ones, but you should try to implement them if you can.

The first guarantee is the basic guarantee, which states that if an exception is thrown, your code should not leak memory (or any other resource) and should leave objects in a valid state. In both of our assignment operators, the only line that could throw an exception is the one that allocates and copies the new Bar. If it throws, no memory cleanup is required, because we’re only trying to allocate a single object. If our class had more members to copy, we would need to allocate them sequentially, catch any exception thrown while copying them, and delete the copies made so far before rethrowing the exception.
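The multi-member case might look something like the sketch below. TwoBars and the instrumented Bar (which counts live instances and can be told to throw on a later copy) are hypothetical, made up to show the allocate, catch, clean up and rethrow sequence. Every copy is made before *this is modified, so a failure part-way through leaks nothing and leaves the target as it was.

```cpp
#include <cassert>
#include <new>      // std::bad_alloc
#include <utility>  // std::swap

// Hypothetical Bar: counts live instances and can be told to throw on a
// later copy, so we can check that nothing leaks when a copy fails.
struct Bar {
    static int live;       // Bar objects currently alive
    static int failAfter;  // copies that succeed before one throws; -1 = never
    int value;

    explicit Bar(int v) : value(v) { ++live; }
    Bar(const Bar& other) : value(other.value) {
        if (failAfter == 0) {
            failAfter = -1;  // fail only once
            throw std::bad_alloc();
        }
        if (failAfter > 0) --failAfter;
        ++live;
    }
    ~Bar() { --live; }
};
int Bar::live = 0;
int Bar::failAfter = -1;

// Hypothetical class that owns two Bars.
class TwoBars {
public:
    TwoBars(int a, int b) : first(new Bar(a)), second(nullptr) {
        try { second = new Bar(b); }
        catch (...) { delete first; throw; }
    }
    TwoBars(const TwoBars& other) : first(new Bar(*other.first)), second(nullptr) {
        try { second = new Bar(*other.second); }
        catch (...) { delete first; throw; }
    }
    ~TwoBars() { delete first; delete second; }

    TwoBars& operator=(const TwoBars& rhs) {
        Bar* newFirst = new Bar(*rhs.first);  // if this throws, nothing to undo
        Bar* newSecond = nullptr;
        try {
            newSecond = new Bar(*rhs.second);
        } catch (...) {
            delete newFirst;  // clean up the copy we already made
            throw;            // then let the caller see the failure
        }
        // Both copies exist; nothing from here on can throw.
        std::swap(first, newFirst);
        std::swap(second, newSecond);
        delete newFirst;      // these now hold the old Bars
        delete newSecond;
        return *this;
    }

    int firstValue() const { return first->value; }
    int secondValue() const { return second->value; }

private:
    Bar* first;
    Bar* second;
};
```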

The second guarantee is the strong guarantee, which states that if an exception is thrown, the object is left exactly as it was before the call, as if the operation had never been attempted. Let’s imagine that I was using the Foo class in the following context:

do
{
    try
    {
        Foo f1, f2;

        ...

        f1 = f2;

        ...

    }
    catch (const std::bad_alloc& e)
    {
        // Clean up some memory and keep running
    }
} while (!done);

Here, we have a long-running process that may run out of memory at some point. We have a way of cleaning up some memory if that happens, so we’d like to keep running the process until it successfully completes.

Now imagine what happens with our first assignment operator. We first delete the member variable myBar, then try to make a copy of the right-hand side’s myBar. If this throws an exception, our Foo is left with myBar pointing at memory that has already been deleted. When our program tries to pick up where it left off, it will try to access that memory and will probably crash; and if Foo’s destructor deletes myBar, destroying the object will delete the same memory a second time. This version doesn’t even meet the basic guarantee.

What about our second assignment operator? We first try making a copy of the right-hand side’s myBar. If that throws an exception, our Foo is left unchanged. The program can run the loop again and nothing crashes.
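The recovery loop can be exercised with a sketch like this one. It assumes a hypothetical Bar whose copy constructor throws std::bad_alloc once on demand, standing in for a real out-of-memory condition. Unlike the loop above, it keeps f1 and f2 outside the try block so that f1’s state after a failed assignment is actually reused on the next attempt. assignWithRetry is likewise an illustrative name, not part of the original code.

```cpp
#include <cassert>
#include <new>      // std::bad_alloc
#include <utility>  // std::swap

// Hypothetical Bar whose copy constructor fails once on demand.
struct Bar {
    static bool failNextCopy;
    int value;
    explicit Bar(int v) : value(v) {}
    Bar(const Bar& other) : value(other.value) {
        if (failNextCopy) {
            failNextCopy = false;
            throw std::bad_alloc();
        }
    }
};
bool Bar::failNextCopy = false;

// The copy-first Foo from earlier in the article.
class Foo {
public:
    explicit Foo(int v) : myBar(new Bar(v)) {}
    Foo(const Foo& other) : myBar(new Bar(*other.myBar)) {}
    ~Foo() { delete myBar; }
    Foo& operator=(const Foo& rhs) {
        Bar* temp = new Bar(*rhs.myBar);  // may throw; *this is still intact
        std::swap(temp, myBar);
        delete temp;
        return *this;
    }
    int value() const { return myBar->value; }
private:
    Bar* myBar;
};

// Hypothetical retry loop: returns how many attempts the assignment took.
int assignWithRetry(Foo& f1, const Foo& f2) {
    int attempts = 0;
    bool done = false;
    do {
        ++attempts;
        try {
            f1 = f2;
            done = true;
        } catch (const std::bad_alloc&) {
            // f1 still holds its old, valid Bar, so it is safe to keep using it.
            // A real program would free some memory here before retrying.
        }
    } while (!done);
    return attempts;
}
```

With the first operator instead, the failed attempt would leave f1 holding a dangling pointer, and the retry would copy into an object that is already broken.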

By doing all allocations before deleting anything, we’ve managed to avoid the problems of self-assignment without writing if (this != &rhs). In addition, this code implements the strong guarantee and is therefore safer. If your code relies on a check for self-assignment for correctness, then it probably doesn’t implement the strong guarantee and is less safe than it should be.

But what about speed?

One objection I’ve heard when explaining this concept is that the second version is much slower at self-assignment than a version with the check. That is true: comparing two memory addresses is much cheaper than allocating and deleting memory. In practice, however, self-assignment is rare. Optimizing one rare case at the expense of the much more common general case (every other assignment still pays for the comparison) will likely not gain you much. Putting in a special case to guarantee that your program does not crash 1 out of 1000 times is important; putting in a special case to optimize 1 out of 1000 cases is not. If you are concerned about speed, profile your code first and determine how often self-assignment really occurs.
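If you want numbers rather than guesses, one low-tech option is to count assignments temporarily. The sketch below adds hypothetical static counters to the copy-first Foo; the address comparison is used only to record how often self-assignment happens, not to skip any work. A profiler will tell you how much the allocation actually costs; counters like these only tell you how often the rare case occurs.

```cpp
#include <cassert>
#include <utility>  // std::swap

struct Bar {
    int value;
    explicit Bar(int v) : value(v) {}
};

class Foo {
public:
    // Hypothetical counters for a one-off measurement; remove them afterwards.
    static long assignments;
    static long selfAssignments;

    explicit Foo(int v) : myBar(new Bar(v)) {}
    Foo(const Foo& other) : myBar(new Bar(*other.myBar)) {}
    ~Foo() { delete myBar; }

    Foo& operator=(const Foo& rhs) {
        ++assignments;
        if (this == &rhs) ++selfAssignments;  // recorded only; no work is skipped
        Bar* temp = new Bar(*rhs.myBar);
        std::swap(temp, myBar);
        delete temp;
        return *this;
    }

    int value() const { return myBar->value; }

private:
    Bar* myBar;
};

long Foo::assignments = 0;
long Foo::selfAssignments = 0;
```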

Alternatives

It turns out that there is an even easier way to write an assignment operator, as long as certain conditions are met. The strongest exception guarantee is the no-throw guarantee, which states that no exceptions will be thrown. As long as we’re not allocating any memory or other objects, this should be possible. In Foo, the only member variable we have is a pointer, and swapping pointers cannot throw an exception. So we can write a member swap() method for Foo that exchanges the contents of two Foo objects and has the no-throw guarantee. Then, we can write the whole assignment operator as follows:

Foo& operator=(Foo rhs)
{
    swap(rhs); // Calls Foo::swap
    return *this;
}

Rather than passing the right-hand side object by reference, we pass it by value; this causes the compiler to make a temporary copy of the right-hand side object using Foo’s copy constructor. That copy is passed to the method and swapped with the current object. Then, upon exiting the method, the temporary object goes out of scope and is automatically destroyed, taking our old Bar with it. If the copy constructor throws, it does so before the body of operator= runs, so the current object is untouched. And so, without doing any extra work, we get both the basic and strong guarantees, and self-assignment needs no special handling.
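Putting the pieces together, a complete copy-and-swap Foo might look like the following sketch. The Bar struct, the constructor taking an int and the value() accessor are assumptions for illustration. noexcept is C++11; in C++98 you would simply document that swap() does not throw.

```cpp
#include <cassert>
#include <utility>  // std::swap

struct Bar {
    int value;
    explicit Bar(int v) : value(v) {}
};

class Foo {
public:
    explicit Foo(int v) : myBar(new Bar(v)) {}
    Foo(const Foo& other) : myBar(new Bar(*other.myBar)) {}  // the only place a copy is made
    ~Foo() { delete myBar; }

    // Member swap: exchanges the pointers only, so it cannot throw.
    void swap(Foo& other) noexcept {
        std::swap(myBar, other.myBar);
    }

    // rhs is a copy made by the copy constructor before the body runs.
    Foo& operator=(Foo rhs) {
        swap(rhs);     // *this takes the copy; rhs takes our old Bar
        return *this;  // rhs is destroyed here, deleting the old Bar
    }

    int value() const { return myBar->value; }

private:
    Bar* myBar;
};
```

Note that the assignment operator no longer mentions Bar at all: copying lives in the copy constructor and cleanup in the destructor, so each is written once.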


5 Responses to C++ Assignment Operator or: How I Learned to Stop Worrying and not Check for Self-Assignment

  1. Per says:

    Where’s the single-argument swap-method defined?

    • bskari says:

      The single-argument swap method should be defined in your class. Most of the STL containers have a swap method that takes constant time, so if your class has STL container members, it should just call .swap(rhs.container) on each of them in turn.

      Calling the two-argument std::swap function from inside the assignment operator probably won’t work, because in C++98, if a class hasn’t defined a specialization of std::swap, std::swap falls back to calling the assignment operator, which will cause an infinite loop and stack overflow. Most specializations of std::swap that I’ve seen just call the single-argument member swap method anyway, so calling it explicitly should hopefully prevent any problems.

      I’ll update the post so this is more clear, thanks!

  2. Anonymous says:

    Uh, don’t you mean “delete temp;” instead of “delete myBar;”?

  3. anon says:

    in the section Alternatives, first para, last sentence – “Then we can write the whole copy constructor as follows:”

    What follows is not a copy constructor, it’s the assignment operator, isn’t it? The first sentence in the paragraph also singles out the assignment op as the focus of the paragraph. Not trying to pick nits, but this whole topic is dense and muddled enough. For people trying to build up insight into it, or newbies, don’t want to make it any harder than it needs to be.
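To make bskari’s reply about the member swap concrete, here is a sketch of a class whose members are STL containers, with a member swap that calls each container’s .swap() and a non-member swap that forwards to it. demo, Inventory and its members are hypothetical names. Calling `using std::swap; swap(a, b);` lets argument-dependent lookup pick demo::swap, so generic code uses the member swap rather than the generic std::swap, which is implemented with assignment.

```cpp
#include <cassert>
#include <string>
#include <utility>  // std::move, std::swap
#include <vector>

namespace demo {  // hypothetical namespace for the example

class Inventory {  // hypothetical class with STL container members
public:
    Inventory(std::string name, std::vector<int> items)
        : name_(std::move(name)), items_(std::move(items)) {}

    // Each container's .swap() is constant time and does not throw
    // with the default allocator.
    void swap(Inventory& other) noexcept {
        name_.swap(other.name_);
        items_.swap(other.items_);
    }

    // Copy-and-swap, as in the article.
    Inventory& operator=(Inventory rhs) {
        swap(rhs);
        return *this;
    }

    const std::string& name() const { return name_; }
    const std::vector<int>& items() const { return items_; }

private:
    std::string name_;
    std::vector<int> items_;
};

// Non-member swap found by argument-dependent lookup; forwards to the member.
inline void swap(Inventory& a, Inventory& b) noexcept { a.swap(b); }

}  // namespace demo
```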
