C++ Typecasts – An Assembly POV

This post can be read as a continuation of the post Addressing The Addresses or on its own. The goal is to understand how C++ type casts look at the assembly level, which gives a clue about the address arithmetic that produces the offsets discussed in that earlier post.

First let me demonstrate the typecasting of primitive types (I am taking some stuff from this wiki page). Consider the C++ code

int aVar = 65;
int* intPointer = &aVar;
    
char* charPointer = (char*) intPointer;

In the last line we are doing an explicit type conversion from int* to char*.

From the assembly perspective, the cast itself makes no difference between int* and char*: the address held by one pointer (of type int*) is simply copied to the other pointer (of type char*). Only the dereference differs, like so

char b = *charPointer;    mov     rax, QWORD PTR [rbp-8]
                          movzx   eax, BYTE PTR [rax]
                          mov     BYTE PTR [rbp-9], al

while for int

int d = *intPointer;      mov     rax, QWORD PTR [rbp-24]
                          mov     eax, DWORD PTR [rax]
                          mov     DWORD PTR [rbp-28], eax

For a char dereference a single BYTE is read, while for an int dereference a DWORD (4 bytes) is read.

Moving on, we now consider user-defined data types, specifically classes. Consider the following code

#include <iostream>
class UObjectBase
{
    int a;
    int b;
};

class UObject : public UObjectBase
{
public:
    virtual ~UObject() = default;
};

int main()
{
    UObjectBase bar;
    UObjectBase* ptr = &bar;

    UObject* fp = (UObject*) ptr;

    std::cout << &bar << '\n';
    std::cout << fp << '\n';
}

The output of the above program may be

0x7ffe4dd2f998
0x7ffe4dd2f990 // 8 bytes offset

However, removing the virtual function (the destructor ~UObject()) may lead to the output

0x7ffe4dd2f998
0x7ffe4dd2f998 // same address

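Here we are downcasting from UObjectBase* to UObject*, and that leads to the offset of 8 bytes in the presence of a virtual function (vtable). With this compiler the vtable pointer sits at the start of a UObject, so the UObjectBase part begins 8 bytes in, and the downcast has to step back by that amount. Note that bar is not actually a UObject, so the cast is formally undefined behaviour; it is used here only to expose the address arithmetic. The assembly instructions (generated via Godbolt) look like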

    UObject* fp = (UObject*) ptr;    cmp     QWORD PTR [rbp-8], 0
                                     mov     eax, 0
                                     mov     rax, QWORD PTR [rbp-8]
                                     sub     rax, 8
                                     jmp     .L3

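where the cmp against 0 is a null-pointer check (a null pointer has to stay null after the cast) and .L3 is the label where execution continues once the cast is done, while in the absence of the virtual function the instructions are simply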

    UObject* fp = (UObject*) ptr;    mov     rax, QWORD PTR [rbp-8]
                                     mov     QWORD PTR [rbp-16], rax

In the presence of the virtual function (or vtable) there is an instruction to subtract 8 bytes from rax, which creates the offset between the addresses shown in the output. This example was taken from a Stack Overflow post.

In the book “Effective C++, third edition”, item 27, Scott Meyers points out that

… a single object (e.g., an object of
type Derived) might have more than one address (e.g., its address
when pointed to by a Base* pointer and its address when pointed to by
a Derived* pointer). That can’t happen in C. It can’t happen in Java. It
can’t happen in C#. It does happen in C++.
