C++ Class & Vtable Initialization Flow

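While working through polymorphism in Java I compared it with C++, and ended up digging into how C++ initializes classes and vtables along the way. This is all pretty basic stuff and a little embarrassing to write up XD, but I'm recording it anyway (it has been a long time since the last update, so here is a filler post). If you spot any mistakes, corrections are welcome.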

#1

This post covers only the simple initialization flow, nothing fancy. The following code serves as the example:

//code-1
#include<iostream>

class aaa{
public :
    int num = 10;
    int mmmm = 1;
    void kkk(){
        std::cout << "I am kkk1" << std::endl;
    }

    virtual void ppp(){
        std::cout << "I am aaa_ppp" << std::endl;
    }

    virtual void eee(){
        std::cout << "I am aaa_eee" << std::endl;
    }

    virtual void www(){
        std::cout << "I am aaa_www" << std::endl;
    }
};

class bbb :public aaa{
public :
    int num = 20;
    int mmmm = 2;
    void kkk(){
        std::cout << "I am kkk2" << std::endl;
    }

    virtual void ppp(){
        std::cout << "I am bbb_ppp" << std::endl;
    }

    virtual void eee(){
        std::cout << "I am bbb_eee" << std::endl;
    }

    virtual void zzz(){
        std::cout << "I am bbb_zzz" << std::endl;
    }
};

class ccc :public bbb{
public:
    int num = 30;
    int mmmm = 3;
    void kkk(){
        std::cout << "I am kkk3" << std::endl;
    }

    virtual void ppp(){
        std::cout << "I am ccc_ppp" << std::endl;
    }

    virtual void eee(){
        std::cout << "I am ccc_eee" << std::endl;
    }

    virtual void vvv(){
        std::cout << "I am ccc_vvv" << std::endl;
    }
};

int main()
{
    aaa *a = new ccc();
    a->kkk();
    a->ppp();
    a->eee();
    std::cout << a->num << std::endl;
    return 0;
}

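A vtable is the usual way C++ implementations realize polymorphism. When a class has virtual functions, the compiler builds a vtable for that class to index them, and each object's vtable pointer (vfptr) is set while the object is being constructed. The flow is much easier to follow with assembly and pseudocode side by side, so I used IDA for the analysis. The main function looks like this: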

//code-2
int __cdecl main()
{
  aaa *v0; // eax
  int v1; // eax
  std::basic_ostream<char,std::char_traits<char> > *(__cdecl *v3)(std::basic_ostream<char,std::char_traits<char> > *); // [esp-4h] [ebp-ECh]
  aaa *v4; // [esp+Ch] [ebp-DCh]
  ccc *v5; // [esp+14h] [ebp-D4h]
  aaa *a; // [esp+E0h] [ebp-8h]

  v5 = (ccc *)operator new(0x1Cu);
  if ( v5 )
  {
    ccc::ccc(v5);
    v4 = v0;
  }
  else
  {
    v4 = 0;
  }
  a = v4;
  aaa::kkk(v4);
  a->vfptr->ppp(a);
  a->vfptr->eee(a);
  v3 = std::endl<char,std::char_traits<char>>;
  v1 = MSVCP120D_NULL_THUNK_DATA(*(_DWORD *)std::cout.gap0, a->num, std::endl<char,std::char_traits<char>>);
  std::basic_ostream<char,std::char_traits<char>>::operator<<(v1);
  return 0;
}

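At the operator new(0x1Cu) call, the program allocates 0x1c bytes for the ccc object: 4 bytes for the vfptr plus six 4-byte int members (two each from aaa, bbb and ccc). It then calls ccc::ccc() to initialize the object. Inside ccc::ccc():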

//code-3
void __thiscall ccc::ccc(ccc *this)
{
  ccc *v1; // STD4_4

  v1 = this;
  bbb::bbb((bbb *)&this->vfptr);
  v1->vfptr = (aaaVtbl *)&ccc::`vftable';
  v1->num = 30;
  v1->mmmm = 3;
}
/* Assembly for the vfptr, num and mmmm stores above:
.text:00415E5B                 mov     eax, [ebp+this]
.text:00415E5E                 mov     dword ptr [eax], offset ??_7ccc@@6B@ ; const ccc::`vftable'
.text:00415E64                 mov     eax, [ebp+this]
.text:00415E67                 mov     dword ptr [eax+14h], 1Eh
.text:00415E6E                 mov     eax, [ebp+this]
.text:00415E71                 mov     dword ptr [eax+18h], 3
*/

Inside ccc::ccc() there is a call to bbb::bbb(), because a derived-class constructor has to construct its base class first. Inside bbb::bbb():

//code-4
void __thiscall bbb::bbb(bbb *this)
{
  bbb *v1; // STD4_4

  v1 = this;
  aaa::aaa((aaa *)&this->vfptr);
  v1->vfptr = (aaaVtbl *)&bbb::`vftable';
  v1->num = 20;
  v1->mmmm = 2;
}
/* Assembly for the vfptr, num and mmmm stores above:
.text:00413A1B                 mov     eax, [ebp+this]
.text:00413A1E                 mov     dword ptr [eax], offset ??_7bbb@@6B@ ; const bbb::`vftable'
.text:00413A24                 mov     eax, [ebp+this]
.text:00413A27                 mov     dword ptr [eax+0Ch], 14h
.text:00413A2E                 mov     eax, [ebp+this]
.text:00413A31                 mov     dword ptr [eax+10h], 2
*/

From the code so far you can already guess what aaa::aaa() contains. Here it is:

//code-5
void __thiscall aaa::aaa(aaa *this)
{
  this->vfptr = (aaaVtbl *)&aaa::`vftable';
  this->num = 10;
  this->mmmm = 1;
}

By now the overall picture is fairly clear. The code above tells us the following:

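  • After calling its base-class constructor (if there is one), a constructor's first step is always to store its own class's vtable address into the object's vfptr
  • Once the vfptr is set, the constructor initializes the class's own members
  • A derived class can only be initialized after its base class: the base-class part is fully initialized first, then the derived part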

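The assembly shows that the topmost base class sets the vfptr and then its members, after which each derived class in turn sets the vfptr and its own members. Because the vtable is what carries polymorphism, the vfptr slot in the object is overwritten at every level with the vtable of the class currently being constructed, so once all constructors have run it points at the most-derived class's vtable. Members are not overwritten: each class's members are placed right after the memory the base class has already initialized. The debugging session below confirms this.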

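I traced the program dynamically in IDA, starting from the constructor.

The address returned by new is in eax, and it is then passed to the constructor as the this pointer.

Under MSVC's __thiscall convention on x86, this is passed in ecx. The same this is then passed on to the base-class constructor.

The calls continue up to the topmost base-class constructor, which first sets the vfptr and then initializes the member variables.

Once the base class is done, the derived class is initialized in the same order.

The next class is initialized through the same steps.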

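As the flow above shows, every constructor overwrites the vfptr, while the member variables keep being stored further along in memory. The vtables of the three classes are as follows: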

  • aaa
.rdata:0041CC98 ??_7aaa@@6B@    dd offset j_?ppp@aaa@@UAEXXZ
.rdata:0041CC98                                         ; DATA XREF: aaa::aaa(void)+26↑o
.rdata:0041CC98                                         ; aaa::ppp(void)
.rdata:0041CC9C                 dd offset j_?eee@aaa@@UAEXXZ ; aaa::eee(void)
.rdata:0041CCA0                 dd offset j_?www@aaa@@UAEXXZ ; aaa::www(void)
  • bbb
.rdata:0041CCDC ; const bbb::`vftable'
.rdata:0041CCDC ??_7bbb@@6B@    dd offset j_?ppp@bbb@@UAEXXZ
.rdata:0041CCDC                                         ; DATA XREF: bbb::bbb(void)+2E↑o
.rdata:0041CCDC                                         ; bbb::ppp(void)
.rdata:0041CCE0                 dd offset j_?eee@bbb@@UAEXXZ ; bbb::eee(void)
.rdata:0041CCE4                 dd offset j_?www@aaa@@UAEXXZ ; aaa::www(void)
.rdata:0041CCE8                 dd offset j_?zzz@bbb@@UAEXXZ ; bbb::zzz(void)
  • ccc
.rdata:0041DA9C ??_7ccc@@6B@    dd offset j_?ppp@ccc@@UAEXXZ
.rdata:0041DA9C                                         ; DATA XREF: ccc::ccc(void)+2E↑o
.rdata:0041DA9C                                         ; ccc::ppp(void)
.rdata:0041DAA0                 dd offset j_?eee@ccc@@UAEXXZ ; ccc::eee(void)
.rdata:0041DAA4                 dd offset j_?www@aaa@@UAEXXZ ; aaa::www(void)
.rdata:0041DAA8                 dd offset j_?zzz@bbb@@UAEXXZ ; bbb::zzz(void)
.rdata:0041DAAC                 dd offset j_?vvv@ccc@@UAEXXZ ; ccc::vvv(void)

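Each derived class's vtable reflects polymorphism: overridden slots (ppp, eee) point at the class's own functions, inherited slots (www, zzz) still point at the base-class versions, and newly declared virtual functions are appended at the end.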


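Under multiple inheritance, memory is initialized much as in the single-inheritance case; only the layout changes somewhat. Below I quote part of an article from Kanxue (看雪) to touch on it briefly. That article is also worth reading for another angle on this flow.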

// Code taken from https://bbs.pediy.com/thread-253429.htm
#include<cstdio>
#include<Windows.h>
class CBase1
{
public:
    int m_nBase1;
public:
    CBase1() { m_nBase1 = 1; }
    ~CBase1() {}
    virtual void Function1() { printf("CBase1::Function1"); }
    virtual void Function2() { printf("CBase1::Function2"); }
};

class CBase2
{
public:
    int m_nBase2;
public:
    CBase2() { m_nBase2 = 2; }
    ~CBase2() {}
    virtual void Function3() { printf("CBase2::Function3"); }
    virtual void Function4() { printf("CBase2::Function4"); }
};

class ClassA :public CBase1, public CBase2
{
public:
    int m_nClassA;
public:
    ClassA() :CBase1(), CBase2(){ m_nClassA = 3; }
    ~ClassA() {}
    virtual void Function1() { printf("ClassA::Function1"); }
    virtual void Function3() { printf("ClassA::Function3"); }
    virtual void Function5() { printf("ClassA::Function5"); }
};

class ClassB :public ClassA
{
public:
    int m_nClassB;
public:
    ClassB() :ClassA() { m_nClassB = 4; }
    ~ClassB() {}
    virtual void Function5() { printf("ClassB::Function5"); }
    virtual void Function6() { printf("ClassB::Function6"); }
};
typedef  void(*pFunction)();

int main()
{

    CBase1 objCBase1;
    CBase2 objCBase2;
    ClassA objClassA;
    ClassB objClassB;
    DWORD*  m_dwCBase1_table = (DWORD*)*(DWORD*)&objCBase1;
    DWORD*  m_dwCBase2_table = (DWORD*)*(DWORD*)&objCBase2;
    DWORD*  m_dwClassA_table1 = (DWORD*)*(DWORD*)&objClassA;
    DWORD*  m_dwClassA_table2 = (DWORD*)*((DWORD*)&objClassA + 2);
    DWORD*  m_dwClassB_table1 = (DWORD*)*(DWORD*)&objClassB;
    DWORD*  m_dwClassB_table2 = (DWORD*)*((DWORD*)&objClassB + 2);


    printf("CBase1_VFT:\t0x%08x", m_dwCBase1_table);
    printf("\r\n\t\t0x%08x ", *(m_dwCBase1_table));     ((pFunction)*(m_dwCBase1_table))();
    printf("\r\n\t\t0x%08x ", *(m_dwCBase1_table + 1));  ((pFunction)*(m_dwCBase1_table + 1))();
    printf("\r\n");

    printf("\r\nCBase2_VFT:\t0x%08x", m_dwCBase2_table);
    printf("\r\n\t\t0x%08x ", *(m_dwCBase2_table));     ((pFunction)*(m_dwCBase2_table))();
    printf("\r\n\t\t0x%08x ", *(m_dwCBase2_table + 1)); ((pFunction)*(m_dwCBase2_table + 1))();
    printf("\r\n");

    printf("\r\nClassA_VFT1:\t0x%08x", m_dwClassA_table1);
    printf("\r\n\t\t0x%08x ", *(m_dwClassA_table1));    ((pFunction)*(m_dwClassA_table1))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassA_table1 + 1));    ((pFunction)*(m_dwClassA_table1 + 1))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassA_table1 + 2));    ((pFunction)*(m_dwClassA_table1 + 2))();
    printf("\r\nClassA_VFT2:\t0x%08x", m_dwClassA_table2);
    printf("\r\n\t\t0x%08x ", *(m_dwClassA_table2));    ((pFunction)*(m_dwClassA_table2))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassA_table2 + 1));    ((pFunction)*(m_dwClassA_table2 + 1))();
    printf("\r\n");

    printf("\r\nClassB_VFT1:\t0x%08x", m_dwClassB_table1);
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table1));    ((pFunction)*(m_dwClassB_table1))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table1 + 1));    ((pFunction)*(m_dwClassB_table1 + 1))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table1 + 2));    ((pFunction)*(m_dwClassB_table1 + 2))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table1 + 3));    ((pFunction)*(m_dwClassB_table1 + 3))();
    printf("\r\nClassA_VFT2:\t0x%08x", m_dwClassB_table2);  // second vtable of objClassB (the label reads ClassA_VFT2 in the source)
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table2));    ((pFunction)*(m_dwClassB_table2))();
    printf("\r\n\t\t0x%08x ", *(m_dwClassB_table2 + 1));    ((pFunction)*(m_dwClassB_table2 + 1))();

    printf("\r\n");
}

After compiling, load the executable into IDA and locate the main function, which gives:

__int64 __cdecl main()
{
  int v0; // edx
  __int64 v1; // ST00_8
  unsigned int *m_dwClassB_table2; // [esp+D4h] [ebp-ACh]
  unsigned int *m_dwClassB_table1; // [esp+E0h] [ebp-A0h]
  unsigned int *m_dwClassA_table2; // [esp+ECh] [ebp-94h]
  unsigned int *m_dwClassA_table1; // [esp+F8h] [ebp-88h]
  unsigned int *m_dwCBase2_table; // [esp+104h] [ebp-7Ch]
  unsigned int *m_dwCBase1_table; // [esp+110h] [ebp-70h]
  ClassB objClassB; // [esp+11Ch] [ebp-64h]
  ClassA objClassA; // [esp+13Ch] [ebp-44h]
  CBase2 objCBase2; // [esp+158h] [ebp-28h]
  CBase1 objCBase1; // [esp+168h] [ebp-18h]
  int v13; // [esp+17Ch] [ebp-4h]

  CBase1::CBase1(&objCBase1);
  v13 = 0;
  CBase2::CBase2(&objCBase2);
  LOBYTE(v13) = 1;
  ClassA::ClassA(&objClassA);
  LOBYTE(v13) = 2;
  ClassB::ClassB(&objClassB);
  LOBYTE(v13) = 3;
  m_dwCBase1_table = (unsigned int *)objCBase1.vfptr;
  m_dwCBase2_table = (unsigned int *)objCBase2.vfptr;
  m_dwClassA_table1 = (unsigned int *)objClassA.vfptr;
  m_dwClassA_table2 = (unsigned int *)objClassA.vfptr;
  m_dwClassB_table1 = (unsigned int *)objClassB.vfptr;
  m_dwClassB_table2 = (unsigned int *)objClassB.vfptr;
  _printf("CBase1_VFT:\t0x%08x", objCBase1.vfptr);
  _printf("\r\n\t\t0x%08x ", *m_dwCBase1_table);
  ((void (*)(void))*m_dwCBase1_table)();
  _printf("\r\n\t\t0x%08x ", m_dwCBase1_table[1]);
  ((void (*)(void))m_dwCBase1_table[1])();
  _printf("\r\n");
  _printf("\r\nCBase2_VFT:\t0x%08x", m_dwCBase2_table);
  _printf("\r\n\t\t0x%08x ", *m_dwCBase2_table);
  ((void (*)(void))*m_dwCBase2_table)();
  _printf("\r\n\t\t0x%08x ", m_dwCBase2_table[1]);
  ((void (*)(void))m_dwCBase2_table[1])();
  _printf("\r\n");
  _printf("\r\nClassA_VFT1:\t0x%08x", m_dwClassA_table1);
  _printf("\r\n\t\t0x%08x ", *m_dwClassA_table1);
  ((void (*)(void))*m_dwClassA_table1)();
  _printf("\r\n\t\t0x%08x ", m_dwClassA_table1[1]);
  ((void (*)(void))m_dwClassA_table1[1])();
  _printf("\r\n\t\t0x%08x ", m_dwClassA_table1[2]);
  ((void (*)(void))m_dwClassA_table1[2])();
  _printf("\r\nClassA_VFT2:\t0x%08x", m_dwClassA_table2);
  _printf("\r\n\t\t0x%08x ", *m_dwClassA_table2);
  ((void (*)(void))*m_dwClassA_table2)();
  _printf("\r\n\t\t0x%08x ", m_dwClassA_table2[1]);
  ((void (*)(void))m_dwClassA_table2[1])();
  _printf("\r\n");
  _printf("\r\nClassB_VFT1:\t0x%08x", m_dwClassB_table1);
  _printf("\r\n\t\t0x%08x ", *m_dwClassB_table1);
  ((void (*)(void))*m_dwClassB_table1)();
  _printf("\r\n\t\t0x%08x ", m_dwClassB_table1[1]);
  ((void (*)(void))m_dwClassB_table1[1])();
  _printf("\r\n\t\t0x%08x ", m_dwClassB_table1[2]);
  ((void (*)(void))m_dwClassB_table1[2])();
  _printf("\r\n\t\t0x%08x ", m_dwClassB_table1[3]);
  ((void (*)(void))m_dwClassB_table1[3])();
  _printf("\r\nClassA_VFT2:\t0x%08x", m_dwClassB_table2);
  _printf("\r\n\t\t0x%08x ", *m_dwClassB_table2);
  ((void (*)(void))*m_dwClassB_table2)();
  _printf("\r\n\t\t0x%08x ", m_dwClassB_table2[1]);
  ((void (*)(void))m_dwClassB_table2[1])();
  _printf("\r\n");
  LOBYTE(v13) = 2;
  ClassB::~ClassB(&objClassB);
  LOBYTE(v13) = 1;
  ClassA::~ClassA(&objClassA);
  LOBYTE(v13) = 0;
  CBase2::~CBase2(&objCBase2);
  v13 = -1;
  CBase1::~CBase1(&objCBase1);
  HIDWORD(v1) = v0;
  LODWORD(v1) = 0;
  return v1;
}

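Because the class instances here are local variables rather than allocated with new, there is no heap allocation step. The function prologue reserves stack space with sub esp, 164h.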

Going straight into the constructor of ClassB, which singly inherits from the multiply-inheriting ClassA, gives:

void __thiscall ClassB::ClassB(ClassB *this)
{
  ClassB *v1; // STD4_4

  v1 = this;
  ClassA::ClassA((ClassA *)&this->vfptr);
  v1->vfptr = (CBase1Vtbl *)&ClassB::`vftable'{for `CBase1'};
  v1->vfptr = (CBase2Vtbl *)&ClassB::`vftable'{for `CBase2'};
  v1->m_nClassB = 4;
}
// Assembly for the two vfptr stores and the m_nClassB store above:
.text:0041322B                 mov     eax, [ebp+this]
.text:0041322E                 mov     dword ptr [eax], offset ??_7ClassB@@6BCBase1@@@ ; const ClassB::`vftable'{for `CBase1'}
.text:00413234                 mov     eax, [ebp+this]
.text:00413237                 mov     dword ptr [eax+8], offset ??_7ClassB@@6BCBase2@@@ ; const ClassB::`vftable'{for `CBase2'}
.text:0041323E                 mov     eax, [ebp+this]
.text:00413241                 mov     dword ptr [eax+14h], 4

Inside the ClassA constructor:

void __thiscall ClassA::ClassA(ClassA *this)
{
  ClassA *v1; // STD4_4

  v1 = this;
  CBase1::CBase1((CBase1 *)&this->vfptr);
  CBase2::CBase2((CBase2 *)&v1->vfptr);
  v1->vfptr = (CBase1Vtbl *)&ClassA::`vftable'{for `CBase1'};
  v1->vfptr = (CBase2Vtbl *)&ClassA::`vftable'{for `CBase2'};
  v1->m_nClassA = 3;
}
// Assembly for the two vfptr stores and the m_nClassA store above:
.text:0041318C                 mov     eax, [ebp+this]
.text:0041318F                 mov     dword ptr [eax], offset ??_7ClassA@@6BCBase1@@@ ; const ClassA::`vftable'{for `CBase1'}
.text:00413195                 mov     eax, [ebp+this]
.text:00413198                 mov     dword ptr [eax+8], offset ??_7ClassA@@6BCBase2@@@ ; const ClassA::`vftable'{for `CBase2'}
.text:0041319F                 mov     eax, [ebp+this]
.text:004131A2                 mov     dword ptr [eax+10h], 3

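The initialization is very similar to the single-inheritance case, with one difference worth noting: the two vfptrs sit 8 bytes apart, at dword ptr [eax] and dword ptr [eax+8]. The program is 32-bit, so each vfptr is 4 bytes and another 4 bytes lie between them. The CBase1 constructor shows what they are: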

void __thiscall CBase1::CBase1(CBase1 *this)
{
  this->vfptr = (CBase1Vtbl *)&CBase1::`vftable';
  this->m_nBase1 = 1;
}

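The two vfptrs are separated by the m_nBase1 member, which is an int and therefore exactly 4 bytes. From here you can already guess how memory is laid out under multiple inheritance.

The figure in the Kanxue article shows the detailed memory layout.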

#2
