How Virtual Methods Work
I got into a discussion of this at work, so I thought that I might as well put a full explanation in the blog. I should preface this post by saying that everything that I say here is a lie. It doesn't really work like this. The compiler, the JIT, the OS and the CPU are doing a lot of extra stuff that you generally do not need to concern yourself with. It is (more or less) a high-level overview of the way the low-level stuff works :-)
This post is probably redundant for anyone who did non-trivial work in C, but I think that there are enough people who didn't get to programming through C, so this is worthwhile.
Let us start with the simplest possible beginning. Here is a view of your RAM; it contains binary data, all in ones and zeroes.
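To make "all in ones and zeroes" concrete, here is a small C sketch that reads the very same bits once as a float and once as an integer. The function name is mine, and it assumes that float is a 32-bit IEEE 754 value, which is true on common platforms but is still an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the raw bits of a float into an unsigned 32-bit integer.
   Assumption: float is a 32-bit IEEE 754 value on this platform. */
uint32_t bits_of_float(float f) {
    uint32_t bits;
    assert(sizeof bits == sizeof f);   /* guard the size assumption */
    memcpy(&bits, &f, sizeof bits);    /* same bits, different type */
    return bits;
}
```

Under that assumption, bits_of_float(1.0f) gives 0x3F800000. The memory did not change; only the way we chose to read it did.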
At this level, there isn't such a thing as a class or an int; it is just a bunch of bits. When you compile the code, the compiler turns the source code into instructions (bits again) that the CPU can execute. How does this look to the CPU?
Usually, it will turn into this:
The orange part is all the member variables in the class (the data) and the purple part is the compiled instructions of this class. When the compiler sees a class method call, it outputs:
push 0x34258FA #the address of the test object (the data)
call 0x484325DA #the address of the compiled method on test
Those values are hard coded into the generated exe by the compiler, by the way, so it is really efficient to make a method call. Now, here comes OOP with the need to use virtual methods. How do they work? Remember, the CPU has no concept of classes or methods; all it can do is jump to some location in memory and start executing whatever it finds there.
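The non-virtual call described above can be sketched in C, where the hidden parts become visible. This is only an illustration: the Test struct, its value field and the function names are made up for the example and are not what any particular compiler generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class with a single field (the "data" part). */
struct Test {
    int value;
};

/* A "method" is an ordinary function that receives the address of the
   object as its first argument (the value that gets pushed before the call). */
int Test_GetValue(const struct Test *self) {
    assert(self != NULL);
    return self->value;
}

/* The call target is known when the program is built, so there is
   nothing to look up at run time. */
int call_directly(void) {
    struct Test test = { 42 };
    return Test_GetValue(&test);
}
```

Here call_directly() returns 42, and the address of Test_GetValue is baked into the compiled code.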
Let us take a look at this code (where DoSomething() is a virtual method call):
baseObj.DoSomething();
Now, the compiler has a problem. DoSomething is a virtual method call, so it needs to output code to figure out which method to call. Remember that there is no Reflection on the CPU.
The compiler solves this problem by generating a method table for all the virtual method calls:
Base method table:
0x3423432 DoSomething
0xDAF2343 DoOtherThing
Derived method table:
0x3423432 DoSomething
0x24AB43 DoOtherThing
Note that we "overrode" DoOtherThing in Derived, because the method table for Derived is pointing at another method, while DoSomething has the same address in both tables.
- Blue: Base method table
- Yellow: Derived method table
- Orange: the data in Derived (which includes all the data from Base, of course)
- Purple: Base methods
- Red: Derived methods.
You should note that the method table and the methods themselves are different things. The method table simply contains references to the methods for this class, not the methods themselves.
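In C terms, a method table like the ones above is roughly an array of function pointers. The sketch below is my own illustration: the slot numbering (DoSomething in slot 0, DoOtherThing in slot 1) and all the names are assumptions, and the methods just return a string so that it is easy to see which one was reached.

```c
#include <assert.h>
#include <stddef.h>

/* A table entry is the address of a method, not the method itself. */
typedef const char *(*MethodFn)(void);

const char *base_do_something(void)      { return "Base.DoSomething"; }
const char *base_do_other_thing(void)    { return "Base.DoOtherThing"; }
const char *derived_do_other_thing(void) { return "Derived.DoOtherThing"; }

/* Assumed slot order; both tables must agree on it. */
enum { SLOT_DO_SOMETHING = 0, SLOT_DO_OTHER_THING = 1, SLOT_COUNT = 2 };

/* Base method table. */
const MethodFn base_table[SLOT_COUNT] = {
    base_do_something, base_do_other_thing
};

/* Derived method table: same DoSomething entry, different DoOtherThing. */
const MethodFn derived_table[SLOT_COUNT] = {
    base_do_something, derived_do_other_thing
};

/* Look up a slot in a table and call whatever it points at. */
const char *call_slot(const MethodFn *table, int slot) {
    assert(table != NULL && slot >= 0 && slot < SLOT_COUNT);
    return table[slot]();
}
```

Calling call_slot(derived_table, SLOT_DO_OTHER_THING) reaches derived_do_other_thing, while slot 0 leads to the same function in both tables.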
Now that we have all of this set up, how does the compiler do a virtual method call? When we use virtual methods, the compiler adds an additional field to the class, which points to the method table. Then we can do this:
# I'll spare you the assembly and use C-like code here to figure out what the method address is.
# This simply goes to the object's method table (in this case, the yellow one) and gets the address of the method to call by its method table index.
# The index 3 is only illustrative; the real one is whatever slot the compiler assigned to the method.
jump derivedObj.MethodTable[ 3 ]
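Putting the pieces together, a virtual call can be approximated in plain C like this. It is a sketch under my own assumptions, not what the C# compiler or the JIT really produces: the struct layout, the field names, the slot numbers and the return values are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct Base;
typedef int (*VirtualMethod)(struct Base *self);

struct Base {
    const VirtualMethod *method_table;  /* the extra field added for virtual calls */
    int data;
};

struct Derived {
    struct Base base;   /* Base's data comes first, so a Derived can be treated as a Base */
    int more_data;
};

int Base_DoSomething(struct Base *self)  { return self->data; }
int Base_DoOtherThing(struct Base *self) { return self->data + 1; }

int Derived_DoOtherThing(struct Base *self) {
    struct Derived *d = (struct Derived *)self;  /* valid: base is the first member */
    return d->base.data + d->more_data;
}

/* Assumed slot numbers, shared by both tables. */
enum { DO_SOMETHING_SLOT = 0, DO_OTHER_THING_SLOT = 1 };

const VirtualMethod base_method_table[]    = { Base_DoSomething, Base_DoOtherThing };
const VirtualMethod derived_method_table[] = { Base_DoSomething, Derived_DoOtherThing };

/* The virtual call: load the table from the object, index it, call the result. */
int call_virtual(struct Base *obj, int slot) {
    assert(obj != NULL && obj->method_table != NULL);
    return obj->method_table[slot](obj);
}

int demo_base(void) {
    struct Base b = { base_method_table, 10 };
    return call_virtual(&b, DO_OTHER_THING_SLOT);       /* runs Base_DoOtherThing */
}

int demo_derived(void) {
    struct Derived d = { { derived_method_table, 10 }, 5 };
    return call_virtual(&d.base, DO_OTHER_THING_SLOT);  /* runs Derived_DoOtherThing */
}
```

The caller in call_virtual never knows which class it is dealing with. demo_base() returns 11 and demo_derived() returns 15, and the only difference between the two objects, as far as the call is concerned, is which table their hidden field points to.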
The extra part, figuring out where the method to call actually resides, is something that makes virtual method calls slightly less efficient than non-virtual method calls. And before you do a search and replace to remove all virtuals from your code, take into account that the C# compiler emits the callvirt instruction for nearly all instance method calls, even if the method is not virtual. The JIT is usually able to remove the last bit of indirection, so this is not an issue.
The "public new void DoSomething" syntax in C# merely creates a new entry in the method table for the object instead of replacing the inherited one. This is a lesson learned from C++, where this is not possible.
Like I said, there is a lot more that is going on,