I am not going to describe to you how PyPy actually works. Lucky for you, I'm not smart enough to do that. But I would like to help you all understand how PyPy could work, and hopefully demystify the whole idea.
The people who are smart enough to explain how PyPy actually works will do it over at the PyPy blog. At some level it's really quite straightforward, but this impression of straightforwardness is not conveyed well by posts with titles like "Optimizing Traces of the Flow Graph Language". In addition to being a Python interpreter in Python, PyPy is a mind-blowingly advanced exploration of the cutting-est cutting-edge compiler and runtime technology, which can make it seem complex. In fact, being written in Python is exactly what lets it be so cutting-edge.
Most people with a formal computer science background are already familiar with the fairly generic nature of compilers, as well as the concept of a self-hosting compiler. If you do have that background, then that's all PyPy is: a self-hosting compiler. The same way GCC is written in C, PyPy is written in Python. When you strip away the advanced techniques, that's all that's there.
I suspect, though, that a lot of folks who are confused by PyPy's existence don't have that background; many working programmers these days don't. Or if they do, they've forgotten it, because the practical implications of the CSS box model are so complex that they squeeze simpler ideas, like Turing completeness and the halting problem, out of the average human brain. So here's the easier explanation.
A compiler is a program that turns a string (source code: your program text written in Python, C, Ruby, Java, or whatever) into some kind of executable code (bytecode or runtime interpreter operations or a platform-native executable).
Let's examine that last one, since it seems to be a sticking point for most folks. A platform-native executable is simply a bunch of bytes in a file. There's nothing magic about it. It's not even a particularly complex type of file. It's a packed binary file, not a text file, but so are PNGs and JPEGs, and few programmers find it difficult to believe that such files might be created by Python. The formats are standard and very long-lived and there are tons of tools to work with them. If you're curious, even Wikipedia has a good reference for the formats used by each popular platform.
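To make "a bunch of bytes in a file" concrete, here is a minimal sketch that uses Python's standard `struct` module to pack the 64-byte header that begins a 64-bit little-endian ELF file, the executable format used on Linux. The function name `elf64_header` and its parameters are made up for illustration, and this is only the header: a real executable also needs program headers and machine code after it.

```python
import struct

def elf64_header(entry_point, program_header_count=1):
    """Pack an ELF64 file header (hypothetical helper; header only, not a full executable)."""
    # e_ident: magic number, 64-bit class, little-endian, ELF version 1,
    # System V ABI, then 8 bytes of padding (16 bytes total).
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,                     # e_type: ET_EXEC (executable file)
        62,                    # e_machine: EM_X86_64
        1,                     # e_version
        entry_point,           # e_entry: address where execution starts
        64,                    # e_phoff: program headers follow this header
        0,                     # e_shoff: no section headers in this sketch
        0,                     # e_flags
        64,                    # e_ehsize: size of this header
        56,                    # e_phentsize: size of one program header
        program_header_count,  # e_phnum
        0,                     # e_shentsize
        0,                     # e_shnum
        0,                     # e_shstrndx
    )

header = elf64_header(entry_point=0x400078)
```

Nothing here is more exotic than what an image library does when it writes the header of a PNG: a fixed layout of integers, packed into bytes.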
As to Python being slower than C: once a program has been transformed into executable code, it doesn't matter how slow the translation process was. The running program is now just executable instructions for your CPU. Only the compiler was written in Python, and by the time your program is running, the original Python has effectively vanished; all you're left with is your program executing.
(Actually, Python is faster than C anyway, especially at producing strings.)
For the sake of argument, assume that you know all the ins and outs of binary executable formats for different operating systems, and the machine code for various CPU architectures. The question you should really ask yourself is this: if you have to write a program (a compiler) which translates one kind of string (source code) into another kind of string (a compiled program), would you rather write it in C or Python? What if the strings in question were a template document and an HTML page?
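To make the question concrete, here is a minimal sketch of such a string-to-string translator in Python. It is a toy, not a description of how PyPy does anything: the instruction names (`PUSH`, `ADD`, `SUB`, `MUL`) belong to a made-up stack machine, the functions `compile_expr` and `run` are hypothetical, and it leans on the standard `ast` module to do the parsing.

```python
import ast

# Operators supported by this toy language, mapped to made-up instruction names.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def compile_expr(source):
    """Translate arithmetic source text into program text for a toy stack machine."""
    def emit(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return ["PUSH %d" % node.value]
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            # Operands first, operator last: postfix order suits a stack machine.
            return emit(node.left) + emit(node.right) + [OPCODES[type(node.op)]]
        raise SyntaxError("unsupported construct in toy language")

    tree = ast.parse(source, mode="eval")
    return "\n".join(emit(tree.body))

def run(program):
    """Execute the compiled text, to show that the output really is a program."""
    stack = []
    for line in program.splitlines():
        op, _, arg = line.partition(" ")
        if op == "PUSH":
            stack.append(int(arg))
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        else:
            raise ValueError("unknown instruction: " + op)
    return stack.pop()

compiled = compile_expr("1 + 2 * 3")
result = run(compiled)
```

The whole "compiler" is one recursive function that builds a list of strings. Doing the same in C would mean managing buffers and tree nodes by hand before getting to the interesting part.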
It shouldn't be surprising that PyPy is written in Python. For the same reasons that you might use Django templates and not snprintf for generating your HTML, it's easier to use Python than C to generate compiled code. This is why PyPy is at the forefront of so many advanced techniques that are too sophisticated to cover in a quick article like this. Because the compiler is written in a higher-level language, it can do more advanced things: lower-level concerns can be abstracted away, just as they are in your own applications.