A Database File
When I was 10 years old, and going through a fairly difficult time, I was lucky enough to come into the possession of a piece of software called Claris FileMaker Pro™.
FileMaker allowed its users to construct arbitrary databases, and to associate their tables with a customized visual presentation. FileMaker also had a rudimentary scripting language, which would allow users to imbue these databases with behavior.
As a mentally ill pre-teen, lacking a sense of control over anything or anyone in my own life, including myself, I began building a personalized database to catalogue the various objects and people in my immediate vicinity. If one were inclined to be generous, one might assess this behavior and say I was systematically taxonomizing the objects in my life and recording schematized information about them.
As I saw it at the time, if I collected the information, I could always use it later, to answer questions that I might have. If I didn’t collect it, then what if I needed it? Surely I would regret it! Thus I developed a categorical imperative to spend as much of my time as possible collecting and entering data about everything that I could reasonably arrange into a common schema.
Having thus summoned this specter of regret for all lost data-entry opportunities, it was hard to dismiss. We might label it “Claris’s Basilisk”, for obvious reasons.
Therefore, a less-generous (or more clinically-minded) observer might have replaced the word “systematically” with “obsessively” in the assessment above.
I also began writing what scripts were within my marginal programming abilities at the time, just because I could: things like computing the sum of every street number of every person in my address book. Why was this useful? Wrong question: the right question is “was it possible” to which my answer was “yes”.
If I was obliged to collect all the information which I could observe — in case it later became interesting — I was similarly obliged to write and run every program I could. It might, after all, emit some other interesting information.
I was an avid reader of science fiction as well.
I had this vague sense that computers could kind of think. This resulted in a chain of reasoning that went something like this:
- human brains are kinda like computers,
- the software running in the human brain is very complex,
- I could only write simple computer programs, but,
- when you really think about it, a “complex” program is just a collection of simpler programs
Therefore: if I just kept collecting data, collecting smaller programs that could solve specific problems, and connecting them all together in one big file, eventually the database as a whole would become self-aware and could solve whatever problem I wanted. I just needed to be patient; to “keep grinding” as the kids would put it today.
I still feel like this is an understandable way to think — if you are a highly depressed and anxious 10-year-old in 1990.
Anyway.
35 Years Later
OpenAI is a company that produces transformer architecture machine learning generative AI models; their current generation was trained on about 10 trillion words, obtained in a variety of different ways from a large variety of different, unrelated sources.
A few days ago, on March 26, 2025 at 8:41 AM Pacific Time, Sam Altman took to “X™, The Everything App™,” and described the trajectory of his career of the last decade at OpenAI as, and I quote, a “grind for a decade trying to help make super-intelligence to cure cancer or whatever” (emphasis mine).
I really, really don’t want to become a full-time AI skeptic, and I am not an expert here, but I feel like I can identify a logically flawed premise when I see one.
This is not a system-design strategy. It is a trauma response.
You can’t cure cancer “or whatever”. If you want to build a computer system that does some thing, you actually need to hire experts in that thing, and have them work to both design and validate that the system is fit for the purpose of that thing.
Aside: But... are they, though?
I am not an oncologist; I do not particularly want to be writing about the specifics here, but, if I am going to make a claim like “you can’t cure cancer this way” I need to back it up.
My first argument — and possibly my strongest — is that cancer is not cured.
QED.
But I guess, to Sam’s credit, there is at least one other company partnering with OpenAI to do things that are specifically related to cancer. However, that company is still in a self-described “initial phase” and it’s not entirely clear that it is going to work out very well.
Almost everything I can find about it online was from a PR push in the middle of last year, so it all reads like a press release. I can’t easily find any independently-verified information.
A lot of AI hype is like this. A promising demo is delivered; claims are made that surely if the technology can solve this small part of the problem now, within 5 years surely it will be able to solve everything else as well!
But even the light-on-content puff-pieces tend to hedge quite a lot. For example, as the Wall Street Journal quoted one of the users initially testing it (emphasis mine):
The most promising use of AI in healthcare right now is automating “mundane” tasks like paperwork and physician note-taking, he said. The tendency for AI models to “hallucinate” and contain bias presents serious risks for using AI to replace doctors. Both Color’s Laraki and OpenAI’s Lightcap are adamant that doctors be involved in any clinical decisions.
I would probably not personally characterize “‘mundane’ tasks like paperwork and … note-taking” as “curing cancer”. Maybe an oncologist could use some code I developed too; even if it helped them, I wouldn’t be stealing valor from them on the curing-cancer part of their job.
Even fully giving it the benefit of the doubt that it works great, and improves patient outcomes significantly, this is medical back-office software. It is not super-intelligence.
It would not even matter if it were “super-intelligence”, whatever that means, because “intelligence” is not how you do medical care or medical research. It’s called “lab work” not “lab think”.
To put a fine point on it: biomedical research fundamentally cannot be done entirely by reading papers or processing existing information. It cannot even be done by testing drugs in computer simulations.
Biological systems are enormously complex, and medical research on new therapies inherently requires careful, repeated empirical testing to validate the correspondence of existing research with reality. Not “an experiment”, but a series of coordinated experiments that all test the same theoretical model. The data (which, in an LLM context, is “training data”) might just be wrong; it may not reflect reality, and the only way to tell is to continuously verify it against reality.
Previous observations can be tainted by methodological errors, by data fraud, and by operational mistakes by practitioners. If there were a way to do verifiable development of new disease therapies without the extremely expensive ladder going from cell cultures to animal models to human trials, we would already be doing it, and “AI” would just be an improvement to efficiency of that process. But there is no way to do that and nothing about the technologies involved in LLMs is going to change that fact.
Knowing Things
The practice of science — indeed any practice of the collection of meaningful information — must be done by intentionally and carefully selecting inclusion criteria, methodically and repeatedly curating our data, building a model that operates according to rules we understand and can verify, and verifying the data itself with repeated tests against nature. We cannot just hoover up whatever information happens to be conveniently available with no human intervention and hope it resolves to a correct model of reality by accident. We need to look where the keys are, not where the light is.
Piling up more and more information in a haphazard and increasingly precarious pile will not allow us to climb to the top of that pile, all the way to heaven, so that we can attack and dethrone God.
Eventually, we’ll just run out of disk space, and then lose the database file when the family gets a new computer anyway.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. Special thanks also to Ben Chatterton for a brief pre-publication review; any errors remain my own. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! Special thanks also to Itamar Turner-Trauring and Thomas Grainger for pre-publication feedback on this article; any errors of course remain my own.