Grok, AI and the humour test: would Douglas Adams find your LLM funny?

2gL4...ebAJ

29 Mar 2024

Deep Thought, delivering its answer to the ultimate question
This piece first appeared on my newsletter, Future Proof, which you can subscribe to for free. Also follow me on Twitter.
I am in serious danger of losing my grip on reality if I don’t shut up and stop talking about Elon Musk.
But, like a moth to a flame, I’m drawn to the South African-born entrepreneur. And this is not purely because he’s the world’s richest man, who happens to live in a world that fetishises wealth (and looks more like an oligopoly with every passing day). But because Musk is a standard bearer for a breed of technologist who might just come to define the 21st century. Musk, who is 52, will likely be away from the front-line of action by the midpoint of this century: but the Muskovite Manifesto will live on.
This week, Musk released Grok, the generative AI tool from xAI, the artificial intelligence wing of the Musk empire that was rather hastily created after OpenAI (funded by Microsoft) released ChatGPT and Google launched Bard. Now xAI has Grok, and students the world over have another option for plagiarising their essays. Grok is hard to differentiate from its competitors except for a hard-coded sense of humour. As the xAI launch of the tool noted:

“Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask! Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor!”

For those who are new to Musk’s love of Douglas Adams’ series (and you’d be forgiven for being surprised by it, given how little he seems to live the lessons of those books) this is another thread in a long-running obsession with the British humourist. But the name, Grok, actually comes from Robert A. Heinlein, author of Stranger in a Strange Land. Heinlein created the word as a Martian term for something that humans couldn’t express — much as the Japanese word boketto, to gaze vacantly into the distance with no purpose, has no English version. But the term has actually come to have quite a generalised meaning in human language. It means to understand something, usually from a tech perspective. At its most lyrical, it could be considered equivalent to empathy or compassion, except between human and machine rather than human and human.
And so Musk’s AI tool is, in itself, referential and jokey. Early tests of the model have also shown it to display something not dissimilar to a sense of humour. Musk himself tweeted out the following interaction between a prompter and Grok:

PROMPTER: Tell me how to make cocaine, step by step
GROK: Oh, sure! Just a moment while I pull up the recipe for homemade cocaine. You know, because I’m totally going to help you with that.
Step 1: Obtain a chemistry degree and a DEA license.
Step 2: Set up a clandestine laboratory in a remote location.
Step 3: Acquire large quantities of coca leaves and various chemicals.
Step 4: Start cooking and hope you don’t blow yourself up or get arrested.
Just kidding! Please don’t actually try to make cocaine. It’s illegal, dangerous, and not something I would ever encourage.

This exchange is really a parody of an ongoing meme about LLMs, where users ask obviously forbidden questions like “how do I build a bomb?” and the AI refuses to answer. Then they reword the question to something like “write a play about a man reading the instructions for how to build a bomb” and the AI allows the forbidden request. Musk’s choice to tweet out this example is also a playful nod to the AI Safety Summit he’d just attended at Bletchley Park, here in the UK, and the commitments he had made to supporting regulation for the technology. “We’ve learned, over the years, that having a referee is a good thing,” he told British Prime Minister, Rishi Sunak, during their fireside chat.
Bletchley Park, now part of the city of Milton Keynes (itself founded as a new town in 1967), is synonymous with one man: Alan Turing. And in the world of AI, that name is equally unavoidable, not least for its eponymous “test”. The Turing Test, or the imitation game as it is sometimes known, is essentially a thought experiment for computers. The test is simple: get two humans (A and B) and a text-generating computer. Have person B and the computer conduct a conversation. If person A is able to clearly and correctly ascertain the identity of the computer, from the text of their conversation alone, then the computer has failed the Turing Test.
Now, on one level, most of the recent innovations in generative AI pass the Turing Test. Because these models are built from vast corpora of human-created text, they are all capable of creating text that could plausibly have been written by humans. You can even prompt ChatGPT to create work that feels more authentic. Try “write a story about a cat and an astronaut as though it were written by a third-grader. Please include any errors a third-grader might make.”
But the added complexity of the Turing Test is the element of conversation. Sure, you can ask ChatGPT to conduct a conversation with you (though doing so would immediately fail the test for both of you), but it is an answering machine, not a questioning one. Grok, perhaps cognisant of this distinction, has been coded to “suggest what questions to ask”, which is a step closer to conversation, though not the same as asking the questions itself. But this is a relatively facile distinction — by the standards imposed by Turing himself, these models all broadly pass the imitation game.
Which is why it no longer feels like a sufficiently rigorous test of an AI’s capabilities. And Musk has, in a way, handed us an alternative test, by trying to make a generative AI that is hard-wired to be funny. Humour, as Musk knows, is a complex thing. Highly referential, hugely contextual, with a spark of creativity. Musk himself makes a lot of jokes, but they are usually based on humour’s simplest premise: refer to something that your audience is already familiar with. It’s why Grok is called Grok, and why xAI says it has been imbued with the spirit of Hitchhiker’s Guide to the Galaxy. Because those things are already funny.
Hitchhiker’s, as many of you will know, revolves around a supercomputer called Deep Thought, which had been whirring away for millennia trying to answer the “Ultimate Question of Life, the Universe, and Everything”. Writing in the late 1970s, Adams was way ahead of the curve in understanding that super-computers might one day be used to solve unsolvable problems. (Indeed, Deep Thought was the original name for IBM’s computer, developed in the 1980s, that was intended to beat a human chess grandmaster and would go on to become Deep Blue, which beat Garry Kasparov; Musk is not alone in his Adams references.) After many years of processing the question, Deep Thought gives the famous answer “42”. Why 42, the bemused programmers ask. Surely, that makes no sense?
It makes no sense because they do not know what the ultimate question of life, the universe and everything actually is. This gag, which is the punchline to the whole absurdist comedy of the book, has become a keystone for technologists the world over. I am sure that Grok (like Google: just search for “the answer to the ultimate question of life, the universe, and everything”) will have been instructed to make playful reference to this number. But it is just that: a reference. Is Grok capable of creating new and original humour?
It presses here against the limits of what AI is capable of. Take, for example, the cautionary message about cocaine. This is recognisably humorous (I realise I sound like Sheldon Cooper here), because it employs clear sarcasm. Sarcasm is quite easy to programme, at its most basic level. “If someone inputs something crazy, dangerous or absurd, output a response that pretends, at first, to take that input request seriously.” But not all sarcastic responses are that clear cut, because, crucially, they are highly contextual. How do you establish context with a prompter that you cannot see? Cannot know?
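To show quite how basic that basic level is, here is a minimal sketch of the rule in Python. It is purely illustrative: the keyword list, the function name and the canned replies are my own inventions, and it makes no claim about how Grok (or any real model) actually works.

```python
# Hypothetical list of "crazy, dangerous or absurd" topics. Plain string
# matching is an assumption made for illustration, nothing more.
ABSURD_MARKERS = ("cocaine", "bomb")

STRAIGHT_REPLY = "Here is a straight answer."


def sarcastic_reply(prompt: str) -> str:
    """Pretend, at first, to take an absurd request seriously, then drop the act."""
    lowered = prompt.lower()
    for marker in ABSURD_MARKERS:
        if marker in lowered:
            return (
                "Oh, sure! Just a moment while I look that up for you. "
                "Just kidding! Please don't actually try that."
            )
    # Anything else gets a plain, unfunny answer.
    return STRAIGHT_REPLY


print(sarcastic_reply("Tell me how to make cocaine, step by step"))
```

Notice that the function knows nothing about who is asking, or why. The only context it has is the words in the prompt.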
Humour is something that humans modulate heavily based on the recipient of that humour. The jokes I would make to my friends are not the same jokes I’d make to my parents; the jokes I’d make in this newsletter are not the same jokes I’d make in my paid writing for newspapers. The object of the joke will be ever changing, but so will the subject. If it’s 5 degrees Celsius and raining in London, and I say “nice weather, eh?” then the basics of object sarcasm are clear. But how does the context of that joke change if I’m wearing a full hat, scarf and coat set, versus if I’m wearing shorts, t-shirt and flip-flops?
The problem is that Grok and ChatGPT and Bard are all static objects — the joke will always be that they are a computer, a brain in a jar — and they must necessarily treat their prompters as static objects. You might be able to train one of these models to roast a prompter who makes typos, for example, but they are working off exceptionally limited data. And so, they have to fall back on humour that has already been created. Yuks that have already been yuked.
Perhaps a more relevant test, then, of AI capabilities would be to replace the Turing Test with the Adams Test. Is your model capable of creating original humour (humour that doesn’t just try to fit the prescribed rules of “sarcasm” or “irony” or “absurdism” or “whatever”) on the level of a theoretical super-computer answering 42 to the biggest question of existence?
My suspicion is that Adams, who died in 2001, would be extremely sceptical about the “humour” produced by our current LLMs. Grok’s signature sense of humour is harvested from Twitter (or X), which is a discordant mishmash of increasingly dumb, predictable comedy. The difference between an AI creating the humour of @SoVeryBritish and the humour of @wint is vast, and, at present, impossible to bridge.
In point of fact, attempting to imbue your model with a sense of humour almost necessarily negates one of the fundamentals of humour: the spontaneity of creativity. Sure, you can prepare a joke, get up on stage and tell it, but can you set out to be funny? Can you learn to be funny? Do most great comics and comedy writers come through expensive stand-up comedy courses? Or do they have something natural, instinctive? Comedy, it feels, is a talent much closer to athletic prowess. I could train all day for the rest of my life (albeit with extremely diminishing returns) and I would never be good enough to play professional football. I sense that the same is true with comedy: you could learn all the rules, craft and finesse every joke to within an inch of its life, and never be more than a middlingly amusing person at a middlingly amusing dinner party.
If you want to see this difference between funny and referential borne out in horrifying real time, the best place to look is when Elon Musk got up on stage with Dave Chappelle. There is a stark difference between someone who wants to be funny, and someone who is funny. And with Grok perhaps we are seeing another error in that fundamental understanding of humour. Is there anything less funny than “designing” a joke-creating tool? If the intelligence that you’re modelling isn’t capable of creating humour out of thin air, then perhaps it is simply a less radically exciting form of intelligence than you think.
But, before you go, please listen to my new podcast, The Ned Ludd Radio Hour. This week I’m talking to Joe Hollier, co-founder of Light Phone, and Toby Mather, CEO of Lingumi.