A couple of issues ago I wrote in positive terms about the generative artificial intelligence (AI) model ChatGPT. It was able to answer my final-year exam question with ease and even write half-decent poetry, but I shouldn’t have spoken so soon! Last week I was asked to write a brief piece on veterinarians who had produced stellar scientific work – the sort that would lead to a Fellowship of the Royal Society (FRS), the highest accolade a scientist could ever wish for. Sarah Cleaveland, who was a student in my year here at Cambridge in the 1980s, was exactly that sort of person. Sarah is an expert on rabies who has dedicated her life to combating this terrible disease in Africa. She realised that euthanising dogs in affected areas leads to a higher rate of reproduction in the remaining canine population and a worsening of the problem. Instead, trap-neuter-vaccinate-release was the way to go, and as well as having that intellectual understanding, Professor Cleaveland (as she is now) has worked tirelessly to use that technique to cut the number of dogs with the disease in Tanzania and thus substantially reduce the number of human cases too. Great – that was easy. But I needed another two vets at FRS level. The easiest way to find out? Ask ChatGPT!
The first example the bot came up with was Ian Wilmut, who produced Dolly the sheep, the first mammal cloned from a somatic cell. Mmm?! Was Ian a vet? I didn’t think so. Where and when did Wilmut qualify as a vet, I asked. “From Nottingham in 1986,” came the reply in an instant. Ah! There we have her! (I’m not quite sure why I class ChatGPT as being female – maybe because her conversational skills are so warm and welcoming?!) I knew that in 1986 Nottingham didn’t have a vet school, so ChatGPT must be wrong. I replied saying this and she/he/it came back very apologetically: yes, Wilmut had in fact qualified in agriculture. It didn’t take me very long to Google him and find out that he moved from agriculture to animal sciences and then did a summer internship with Professor Chris Polge in Cambridge. Wilmut eventually completed a PhD with him on semen preservation, leading to the birth of Frostie, the first calf born from a frozen embryo. That work subsequently led to his cloning research at the Roslin Institute and hence to Dolly the sheep.
That was my first encounter with ChatGPT getting it wrong. And the next example it gave, Dr Temple Grandin, was not a vet either. I knew that, having followed her work on animal welfare for years. But had I not been aware of that, I would have taken those two examples at face value as excellent examples of vets with a great influence on science. The university is really concerned about students using these generative AI models to do their work. But all students need to know is that what ChatGPT or any number of similar models produce need not be true. The system works out the most plausible next word in the sentence, and so it can make up information that is frankly wrong. Worse than that, it can be downright stupid.
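To see how that next-word mechanism can go astray, here is a deliberately tiny sketch in Python. It uses a simple bigram frequency table – nothing like the vast neural network behind ChatGPT – and the toy corpus and function names are invented purely for illustration. The point is that the program always emits the statistically likeliest continuation, with no regard for whether the result is true.

```python
from collections import Counter, defaultdict

# A toy corpus; real models are trained on vast collections of text.
corpus = ("the cat sat on the mat the cat ate the fish "
          "the dog sat on the rug").split()

# Count how often each word follows each other word (a bigram table).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def most_plausible_next(word):
    """Return the word most often seen after `word`, or None at a dead end."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

# Generate a "sentence" by repeatedly picking the likeliest next word.
word, sentence = "the", ["the"]
for _ in range(4):
    word = most_plausible_next(word)
    sentence.append(word)
print(" ".join(sentence))  # a plausible-looking string of words, nothing more
```

The generator never checks facts; it only consults frequencies, which is the mechanism – scaled up enormously – behind confident answers such as a vet degree from a vet school that did not yet exist.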
Next, I asked it, “If Sarah is alive at 9am and alive at 5pm, was she alive at 12 noon?” The answer came back, “Based on the information provided, there is no direct indication of whether Sarah was alive at 12 noon. We can only infer that she was alive at 9am and 5pm. Whether she was alive at 12 noon would depend on other factors or events that are not mentioned.” I asked, “But if she was alive at 5pm, doesn’t that mean she had to be alive five hours earlier?” The answer came back, “Yes if Sarah was alive at 5pm, it would imply that she was also alive five hours earlier at 12 noon. This assumes that there are no intervening events or circumstances that would suggest otherwise. Therefore, based on the information provided, it can be inferred that Sarah was alive at 12 noon.”
I then asked a slightly different question about John: “If John was alive at 8am and still alive at 4pm, was he alive at 1pm?” To which the answer came, “Yes, if John was alive at 8am and still alive at 4pm, it can be inferred that he was alive at 1pm. Given the information provided, there is a continuous timeframe from 8am to 4pm during which John is alive and 1pm falls within that timeframe. Therefore, based on the given information, John would be alive at 1pm.”
I inquired, “Why did you answer one way for Sarah and the other for John?” to which the reply came: “I apologise for the inconsistency in my previous responses. You are correct, based on the information provided, if John was alive at 8am and still alive at 4pm, it can be inferred that he was indeed alive at 1pm as well. Similarly, if Sarah was alive at 5pm, it can be inferred that she was also alive at 12 noon. I apologise for any confusion caused by my previous response. Thank you for pointing out the discrepancy.”
My wife often tells me that I may have many degrees but that none of them is in common sense. I think we have to come to the same conclusion with regard to generative AI, don’t we?!
I remember as a child in the 1970s going into the medical physics computer room in our local hospital where my mother worked, a room with a huge mainframe device with flashing lights and one keyboard terminal. I typed: “1+1=?” and after a while the answer “Eh?” came up on the screen. This I thought was preposterous! How could a huge calculating machine like this not even know the answer to such a simple sum? What I realised later was how clever the programmer had been to make sure the computer didn’t just reply “Error 231 at line 450” or similar. Maybe now, generative AI is there to make it look as if it is replying with reasonable responses when, in fact, it knows nothing at all!