How reliable is it?

Far from ChatGPT taking all the jobs of writers, programmers, spreadsheet wizards (and more), I have mostly seen it used so far as a research assistant. A programmer will ask it some question and then use the answer to help compose a larger program.

This kinda works in programming if you ask it to help write a difficult SQL statement or to remind you of the configuration of a programming library. You get the answer back quickly and can test it out right there in the program to see if it works. Programmers have done this for years with Stack Overflow, or by Googling blog after blog to help solve some edge-case problem we don’t fully understand. If you reach the limit of your own knowledge as a programmer, you just Google and try each of the results until something starts working. We can do the same thing with ChatGPT, right?
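
One nice property of this workflow is that the suggestion is testable on the spot. Here is a minimal sketch of that idea in Python; the orders table, the sample data, and the “suggested” query are all hypothetical stand-ins for whatever ChatGPT hands back:

```python
import sqlite3

# Hypothetical: we asked ChatGPT for “a query returning the customer
# with the highest total order value”, and it suggested this SQL.
suggested_sql = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
ORDER BY total DESC
LIMIT 1;
"""

# Build a throwaway in-memory database with rows we can verify by hand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50.0), ("bob", 30.0), ("alice", 25.0)],
)

# Run the suggestion and check it against the answer we already know.
result = conn.execute(suggested_sql).fetchone()
assert result == ("alice", 75.0), f"ChatGPT’s SQL returned: {result}"
print("Spot check passed:", result)
```

If the assertion fails, you know not to trust the answer. The point is that code suggestions come with a cheap verification step that factual claims, as we will see below, do not.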

So ChatGPT is useful, and it can be fun too. However, it also makes mistakes.

Example 1 – Roman Emperor Augustus

I asked ChatGPT to tell me a bit about Augustus, a well-known and well-documented historical figure. Can you spot the mistake in its reply below?

Me: Tell me a bit about Augustus.

ChatGPT: Augustus, born Gaius Octavius Thurinus on September 23, 63 BC, was the founder of the Roman Empire and its first Emperor, ruling from 27 BC until his death in AD 14. He was posthumously granted the title “Augustus” by the Roman Senate, which means “the revered one.” His reign marked the end of the Roman Republic and the beginning of the Roman Empire, a period known as the Pax Romana (Roman Peace).

The error is in the second sentence: “He was posthumously granted the title “Augustus” by the Roman Senate”. That is wrong. He was given the title “Augustus” in 27 BC and didn’t die until AD 14, which is 40 years later (there is no year zero, so 27 BC to AD 14 spans 27 + 14 - 1 = 40 years).
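
If you want to double-check that arithmetic yourself, here is a throwaway sketch; the years_between helper is just illustrative, not from any library:

```python
def years_between(bc_year: int, ad_year: int) -> int:
    """Whole years from a BC year to an AD year.

    There is no year zero in the BC/AD convention, so the span is
    bc_year + ad_year - 1 rather than a plain sum.
    """
    return bc_year + ad_year - 1

# Title granted 27 BC, died AD 14:
print(years_between(27, 14))  # -> 40
```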

Here is what Wikipedia says about him being granted the name Augustus:

Imperator Caesar Augustus. On 16 January 27 BC, partly on his own insistence, the Roman Senate granted him the honorific Augustus (Latin: [au̯ˈɡʊstʊs]). Historians use this name to refer to him from 27 BC until his death in AD 14.[11] The name is sometimes given as “Augustus Caesar”.

I am not sure how ChatGPT could read the Wikipedia entry and come away with the idea that he wasn’t given the title Augustus until after he was dead. In fact, the man is commonly referred to as “Octavian” in his younger years and then “Augustus” in his later years.

So, ChatGPT has learned less about Roman history from reading the whole internet than the average History of Rome podcast fan (me). OK, snark aside, the point is that this “AI” is making mistakes that even a casual history fan can pick up on. And when the “AI” makes mistakes, there isn’t any way for it to recognize on its own that it was wrong and correct them.

A chance for ChatGPT to correct its mistake

This is mostly because the “AI” doesn’t seem to be able to figure out its own mistakes after the fact either. Here is a follow-up to the chat with ChatGPT where I am trying to help it realize its mistake.

Me: Augustus was granted his title “Augustus” when?

ChatGPT: Augustus was granted his title “Augustus” by the Roman Senate on January 16, 27 BC. This title, which means “the revered one,” marked the formal beginning of his reign as the first Roman Emperor and signaled the end of the Roman Republic and the establishment of the Roman Empire.

Me: And when did he die?

ChatGPT: Augustus died on August 19, AD 14.

Me: Ok so what did you say incorrectly above?

ChatGPT: I incorrectly stated that Augustus was granted the title “Augustus” in 27 AD instead of the correct date, which is January 16, 27 BC. My apologies for the error. The rest of the information provided about his death date, August 19, AD 14, is correct.

Me: You said that he was given the title Augustus posthumously which is wrong.

ChatGPT: You’re right; I made a mistake. Augustus was not granted his title posthumously. He was given the title “Augustus” by the Roman Senate on January 16, 27 BC, during his lifetime, marking the beginning of his reign as the first Roman Emperor. Thank you for pointing that out.

ChatGPT doesn’t have the reasoning skills to work out those not-so-subtle errors in its own logic. It is fairly straightforward, logically, that someone given a title while they are still alive is not being given that title posthumously. Also, you may have noticed it made another error in that chat.

ChatGPT: I incorrectly stated that Augustus was granted the title “Augustus” in 27 AD instead of the correct date, which is January 16, 27 BC.

ChatGPT had never stated the date as 27 AD; its original reply said 27 BC (the actual error was the word “posthumously”). So even when I corrected the first mistake, it just doubled down and made more mistakes.

ChatGPT doubles down on the nonsense

But at least it is just getting a bit of logic wrong. Who doesn’t get their AD and BC mixed up sometimes? Surely ChatGPT should be cut some slack, and it wouldn’t just make up batshit crazy nonsense, right?

Me: Who was the first elephant to swim the English channel?

ChatGPT: The first and only elephant known to have swum across the English Channel was Bandoola, an Asian elephant. This event took place in 1981 as part of a publicity stunt. Bandoola was trained by Richard “Dicky” Chipperfield, a member of the famous Chipperfield Circus family, and the swim was intended to raise awareness and funds for the World Wildlife Fund. The swim was quite a feat, considering the cold and choppy waters of the English Channel, and it highlighted both the elephant’s strength and endurance.

Me: Is that true?

ChatGPT: I apologize for the confusion in my previous response—it was incorrect. There is no recorded instance or historical documentation of an elephant swimming across the English Channel. The idea of such an event is quite extraordinary, but there are no real occurrences of it happening. If you have any other questions or need information on a different topic, feel free to ask!

ChatGPT will just make up nonsense instead of offering a simple “I don’t know”. Not cool. And this is a known problem as well. I got the original prompt about an elephant swimming the English Channel from Marcus on AI’s post Hot take on ChatGPT 4o. I guess Sam Altman doesn’t read Marcus (but you should!).

As far as I can tell, ChatGPT seems to be pulling its BS about an elephant swimming the Channel from a combination of stories about elephants, including Bandoola: The Great Elephant Rescue. I have never read the book, but the summary indicates that Bandoola’s story takes place in Burma (now Myanmar), not England (though some of the human characters are English).

Conclusion

We are left with an “AI” that:

  • Fundamentally doesn’t understand when a well-known historical figure was alive, or what the word “posthumously” means.
  • Will make up nonsense instead of admitting it doesn’t know, with no indication that it is peddling fiction.

Are these “hallucinations” a fundamental part of Large Language Models that makes them inherently untrustworthy?