5 Practical Limitations of LLMs
And how to mitigate them

Large Language Models (LLMs) are groundbreaking tools that have transformed fields ranging from customer service to creative writing. However, they are not without their flaws. Understanding these limitations is crucial for effectively utilizing LLMs and mitigating potential issues.
Today, we will explore five practical limitations of LLMs. Stay tuned until the end as we’ll be sharing links to some interesting failures at the end!
(1) Inaccurate Source Citation
One of the main challenges with LLMs is their inability to accurately cite sources. While LLMs can generate text that appears to reference sources and appears to make sense, they do not have access to real-time information or the ability to remember where their training data came from.
Consequently, they often produce results that seem plausible but are entirely fabricated. This is a significant limitation when using LLMs for tasks that require precise information.
Mitigation
Using a Retrieval Augmented Generation (RAG) pipeline with fully verified information can partially alleviate this issue. However, you should still verify the output as the LLM might hallucinate (see limitation #3 below). And of course, you need to make sure that the source information is always up-to-date. Garbage in, garbage out!
(2) Bias
LLMs can exhibit biases in their responses, often generating stereotypical or prejudiced content. This bias stems from the large datasets they are trained on, which may contain biased information.
Despite safeguards, LLMs can sometimes produce sexist, racist, or other biased content. This is a critical issue to be aware of, particularly in consumer-facing applications or research, as it can lead to the propagation of harmful stereotypes.
Mitigation
Continuous monitoring and fine-tuning of LLMs using diverse and balanced datasets can help reduce bias. Additionally, implementing filters and human oversight can minimize the impact of biased outputs.
(3) Hallucinations
A significant problem with LLMs is their tendency to “hallucinate” or generate false information when they do not know the answer to a question. Instead of admitting their lack of knowledge, LLMs often produce confident but incorrect responses.
Hallucinations can lead to the spread of misinformation, which is particularly concerning in fields requiring precise information, such as healthcare or legal services. The LLM is trying to give the right answer, but in matters of great importance, trying simply isn’t good enough!
Mitigation
You should constantly evaluate and fact-check the information provided by your LLMs. Automating this process is possible by using Machine Learning techniques or even an LLM prompt to do the fact-checking. In addition, human review, although expensive, is the most accurate way of verifying LLM outputs.
Another option is to incorporate mechanisms for the model to indicate uncertainty or lack of knowledge. For example, you can put in your prompt: “If you don’t know the answer, you can say ‘I don’t know’”.
(4) Not so good at math
Despite their advanced capabilities, LLMs often struggle with mathematical tasks, providing incorrect answers to even simple arithmetic problems. And that’s not just for pure math but also for anything really technical that involves numbers like physics and finance.
The limited ability of LLMs in math is because the models are primarily trained on text. Thus they lack a true understanding of numbers, logic, and mathematics as a whole.
Mitigation
Combining LLMs with specialized tools designed for mathematical tasks, such as computational software or calculators, can improve accuracy. Additionally, developing hybrid models that integrate LLMs with other AI systems specialized in math can offer better results.
(5) Prompt Hacking
LLMs can be manipulated or “hacked” by users to generate specific content, a phenomenon known as prompt hacking. Malicious users can craft prompts to do all kinds of bad things like:
- Bypass content filters to produce inappropriate or offensive outputs
- Get the LLM to do something malicious, like revealing or even deleting internal data
- Reveal the system prompt of the LLM
All of this is particularly problematic for public-facing applications, where the risk of misuse is high.
Mitigation
Implementing robust content filtering systems and continuously updating them to recognize and block harmful prompts is essential. Educating users about responsible usage and monitoring interactions can also help mitigate the risks associated with prompt hacking.
Conclusion
While LLMs are powerful and versatile tools, they come with a set of limitations that users need to be aware of. Inaccurate source citations, inherent biases, hallucinations, difficulties with mathematical tasks, and susceptibility to prompt hacking are challenges that must be addressed. By understanding these limitations and implementing appropriate mitigation strategies, we can use LLMs more effectively and responsibly.
Interesting mistakes!
Now, as promised, here are some examples of interesting LLM failures!
- Glue pizza and eat rocks: Google AI search errors go viral
- A startup tested if ChatGPT and other AI chatbots could understand SEC filings. They failed about 70% of the time and only succeeded if told exactly where to look
- B.C. lawyer reprimanded for citing fake cases invented by ChatGPT
- Fake AI-generated woman on tech conference agenda
- We tested ChatGPT in Bengali, Kurdish, and Tamil. It failed.