At this point, ChatGPT has firmly established itself in the cultural zeitgeist. People are using it to perform tasks like writing research papers, solving math problems, and crafting code. It is part (if not the frontrunner) of a recent wave of AI generators that can make whatever your heart desires once you enter phrases and parameters into the prompt box provided by the app.
But all of that may be put on pause, as a research paper out of Stanford University and UC Berkeley claims that ChatGPT is degrading in accuracy. The study’s authors compared GPT-4 and GPT-3.5 (the models behind the most recent versions of the app) and found a huge difference in the chatbot’s answers between March and June of this year, with responses to prompts often being incorrect or incredibly vague.
Some of the issues noted in the report include:
- GPT-4’s accuracy in identifying prime numbers fell from around 98% in March to only 2% in June (the first sketch below shows how an evaluation like this can be graded).
- When asked to create computer code, GPT-4 generated functional code 52% of the time in March but only 10% of the time in June. The code it provided also included “junk” text that served no purpose and could cause problems if included in a larger piece of software (see the second sketch below).
- The June version of the large language model (LLM) got better at declining prompts of questionable legality, but it would give only a terse refusal like “Sorry, but I can’t assist with that” rather than explain why the prompt was problematic.
Source: “How Is ChatGPT’s Behavior Changing over Time?” (Chen, Zaharia, Zou)
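For context on how a figure like that 98%-to-2% drop gets measured: evaluations of this kind typically ask the model a yes/no question (e.g., “Is 17077 a prime number?”) and grade the reply against a deterministic check. Here is a minimal sketch in Python; the grading logic is illustrative, and a real harness would collect answers from the chatbot’s API rather than the hard-coded stand-ins shown here.

```python
# Sketch of grading prime-identification accuracy. The model answers below are
# hard-coded stand-ins; a real harness would collect them from the chatbot.
def is_prime(n: int) -> bool:
    """Ground truth via trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def grade(numbers: list[int], answers: list[str]) -> float:
    """Fraction of yes/no answers that match the ground truth."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n)
        for n, ans in zip(numbers, answers)
    )
    return correct / len(numbers)

# Example: the third answer is wrong (7919 is prime), so accuracy is 2/3
print(grade([17077, 20, 7919], ["yes", "no", "no"]))  # -> 0.666...
```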
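As for the “junk” text in generated code, the paper reportedly traces much of it to extra formatting, such as markdown code fences wrapped around otherwise valid code, which keeps the raw response from being directly executable. Below is a minimal sketch of how a caller might strip such wrappers before running a model’s output; the regular expression and the extract_code helper are illustrative, not part of any official tooling.

```python
import re

# Matches a response wholly wrapped in a markdown code fence, e.g.:
# ```python\nprint(2 + 2)\n```
FENCE_RE = re.compile(r"^```[\w+-]*\n(.*?)\n```\s*$", re.DOTALL)

def extract_code(response: str) -> str:
    """Return the fenced code body if present; otherwise the raw response."""
    match = FENCE_RE.match(response.strip())
    return match.group(1) if match else response.strip()

# Example: a fenced reply that would fail if executed as-is
raw = "```python\nprint(2 + 2)\n```"
print(extract_code(raw))  # -> print(2 + 2)
```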
We’ve known for a while now that AI image generators have problems with the pictures they produce, but this is the first dramatic shift in ability for an app like ChatGPT that isn’t due to user error. (We all know that tools like ChatGPT are only as good as the prompts provided.) Part of this situation could be tied to the rise in AI tools hallucinating: offering irrelevant, nonsensical, or factually incorrect answers, in part because the generator is drawing on data sources that include misinformation.
And before you write off any concerns about ChatGPT because these findings appear in a single report, it’s worth noting that anecdotal evidence also supports the Stanford/Berkeley study. In a Reddit thread about ChatGPT, users from all over noted that the bot would routinely ignore the prompts provided and produce more hallucinated text. Others said the system would fail at relatively simple problem-solving tasks, whether math or coding questions.
While providing clean data sources could help reduce hallucinations and restore AI’s usefulness, another part of this issue could be the lack of regulation around AI. In a recent US Senate judiciary subcommittee hearing, Sam Altman (the CEO of OpenAI, the company behind ChatGPT) suggested that an official body should create licensing and testing requirements for the development and release of AI tools. He also wants this body to establish safety standards and bring in independent auditors to assess models before they are released. The reason for all this oversight is not some sci-fi dystopian nightmare where AI gains sentience and takes control. Rather, the greater issues appear to be the weakening of labor rights as companies try to replace a skilled workforce with AI, and the spread of misinformation as people believe what AI tells them without being able to confirm whether it’s true.
Keypoint Intelligence Opinion
Like any tool, ChatGPT and other AI generators are going to have their strengths and weaknesses. For all the good they do in providing information and data for tasks that may be too complicated for us to handle on our own, they are just as fallible as any computer, calculator, or burgeoning writer/artist.
Despite the countless examples of evil AI in our media, the danger we face isn’t that ChatGPT will gain control of the world’s banks and plunge us into some economic hellscape. It’s that we become overly reliant on the content it creates, trusting the information it provides implicitly without checking other sources to see if it’s telling the truth, or without making sure that the code it supplies doesn’t include anything we don’t need or want. AI generators can also stunt our own growth if we let them. Sure, a chatbot can write your term paper, but what good is that if you don’t learn the content yourself?
Ultimately, this could be a wake-up call for the many who have drunk the “AI is the future” Kool-Aid and are blind to the fact that generative AI tools like ChatGPT work better as a supplement to your own skills than as a replacement for them. And now that the accuracy of their output is declining sharply, it might be wise to pull back on that reliance…or you might get a failing grade for turning in a report on how ABBA fought the British and Prussians at the Battle of Waterloo.
Log in to the InfoCenter to view research on generative AI and smart technology through our Office CompleteView Advisory Service. If you’re not a subscriber, contact us for more info by clicking here.