How Good Is ChatGPT at Coding, Really?

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

Programmers have spent decades writing code for AI models, and now, in a full-circle moment, AI is being used to write code. But how does an AI code generator compare to a human programmer?

A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI’s ChatGPT in terms of functionality, complexity, and security. The results show that ChatGPT has an extremely broad range of success when it comes to producing functional code, with a success rate ranging anywhere from as low as 0.66 percent to as high as 89 percent, depending on the difficulty of the task, the programming language, and a number of other factors.

While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.

Yutian Tang is a lecturer at the University of Glasgow who was involved in the study. He notes that AI-based code generation could provide some advantages in terms of enhancing productivity and automating software development tasks—but it’s important to understand the strengths and limitations of these models.

“By conducting a comprehensive analysis, we can uncover potential issues and limitations that arise in the ChatGPT-based code generation... [and] improve generation techniques,” Tang explains.

To explore these limitations in more detail, his team sought to test GPT-3.5’s ability to address 728 coding problems from the LeetCode testing platform in five programming languages: C, C++, Java, JavaScript, and Python.
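For a feel for the setup, the snippet below sketches how one might pose a LeetCode-style problem to GPT-3.5 through OpenAI’s chat API. This is an illustrative sketch, not the study’s actual test harness; the prompt wording and the solve_problem helper are assumptions.

```python
# Minimal sketch (not the study's harness): posing a LeetCode-style
# problem to GPT-3.5 via OpenAI's chat API. The prompt wording and the
# solve_problem helper are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def solve_problem(statement: str, language: str = "Python") -> str:
    """Ask the model for a solution to one problem in one language."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"You are a competitive programmer. "
                        f"Reply with {language} code only."},
            {"role": "user", "content": statement},
        ],
    )
    return response.choices[0].message.content

problem = ("Given an array of integers nums and an integer target, return "
           "the indices of the two numbers that add up to target.")
print(solve_problem(problem))
```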

“A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset.” —Yutian Tang, University of Glasgow

Overall, ChatGPT was fairly good at solving problems in the five coding languages, especially coding problems that existed on LeetCode before 2021. For instance, it was able to produce functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively.

“However, when it comes to the algorithm problems after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,” Tang notes.

For example, ChatGPT’s ability to produce functional code for “easy” coding problems dropped from 89 percent to 52 percent after 2021. And its ability to generate functional code for “hard” problems dropped from 40 percent to 0.66 percent after this time as well.

“A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset,” Tang says.

Essentially, as coding evolves, ChatGPT has not yet been exposed to new problems and solutions. It lacks the critical-thinking skills of a human and can address only problems it has previously encountered. This could explain why it is so much better at addressing older coding problems than newer ones.

“ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems.” —Yutian Tang, University of Glasgow

Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems.

The researchers also explored ChatGPT’s ability to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios in which ChatGPT initially generated incorrect code, whether because it didn’t understand the content or the problem at hand.

While ChatGPT was good at fixing compilation errors, it was generally not good at correcting its own mistakes.

“ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems, thus, this simple error feedback information is not enough,” Tang explains.
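The repair loop the researchers describe can be pictured roughly as follows. In this sketch, run_tests is a hypothetical stand-in for LeetCode’s judge, solve_problem is the helper sketched earlier, and the retry prompt wording is an assumption.

```python
# Sketch of a self-repair loop, assuming a hypothetical run_tests()
# judge (standing in for LeetCode) and the solve_problem() helper
# sketched earlier. The retry prompt wording is illustrative.
def repair_loop(statement: str, max_rounds: int = 3) -> str:
    code = solve_problem(statement)
    for _ in range(max_rounds):
        ok, error_message = run_tests(code)  # hypothetical judge
        if ok:
            return code
        # Feed the judge's feedback back to the model, as the study
        # did with LeetCode's error messages.
        code = solve_problem(
            f"{statement}\n\nYour previous solution failed with:\n"
            f"{error_message}\nReturn a corrected solution."
        )
    return code
```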

The researchers also found that ChatGPT-generated code had a fair number of vulnerabilities, such as a missing null test, but many of these were easily fixable. Their results also show that generated code in C was the most complex, followed by C++ and Python, which had a complexity similar to that of human-written code.
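To picture what a “missing null test” looks like in practice, consider the made-up example below, an unguarded access and its one-line fix; it is not taken from the study’s data.

```python
# Illustrative only, not from the study: a "missing null test"
# vulnerability and its fix.
def head_length(items):
    return len(items[0])   # crashes if items is None or empty

def head_length_safe(items):
    if not items:          # the added null/emptiness test
        return 0
    return len(items[0])
```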

Tang says that, based on these results, it’s important for developers using ChatGPT to provide additional information to help the model better understand problems or avoid vulnerabilities.

“For example, when encountering more complex programming problems, developers can provide relevant knowledge as much as possible, and tell ChatGPT in the prompt which potential vulnerabilities to be aware of,” Tang says.
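In practice, that extra context might look something like the prompt below; the wording is an assumed example, not taken from the study.

```python
# Assumed example of the kind of prompt Tang suggests; the wording is
# illustrative and uses the solve_problem helper sketched earlier.
prompt = (
    "Write a C function that parses a user-supplied configuration string.\n"
    "Relevant knowledge: the string may be NULL or unterminated.\n"
    "Be aware of these potential vulnerabilities: missing null checks, "
    "buffer overflows, and integer overflow when computing lengths."
)
print(solve_problem(prompt, language="C"))
```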
