Beware! You Might Unknowingly Harm Your Company if You Use GitHub AI Copilot

You have been warned!

Vikranth Kanumuru
Kanlanc


Photo by Road Trip with Raj on Unsplash

One fine Saturday morning, I woke up to this email.

Image by Author

Now, unless you have been living under a rock, you must have come across the hype surrounding GitHub’s latest AI advancement.

I was no exception. Extremely excited, I rushed to my home workspace and impatiently clicked the invitation probably a million plus one times. Did I mention I was very excited?

But that was when I hit my first roadblock to accessing Copilot, the software I had almost dreamed about in my sleep: enabling two-factor authentication.

Now, I know my password is strong enough to occupy 5 years of a hacker’s RAM should anyone be stupid enough to attempt to crack it, but seeing that there was no other way to access the project, I had to enable two-factor authentication, albeit reluctantly.

The hype and allure of this product were almost tangible across the web so this was but a minor bump in the journey.

Once I set up everything and actually started hacking away to see Copilot's power, in all honesty, the only thing going on in my head was:

“Damn! Damnn!! Damnnn!!!”

Here’s an example:

Coincidentally, earlier this month I wrote another article about how AI can be used for freelance writing and blogging, and now I am looking at AI software that could possibly replace software engineers.

AI is truly taking over the world lol.

“Alright, Alright, But What the H*ck is Copilot?” You might ask

“And how exactly are you gonna justify that clickbait title?” might be your follow-up question.

Let me answer both of them in this section.

Starting off, Copilot is a new AI tool that can write working code and is much better at the task than its predecessor, GPT-3. The preview release is built on OpenAI Codex, an AI system created by OpenAI. It’s a powerful code-generation model that works with dozens of popular programming languages.

As shown in the above tweet, it autocompletes code snippets, suggests new lines of code, and can even write whole functions based on the description provided. According to the GitHub blog, the tool is not just a language-generating algorithm based on user input — it is a virtual AI pair programmer.

Distributed as a VSCode extension, Copilot is like a vastly more powerful autocomplete that can fill in whole sections of code. It looks at what you’re writing and suggests new lines or entire self-contained functions.

After the initial honeymoon period with the new toy (software), being an engineer at heart, I asked the Copilot team at GitHub if I could contribute.

Disappointingly, the answer was that I could not contribute code, only usage examples.

Although my efforts for code contribution were in vain, I wanted to understand how this exquisite piece of software was built.

Image by Author

Now, this is where the story gets very interesting.

Questionable Training Data Sources

After doing a bit of research, I came across articles that say, in prettier terms, that Copilot is trained on “borrowed” code from GitHub repositories or, in cruder terms, that Copilot has been trained on code that is open to “copyright infringement” claims.

Copilot has been trained using public GitHub projects with a wide variety of licenses. According to GitHub, this represents fair use of those projects. What’s less clear are your responsibilities should you accept a Copilot suggestion.

The question is what happens when the tool reproduces these code snippets. Is it legal? Can parent organizations like Microsoft, OpenAI, and GitHub make money from this tool even though it is trained on free and open-source code?

Now, even if we ignore the morality of monetizing open-source code like this, you might wonder how exactly using open-source code can be “copyright infringement”. After all, by definition, open source means anyone can copy and use it, right?

This is where the subtle differences between licenses come into play. Common open-source licenses include MIT, Apache, and GPL (General Public License). MIT and Apache give you free rein over code under their licenses, with few or no constraints beyond requirements like attribution.

But GPL and other similar licenses stipulate that derivative works must be distributed under the same terms, so incorporating GPL code into a closed-source commercial product is a licensing breach.

Consequently, the use of Copilot has serious legal ramifications attached which you should evaluate before installing it. As Copilot does seem to emit code verbatim, without indicating the license accompanying the snippet, you could unknowingly commit copyright infringement by accepting a suggestion.
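To make the permissive versus copyleft distinction concrete, here is a minimal Python sketch. It assumes you already know the SPDX identifier of the license a snippet came from, which is exactly the information Copilot does not give you. The function names and the two license groupings are my own illustration, deliberately simplified and far from exhaustive.

```python
# Simplified, illustrative grouping of SPDX license identifiers.
# This is an assumption for the example, not a complete or legal classification.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}


def classify_license(spdx_id: str) -> str:
    """Return 'permissive', 'copyleft', or 'unknown' for an SPDX identifier."""
    if spdx_id in PERMISSIVE:
        return "permissive"
    if spdx_id in COPYLEFT:
        return "copyleft"
    return "unknown"


def needs_review_for_closed_source(spdx_id: str) -> bool:
    """Flag anything that is not clearly permissive for a closer look."""
    return classify_license(spdx_id) != "permissive"
```

Calling `needs_review_for_closed_source("GPL-3.0-only")` flags the snippet, while `"MIT"` passes. The catch is that a verbatim Copilot suggestion arrives with no identifier at all, so in practice it lands in the "unknown" bucket.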

Your API Keys and App Secrets might be at Risk

Then there is a chance that it might leak your API keys and other secrets after gathering enough data from your time with it.

Here’s where the trouble begins. Copilot still stands a chance of outputting code sections verbatim. Depending on the licenses surrounding those snippets, this could get your own project into hot water. As Copilot’s been trained on GitHub projects, you might even find personal data is injected into your source files.
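One way to reduce this risk, whether the secret is yours or one that arrived inside a suggestion, is to scan source text for things that look like credentials before committing. Below is a rough Python sketch of that idea. The two regular expressions and the function name `find_suspect_lines` are assumptions made for illustration; dedicated secret scanners use much larger rule sets.

```python
import re
from typing import List

# Illustrative patterns only (assumptions for this sketch):
# 1. strings shaped like an AWS access key ID
# 2. a quoted value of 8+ characters assigned to a name containing
#    api_key, secret, or token
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"""(?i)(api[_-]?key|secret|token)\s*[:=]\s*["'][^"']{8,}["']"""),
]


def find_suspect_lines(source: str) -> List[int]:
    """Return 1-based line numbers that match any of the patterns above."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append(number)
    return hits
```

A line such as `API_KEY = "abcd1234efgh5678"` would be flagged, while reading the key from an environment variable with `os.environ["API_KEY"]` would not, which is also the safer habit to begin with.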

Predictive models [xkcd]

In the Near Future, You might Pay for What You Helped Create

This man explained it the best:

Conclusion

Copilot is very appealing to developers because it aims to take on one of their biggest headaches: writing boilerplate code, which in many cases is not very specific to the project at hand.

But, as the saying goes,

Drinking poison to satisfy thirst

Using Copilot as it is now will only lead to more headaches and problems in the future.

If GitHub is able to somehow overcome these obstacles and develop Copilot further, I have no doubt it will be one of the most widely used tools in a developer’s toolbox.

Let me know your thoughts.

