The wildly popular AI chatbot, trained by OpenAI on some 300 billion words scraped from the internet, is a privacy nightmare.
ChatGPT has taken the world by storm. In just two months since its release, it has garnered 100 million enthusiastic users, making it the fastest-growing consumer application ever launched. Users are drawn to its advanced capabilities, even as concerns grow about its potential to replace human roles in various fields.
However, there is a less-discussed aspect of this AI chatbot: the privacy risks it may pose to each of us. Just yesterday, Google unveiled its own conversational AI named Bard, and others will soon follow suit. Tech companies working on AI have well and truly entered an arms race.
At the heart of the issue is our personal data, yours and mine.
300 billion words. How many are yours?

ChatGPT is powered by a large language model, which requires a massive amount of data to operate and improve. The more diverse the data it is trained on, the better it becomes at detecting patterns, predicting what comes next, and generating coherent text.
OpenAI, the company behind ChatGPT, fed the tool roughly 300 billion words systematically scraped from the internet: books, articles, websites, and posts. And, of course, this includes personal information collected without consent.
If you have ever written a blog post, a product review, or a couple of comments on an online article, chances are this information has been used to train ChatGPT.
So why is that a problem?
The data collection used to train ChatGPT is problematic for several reasons.
Firstly, none of us were asked whether OpenAI could use our data. This is a clear violation of privacy, especially when the data is sensitive and could be used to identify us, our family members, or our locations.
Even when data is publicly available, using it can breach what privacy scholars call 'contextual integrity,' a fundamental principle in legal discussions of privacy. It requires that individuals' information not be revealed outside the context in which it was originally produced. Simply put, you would not want your words lifted from their original context and repurposed elsewhere, whatever the scenario.
Furthermore, OpenAI offers no process for individuals to check whether the company is storing their personal information or to request its deletion. Remember, this right is guaranteed by the European Union's General Data Protection Regulation (GDPR), as well as some related regulations.
This ties into the 'right to be forgotten': the right to have personal information deleted, corrected, or restricted when it harms individual or community interests, or when it is outdated and no longer necessary. This right is especially important given that ChatGPT often provides inaccurate or misleading information.

Additionally, the data used to train ChatGPT can be proprietary or copyrighted. For example, when prompted, the tool has reproduced passages from copyrighted books and novels. ChatGPT does not consider copyright protection when producing output, which means anyone who reuses that output elsewhere, such as in a thesis or other work, may inadvertently commit plagiarism.
Lastly, OpenAI has not compensated anyone for the data it collected from the internet. Individuals, website owners, and content-producing companies have not been reimbursed. This is especially noteworthy as OpenAI was recently valued at 29 billion USD, more than double its 2021 valuation.
OpenAI has also announced ChatGPT Plus, a subscription plan that gives customers continuous access to the tool, faster response times, and priority access to new features. The plan is expected to help the company reach 1 billion USD in revenue by 2024.
Remember that none of these figures would exist without data - our data - collected and used without permission.
ChatGPT's Fragile Privacy Policy
Another privacy risk involves the data we feed ChatGPT in the form of prompts. When we ask the tool to answer questions or perform tasks, we may inadvertently hand over sensitive information, which the tool then absorbs into its data pool.
For instance, a lawyer might ask the tool to review a draft divorce agreement, or a programmer might ask it to check their code. The divorce agreement and the code snippet would then become part of ChatGPT's database. This means they could be used to further train the tool and could surface in responses to other users' prompts.

In addition to this, OpenAI also collects various types of user information. According to the company's privacy policy, it collects IP addresses, user browser types and settings, as well as data about user interactions with the website - including the type of content users interact with, features they use, and actions they perform.
It also collects information about users' browsing activities over time and across websites. Alarmingly, OpenAI states that it may share users' personal information with unspecified third parties, without informing them, to meet its business objectives.
Is it time to regulate ChatGPT?
Some experts believe ChatGPT is a tipping point for AI, a milestone in technological progress that could revolutionize how we work, learn, write, and even think. But despite its potential benefits, we must remember that OpenAI is a private, for-profit company, and its commercial interests will not necessarily align with society's needs.
The privacy risks associated with ChatGPT should serve as a warning. As consumers increasingly use AI technology, we should be extremely cautious about the information we share with such tools.
Source: Gizmodo
