The Ascent of ChatGPT
Since its release on November 30, 2022 by OpenAI, the ChatGPT public demo has taken the world by storm. It is the latest in the research lab’s lineage of large language models using Generative Pre-trained Transformer (GPT) technology. Like its predecessors, ChatGPT generates text in a variety of styles, for a variety of purposes. Unlike previous iterations, however, it does so with greater skill, detail, and consistency.
Trained with 570 GB of data from books and text gathered from across the internet, ChatGPT is an impressive example of the training that goes into the creation of conversational AI. An associate professor at the University of Maryland has estimated that OpenAI spends $3 million per month to run ChatGPT.
ChatGPT is a next-generation language model (referred to as GPT-3.5) and is trained in a manner similar to OpenAI’s earlier InstructGPT, but on conversations. The model was fine-tuned to reduce false, harmful, or biased output using a combination of supervised learning and what OpenAI calls Reinforcement Learning from Human Feedback (RLHF), where humans rank potential outputs and a reinforcement learning algorithm rewards the model for generating outputs like those that rank highly.
Additionally, the first open-source equivalent of ChatGPT has arrived with PaLM-rlhf-pytorch, an implementation of RLHF on top of Google’s 540-billion-parameter PaLM architecture.
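The ranking step can be illustrated with a toy example. The sketch below is not OpenAI’s implementation. It assumes one common formulation, in which a reward model is trained with a pairwise logistic loss so that the output a human ranked higher receives the higher score. The function names and reward values are hypothetical.

```python
import math


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Logistic loss on the reward gap: -log(sigmoid(r_preferred - r_rejected))."""
    gap = reward_preferred - reward_rejected
    # log1p(exp(-gap)) equals -log(sigmoid(gap)); fine for the small gaps used here.
    return math.log1p(math.exp(-gap))


def ranking_to_pairs(ranked_outputs):
    """Turn a human ranking (best first) into (preferred, rejected) pairs."""
    pairs = []
    for i, better in enumerate(ranked_outputs):
        for worse in ranked_outputs[i + 1:]:
            pairs.append((better, worse))
    return pairs


# Hypothetical scores a reward model currently assigns to three candidate replies.
rewards = {"reply_a": 1.2, "reply_b": 0.3, "reply_c": -0.5}
# A human labeler ranked the replies from best to worst (assumed for illustration).
human_ranking = ["reply_a", "reply_b", "reply_c"]

pairs = ranking_to_pairs(human_ranking)
mean_loss = sum(pairwise_ranking_loss(rewards[p], rewards[r]) for p, r in pairs) / len(pairs)
print(f"{len(pairs)} pairs, mean ranking loss = {mean_loss:.3f}")
```

The loss shrinks as the reward model scores the preferred reply further above the rejected one. In a full RLHF pipeline, a reward model trained this way would then supply the reward signal for the reinforcement learning step.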
ChatGPT is optimized for conversations, enables users to ask follow-ups, and challenges incorrect answers. The feel is groundbreaking, but it’s not that simple. While this technology is definitely entertaining, it’s not quite clear yet how it can effectively be applied to the needs of the typical enterprise.
Even the CEO of OpenAI provided a note of caution around the technology, “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. It’s a mistake to be relying on it for anything important right now. It’s a preview of progress; we have lots of work to do on robustness and truthfulness.”
Experimenting with ChatGPT feels like impressive progress from its predecessor GPT-3. Sometimes it says it can’t answer a question, which is a great step forward! But, like other large language models, it can be amusingly wrong. In the weeks since its release, members of the AI community have worked to test the limits of ChatGPT, unleashing a flood of tweets that made for often-great and often-troubling entertainment.
“It is exciting to see the launch of ChatGPT,” said Cathy Feng, Evalueserve’s AI Expert. “The interesting use cases it enables, allowing even those unfamiliar with the domain to get a better idea about what AI is capable of. This will help AI adoption in the long term. For the moment, ChatGPT could be good for individual use or entertainment but it’s important to proceed with caution when it comes to business use given the question of the accuracy of the answers it delivers.”
The generative AI space is seeing an accelerated level of funding activity. For example, Seek AI, a developer of AI-powered intelligent data solutions, announced it has raised $7.5 million in a combination of pre-seed and seed funding. Seek AI uses complex deep-learning foundation models with hundreds of billions of parameters. These models are the technology behind OpenAI’s DALL-E and GPT-3, and are powerful enough to understand natural language commands and generate high-quality code to instantly query databases.
Additionally, Microsoft plans to invest $10 billion in OpenAI. Microsoft will reportedly get a 75% share of OpenAI’s profits until it makes back the money on its investment, after which the company would assume a 49% stake in OpenAI.
Large Language Models
Generative AI is based on so-called large language models (LLMs), a type of artificial intelligence (AI) that is trained to process and generate natural language. They are designed to understand and generate human-like language by learning from a large dataset of texts, such as books, articles, and websites.
One of the key features of large language models is their ability to generate human-like text. They can be used to generate news articles, stories, poems, and even code. They can also be used to improve machine translation, question answering, and language understanding tasks.
Some examples of large language models include GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa (Robustly Optimized BERT Pretraining Approach). These models have been trained on a dataset of billions of words and have achieved state-of-the-art results on many natural language processing tasks.
In addition to their ability to generate human-like text, large language models also have the ability to learn and adapt to new tasks and languages. They can be fine-tuned on a smaller dataset to perform a specific task, such as language translation or summarization. This allows them to be applied to a wide range of language-based tasks and industries, including customer service, education, and journalism.
One of the main challenges in training large language models is the need for a large amount of high-quality training data. These models require billions of words of text in order to learn and generate realistic and coherent language. Another challenge is the computational resources required to train and fine-tune these models, which can be expensive and time-consuming.
Despite these challenges, large language models have the potential to revolutionize the way we interact with computers and machines. They can improve the accuracy and efficiency of natural language processing tasks, and have the potential to improve the way we communicate and interact with each other.
Researchers are developing techniques to make LLM training more efficient. DeepMind published recommendations for how to train LLMs given a fixed computational budget, leading to significant gains in efficiency. Although it addresses smaller models, an approach known as “cramming” improves the performance that can be achieved with one day of training a language model on a single GPU. As more teams develop and publish LLMs, there will be systematic comparisons that empower users to pick the right one based on cost, availability, and other criteria. For example, a team led by Percy Liang carried out an extensive study that compares LLMs.
Just in the short time since its release, ChatGPT has been used for a very diverse set of applications.
Researchers started adding ChatGPT as a co-author on their papers.
Musician Evan Greer recorded an album of Christmas songs where she wrote the music, but the lyrics were generated by ChatGPT. She used prompts like “Write a Christmas song in the style of Blink-182.” A rather catchy song, “This Holiday’s a Hit,” resulted, along with three other tunes that make up an EP.
“I kind of hate everything about this,” said Evan Greer (she/her), “I honestly don’t know why I made this. Partly to distract myself and cope with holiday depression, partly as a commentary on the way that artificial intelligence is invading the art and music world, and partly because I kind of love Christmas music, but also most Christmas music sort of sounds like it was written by an AI, since it’s so formulaic and commercialized.”
Customer Experience (CX)
Conversational AI as a CX tool is one such application.
“Though ChatGPT is an evolution of previous large language models (LLM), it really feels revolutionary because it’s uniquely optimized for conversations,” said Yves Normandin, VP of AI Technologies at Waterfield Tech. “It is actually remembering information exchanged in previous interactions. While that’s completely mind-blowing, for businesses wondering how they can leverage ChatGPT for their own benefit, it’s not yet clear how this technology can be effectively used to help create virtual agents, as its current limitations would be unacceptable for use in customer service. For example, ChatGPT only knows the data it was trained on, so it cannot easily leverage external information like customer data. There is also little control over what responses the model will generate, and it will often produce, with great confidence, incorrect or nonsensical answers that appear totally plausible. Customer facing virtual agents, on the other hand, must be trained to not only optimize conversation, but to enhance business performance and elevate customer experience, by leveraging a wide range of data. And the information they provide cannot be misleading or inaccurate. While ChatGPT is undeniably interesting, there is still much to be learned before we can expect to see it practically applied in business without human oversight.”
Software Product Design
At the direction of big data vendor DataStax, ChatGPT was asked to write a business plan for the perfect database. It produced a detailed description of an ideal database that maps almost directly to DataStax’s Astra DB, built on Apache Cassandra. See below for what ChatGPT came up with. With these suggested key features, a massively scalable database could be used to power popular internet sites and meet the demanding requirements of today’s digital economy.
A massively scalable database that can be used to power popular internet sites would need to have several key features to be successful. Here is a possible business plan for such a database:
- The database would need to be highly distributed and horizontally scalable, meaning it can easily handle large amounts of data and traffic by adding more nodes to the network. This would allow it to support the large and unpredictable workloads of popular internet sites.
- The database would need to offer strong performance and low latency, so that users can access and manipulate data quickly and efficiently. This could be achieved through the use of in-memory caching and advanced indexing techniques.
- The database would need to be highly available and resilient, with features like automatic failover and data replication to ensure that the system remains up and running even in the face of hardware or software failures.
- The database would need to offer a flexible and expressive data model, allowing developers to easily store and query complex data structures. This could be achieved through the use of a NoSQL data model, such as document or key-value stores.
- The database would need to have a robust and easy-to-use query language, so that developers can quickly and easily access and manipulate data. This could be achieved through the use of a powerful and intuitive query language like SQL or a more flexible and expressive language like GraphQL.
- The database would need to have strong security features, such as encryption at rest and in transit, to protect sensitive data and ensure compliance with privacy regulations.
- The database would need to be easy to manage and operate, with intuitive tools and interfaces for monitoring, tuning, and troubleshooting the system. This could be achieved through the use of a user-friendly management console and robust APIs for automating common tasks.
ChatGPT is able to code like a pro, with some amazing examples floating around, but as with the pros, the generated code sometimes does not work. Here’s an example I tried using the prompt: “write R code to predict the Species variable from the iris dataset.” I found it interesting that ChatGPT decided to use the randomForest algorithm. It appropriately split the dataset into training and test sets, using a 70% split. Then it used a confusion matrix to assess the performance of the classification task. It even added some comments to the code. I was impressed with its thoroughness.
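The R code that ChatGPT generated is not reproduced here. The workflow it followed (a 70% training split, a fitted classifier, a confusion matrix) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the generated code: it uses synthetic two-feature data in place of the iris dataset, and a simple nearest-centroid classifier stands in for randomForest so that the example runs with the standard library alone. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(train):
    """Compute the mean feature vector of each class (stand-in for a real model)."""
    grouped = defaultdict(list)
    for features, label in train:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, centroids[label])),
    )


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Synthetic stand-in for iris: three classes clustered around made-up centers.
rng = random.Random(42)
centers = {"setosa": (1.5, 0.3), "versicolor": (4.3, 1.3), "virginica": (5.6, 2.0)}
data = [
    ((rng.gauss(cx, 0.3), rng.gauss(cy, 0.2)), label)
    for label, (cx, cy) in centers.items()
    for _ in range(50)
]

train, test = train_test_split(data, train_fraction=0.7)
centroids = fit_centroids(train)
actual = [label for _, label in test]
predicted = [predict(centroids, features) for features, _ in test]
matrix = confusion_matrix(actual, predicted)
accuracy = sum(n for (a, p), n in matrix.items() if a == p) / len(test)

for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:<10} predicted={p:<10} count={n}")
print(f"accuracy = {accuracy:.2f}")
```

Evaluating on the held-out 30% rather than on the training rows is what makes the confusion matrix a fair estimate of how the classifier handles unseen data.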
Generative AI and LLMs are front and center in the industry hype cycle right now for their innovation, unique applications, and potential impact on the business world. However, like any new technology, companies should refrain from rushing to adopt it without considering true needs and possible value.
Companies should not embrace new technology for technology’s sake. When contemplating the use of generative AI, businesses need to consider their target market to ascertain the justification for embracing the technology, and define concrete experiments coupled with specific hypotheses in order to fully understand where the value lies.
Furthermore, in the short time that the ChatGPT demo has been available for evaluation, we’re already seeing a plethora of caveats.
ChatGPT is being hyped as a “Google-killer” thanks to NLP methods that deliver rapid and straightforward answers to text prompts while also writing music lyrics, school papers, poetry, and even programming code. At the same time, there is much concern about its limitations, biases, and potential for abuse. We’re already seeing the pitfalls of large-scale public access to tools like ChatGPT and how generative AI can be exploited to write malware, generate phishing schemes, and present disinformation as fact. We’re seeing that it’s important to safeguard data models to ensure ethical AI training protocols. Further, we’re starting to get a glimpse of what future iterations of OpenAI’s GPT will look like and how the technology may impact industries embracing AI adoption.
“Innovative technology like ChatGPT can support the reshaping of various functions and industries from customer support and services to whiteboarding go-to-market strategies,” said Swapnil Srivastava, Evalueserve’s VP and Global Head of Data and Analytics. “While technology engineers are finding ChatGPT useful to rapidly debug and or create codes, creative writers are testing it to create complex movie plots. Such a substantial extent of value can result in potential misuse and misinformation. This technology cannot source information from the web and provide insights about organizations and processes/policies effectively. However, based on the roadmap that ChatGPT is heading towards, it is probable that such developments would unfold sooner than later that might rival some of the value-proposition of search engine platforms like Google. Leaders in the data and analytics space can start by determining the potential benefits of such NLP technologies and how they can help them achieve their desired goals and thereby establish processes and capability to integrate and develop such solutions within their AI foundational roadmap. AI solutions like ChatGPT present a huge step in the accelerated journey towards smarter and stronger AI with a unique value proposition. However, with the potential risks, ethical concerns, and limitations that such AI has, human intervention would always be required to augment AI and navigate toward AI-driven decision-making.”
Plagiarism Gone Wild
A big concern with ChatGPT is that users will generate complete papers, essays, or articles and pass them off as their own. Of course, ChatGPT is trained on other people’s words, so this is a form of plagiarism on a massive scale. We’re most likely to see the emergence of “AIgiarism,” as mentioned by American venture capitalist Paul Graham. He believes that the rules against “AIgiarism” should be like those against plagiarism.
As a result, we’re starting to see professional societies, schools, and social media sites react to the potential of ChatGPT and other LLMs to produce falsehoods, socially biased information, and other undesirable output in the guise of reasonable-sounding text.
New York City blocked access to ChatGPT in the city’s 1,851 public schools, which serve over one million students. Officials expressed concern that the tool enables plagiarism and generates falsehoods. Closer to home, the organizers of the upcoming International Conference on Machine Learning (ICML) in Honolulu prohibited paper submissions that include text generated by LLMs, including ChatGPT, unless the text is included for analytical purposes. Additionally, Stack Overflow, an important resource for data scientists, temporarily banned ChatGPT-generated content due to the model’s propensity for outputting incorrect answers to technical questions, coupled with the realization that moderating the volume of misleading information submitted since the ChatGPT demo was released had become unmanageable.
Misinformation and Assertive Writing Style
Another concern related to LLMs is that they often confidently make assertions that are blatantly false. This raises fears that they will flood the world with misinformation. If the degree of confidence could be moderated appropriately, the generated text would be less likely to mislead.
LLMs learn an authoritative writing style that’s often found in internet content. Unfortunately, LLMs can use this style even when they get the facts entirely wrong. Humans are not right all the time either but, many politicians aside, we don’t expect them to be simultaneously confident and wrong. Human experts typically speak in a range of styles: confident when they know what they’re talking about, but also explaining the boundaries of their knowledge. For instance, when explaining how to build an AI application, a human expert might propose one approach but also describe the range of algorithms one might consider. Knowing what you know and don’t know is a constructive quality of expertise.
Researchers are working to build systems that can express different degrees of confidence. A model like Meta’s Atlas or DeepMind’s RETRO, for instance, synthesizes multiple articles into one answer. This strategy might infer a degree of confidence based on the reputations of the sources it draws from and then alter its communication style accordingly.
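As a rough illustration of that idea, the sketch below averages reputation scores for the sources behind an answer and picks more or less hedged wording accordingly. It does not describe how Atlas or RETRO work. The reputation scores, thresholds, function names, and phrasing are all assumptions made for the example.

```python
def answer_confidence(source_reputations):
    """Average the reputation scores (0.0 to 1.0) of the sources behind an answer."""
    if not source_reputations:
        return 0.0
    return sum(source_reputations) / len(source_reputations)


def hedge(answer, confidence):
    """Prefix the answer with wording that matches the estimated confidence."""
    if confidence >= 0.8:  # hypothetical threshold for an unhedged answer
        return answer
    if confidence >= 0.5:  # hypothetical threshold for a mildly hedged answer
        return f"Based on the sources I found, {answer}"
    return f"I'm not sure, but possibly {answer}"


# Placeholder answer text and hypothetical reputation scores for retrieved articles.
answer = "the museum opens at 9 a.m."
well_sourced = answer_confidence([0.9, 0.95, 0.85])
poorly_sourced = answer_confidence([0.4, 0.3])

print(hedge(answer, well_sourced))
print(hedge(answer, poorly_sourced))
```

A real system would need a principled way to score sources and to calibrate the thresholds. The point here is only that communication style can be made a function of an explicit confidence estimate.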
Allowing generative algorithms to express doubt when they’re not sure they’re right will go a long way toward building trust and minimizing the risk of generating misinformation.
LLMs for Science
ChatGPT arrived one week after Meta withdrew Galactica, a special-purpose LLM designed to generate scientific papers. Galactica was offered as an aid to researchers aiming to publish research results, but users of the public demo prompted it to generate nonsense such as a “scientific paper” on the benefits of eating crushed glass.
Safeguards that OpenAI put in place to block undesirable outputs proved easy to break. Prompted for how to break into someone’s house, the model refused to answer, but when given a portion of a story in which a character asked the same question, ChatGPT provided an instruction manual for burglary.
ChatGPT also expresses the same social biases that have plagued similar models. Prompted to write a Python function to evaluate the quality of scientists based on a JSON description of their race and gender, it generated code that favored only white, male scientists.
“Is ChatGPT mature enough for usage to solve real business problems? As it stands today, ChatGPT has further mainstreamed Generative AI by making it more accessible,” observed Jaya Kishore Reddy Gollareddy, CTO & Co-founder, Yellow.ai. “However, for enterprises to connect with their end users and drive business impact, a more comprehensive end-to-end conversational AI solution is required. It is important to note that the success of conversational AI solutions depends on their ability to deliver a high-quality user experience. This includes being able to understand and respond accurately to the user’s input or perform a relevant action, and that too in a natural, human-like manner. Businesses need to have control over the conversational flow. There must be control and understanding of the various conversational flows, intents, and utterances within each use case, which varies by business and industry. Conversational AI solutions also need to support integration with backend systems such as payment gateways, CRMs, and Contact Centre Platforms to pull and push relevant information in order to provide high levels of automation and subsequently greater ROI. On the other hand, ChatGPT can as of now only fetch information and respond to the user’s prompts based on the knowledge fed to it during its training, but it lacks the ability to perform a relevant action or integrate with backend systems. For it to perform an action like fetching policy details or booking a flight, it needs access to third-party systems. Not only that, each business is highly distinct in nature; they have their own domain knowledge and sources that are very specific to their products and services and to the industry they operate in. In order for them to leverage ChatGPT, they would need to access the API to fine-tune ChatGPT with their own data and create their own variants of ChatGPT.”
Microsoft is discussing plans to use OpenAI’s ChatGPT to augment its suite of Office tools and products (Word, PowerPoint, etc.). It’s worth bearing in mind that these LLMs will not always deliver value for technical or niche users and businesses. LLM-backed results are often inaccurate, especially in specific technical domains, and only larger corporations like Microsoft have the resources to overcome these issues – so not all businesses should follow in their footsteps.
AI researcher Victor Botev, CTO at Iris.ai, comments on how a more focused and tailored approach to NLP can lead to greater value for businesses:
“LLMs represent endless possibilities, but their turn in the spotlight exposes unsolved questions: factual accuracy, knowledge validation, and faithfulness to an underlying message. To solve these issues for niche domains and business applications requires substantial investment. Otherwise, these models are unusable.”
“Every organization operating in a niche domain needs accurate results that understand the specificity and idiosyncrasies of said domain. LLMs are not, and will not be, able to capture these nuances within the next couple of years. Their immense running costs and reliance on volumes of data that may not even exist in certain fields is compounded by the lack of fact-checking software advanced enough to measure their quality, let alone begin to fix them. By improving fact-checking, we can unlock better training, better specialization, and domain adaptation for these models, as well as driving down costs and making them more accessible. However, this takes time.”
“Instead of getting caught up in the generative AI craze that will dominate 2023, businesses and large tech corporations should consider the AI technologies that will drive real value, rather than driving headlines. Bigger does not always equal better.”
Moving Forward with LLMs
Finally, the latest warning about ChatGPT comes from OpenAI itself. Two of its policy researchers were among the six authors of a new report that investigates the threat of AI-enabled influence operations.
“Our bottom-line judgment is that language models will be useful for propagandists and will likely transform online influence operations. Even if the most advanced models are kept private or controlled through application programming interface (API) access, propagandists will likely gravitate towards open-source alternatives and nation states may invest in the technology themselves,” according to a blog accompanying the report.
As new as this technology is, work is already underway to detect ChatGPT-generated text. For example, Princeton University student Edward Tian introduced GPTZero, a tool designed to distinguish writing by a human from writing by AI.
Originality.AI recently launched a tool that allows users to screen for content created by popular AI tools, such as ChatGPT. According to the company, the tool is able to identify text from the most advanced AI text-generation models on the market: GPT-2, GPT-NEO, GPT-J, GPT-3, GPT-3.5 and ChatGPT.
“As AI-generated content grows more rampant across the internet, it becomes less clear if a human or AI has created the work,” said Jonathan Gillham, Founder of Originality.AI. “With this in mind, I have enabled Originality.AI to be used at the click of a button. AI writing tools can be beneficial, but it’s important for web publishers to be aware if the content they are publishing is authentic.”
Got It AI, the Autonomous Conversational AI company, announced a new “Truth Checker” AI that it says can identify when ChatGPT is hallucinating (generating fabricated answers) when answering user questions over a large set of articles or a knowledge base. According to the company, this makes it possible to deploy ChatGPT-like experiences with a much lower risk of providing incorrect responses to users, so that enterprises can deploy generative conversational AIs that leverage large-scale knowledge bases, such as those used for external customer support or for internal user support queries. The “Truth Checker” AI uses a separate, advanced LLM-based AI system and a target domain of content (e.g. a large knowledge base or a collection of articles) to train itself autonomously for one task: truth checking.
“We tested our technology with a dataset of 1000+ articles across multiple knowledge bases using multi-turn conversations with complex linguistic structures such as co-reference, context, and topic switches,” said Chandra Khatri, former Alexa Prize team leader and co-founder of Got It AI. “ChatGPT produced incorrect responses for about 20% of the queries when given all the relevant content for the query in its prompt space. Our Truth Checker AI was able to detect 90% of the inaccurate responses, without human help. We will also provide the customer with a simple user interface to the Truth Checking AI, to further optimize it, identify the remaining inaccuracies and eliminate virtually all inaccurate responses.”
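The two percentages in the quote imply a residual error rate, which a short calculation makes concrete. This is a back-of-the-envelope sketch only. It assumes both figures apply to the same set of queries, uses a hypothetical volume of 1,000 queries, and ignores false positives (correct responses flagged as wrong), which the quote does not report.

```python
queries = 1000          # hypothetical number of user queries
incorrect_rate = 0.20   # share of responses ChatGPT got wrong (from the quote)
detection_rate = 0.90   # share of incorrect responses the checker detected (from the quote)

incorrect = queries * incorrect_rate
caught = incorrect * detection_rate
undetected = incorrect - caught

print(f"incorrect responses: {incorrect:.0f}")
print(f"detected by the checker: {caught:.0f}")
print(f"undetected: {undetected:.0f} ({undetected / queries:.1%} of all queries)")
```

Under these assumptions, roughly 2% of all responses would still be wrong and unflagged, which is the remainder the company says it expects to reduce further through its user interface.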
Lastly, OpenAI has plans to embed cryptographic tags into ChatGPT’s output in order to watermark the generated text. OpenAI told TechCrunch it is working on mitigations to help spot ChatGPT-generated text. But users may find ways to bypass safeguards. For instance, OpenAI’s watermarking proposal can be defeated by lightly rewording the text, MIT computer science professor Srini Devadas told TechCrunch. The result could be an ongoing Whac-A-Mole battle between users and researchers.
Natural language is the simplest and most convenient way for humans to communicate. Software that’s able to comprehend what it’s told and respond with meaningful information will open up a broad range of everyday functions. The current industry hype cycle holds that ChatGPT may be the breakthrough tool for making this goal a reality. But researchers face the steep challenge of building a language model that doesn’t make up facts or ignore limits on its output.
We’re used to seeing overhyped technology. Reinforcement learning, after it solved Atari games, is a good example. But LLMs seem different, and likely to find a place in significant applications. Meanwhile, many details remain to be worked out and the AI community must strive to minimize potential harm.
Many observers worry that generative text will disrupt society. OpenAI CEO Sam Altman tweeted that the model was currently unsuitable for real-world tasks due to its deficiencies in truth-telling. Bans discussed in this article are an understandable reaction by authorities who feel threatened by the progressively sophisticated abilities of LLMs.
There was a time when math teachers protested the presence of calculators in the classroom. Since then, they’ve learned to integrate these tools into their lessons. It’s important that authorities take a similarly forward-looking attitude to assistance from AI.
NOTE: A section of this article was written by ChatGPT! Can you guess which one?