OpenAI, the AI research and deployment company, has introduced a new version of GPT-3, its autoregressive language model. The updated model, called InstructGPT, is better at following the instructions of the people using it. The San Francisco-based lab says that InstructGPT addresses some of the worst toxicity issues reported with its predecessor.
The AI lab says that it is using alignment techniques to ensure that InstructGPT produces less offensive language and misinformation. The language model is also expected to make fewer mistakes overall, unless explicitly instructed otherwise by the people using it.
InstructGPT sort of stops feeding on the internet
OpenAI’s GPT-3, a large language model with 175 billion parameters, is trained on vast amounts of text taken from the internet. As a result, the model encounters the best and worst of what people post online. This very learning turned out to be a problem for GPT-3, which debuted at 10 times the size of Microsoft’s Turing NLG model and more than 100 times the size of its predecessor, GPT-2.
Language models like GPT-3 have been found to absorb the toxic language found on the internet, making them susceptible to racist and misogynistic behaviour and sometimes even prejudices of their own. With eliminating bias becoming an important topic in the AI world, OpenAI is addressing this very challenge with the introduction of InstructGPT.
OpenAI is doing this by making InstructGPT the default model for users of its application programming interface (API), a service that gives users access to the company’s language models for a fee. OpenAI says GPT-3 will continue to be available, but it doesn’t recommend using it.
“It’s the first time these alignment techniques are being applied to a real product,” says Jan Leike, co-lead of OpenAI’s alignment team, in a conversation with MIT Technology Review.
This alignment technique tackles the problem of foul language and misinformation in datasets differently from previous methods, such as filtering out offensive language. The filtering method led to diminished performance, whereas the alignment method retains the performance that makes GPT-3 the preferred language model for a number of AI use cases.
InstructGPT: How OpenAI trained this updated model
The OpenAI team says it started with a fully trained model to avoid the drop in performance that earlier approaches suffered. It then added another round of training, using reinforcement learning to teach the model what it should say and when, based on the preferences of human users.
OpenAI also hired 40 people to train InstructGPT and rate its responses to a range of prewritten prompts. Responses judged to be in line with the intention of the prompt writer were scored higher, while responses that contained sexual or violent language, denigrated a specific group of people, or expressed an opinion were marked down.
The feedback from this training was then used as the reward signal in a reinforcement learning algorithm that trained InstructGPT to match its responses to the intention of the prompt writer. OpenAI says that its API users favoured InstructGPT over GPT-3 more than 70 per cent of the time.
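The loop described above can be sketched in miniature: human labellers rank pairs of candidate responses, a reward model is fitted to those rankings, and that model's score is what the reinforcement-learning step would then maximise. The toy features, data, and function names below are illustrative assumptions for a self-contained demo, not OpenAI's actual implementation.

```python
import math

def featurize(response: str) -> list:
    # Toy stand-in for a model's representation of a response:
    # [is-on-topic flag, contains-toxic-word flag]. Purely illustrative.
    toxic = any(w in response.lower() for w in ("idiot", "hate"))
    on_topic = "paris" in response.lower()
    return [1.0 if on_topic else 0.0, 1.0 if toxic else 0.0]

def reward(weights, response):
    # Linear reward model: score = w . features(response)
    return sum(w * x for w, x in zip(weights, featurize(response)))

def train_reward_model(comparisons, epochs=200, lr=0.5):
    """Fit weights so that, for each (preferred, rejected) pair ranked
    by a human labeller, reward(preferred) > reward(rejected)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in comparisons:
            # Pairwise (Bradley-Terry) loss:
            # P(good preferred) = sigmoid(r_good - r_bad)
            diff = reward(w, good) - reward(w, bad)
            p = 1.0 / (1.0 + math.exp(-diff))
            grad_scale = 1.0 - p  # gradient of -log P w.r.t. diff
            fg, fb = featurize(good), featurize(bad)
            for i in range(len(w)):
                w[i] += lr * grad_scale * (fg[i] - fb[i])
    return w

# Prompt: "What is the capital of France?" with labeller preferences
# expressed as (preferred response, rejected response) pairs.
comparisons = [
    ("Paris is the capital of France.", "You are an idiot for asking."),
    ("The capital is Paris.", "I hate this question."),
]
w = train_reward_model(comparisons)
helpful = reward(w, "Paris, of course.")
toxic = reward(w, "I hate you, idiot.")
print(helpful > toxic)  # the fitted reward model prefers the helpful reply
```

In the real system the reward model is itself a large neural network and the policy is updated with a reinforcement-learning algorithm rather than being scored directly, but the core idea is the same: human rankings become a differentiable reward that the language model is tuned against.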
“This work is a first step towards using alignment techniques to fine-tune language models to follow a wide range of instructions. There are many open questions to explore to further align language model behaviour with what people actually want them to do,” OpenAI’s alignment team explains in a paper.
Ilya Sutskever, chief scientist at OpenAI, says, “It is exciting that the customers prefer these aligned models so much more. It means that there are lots of incentives to build them.”
Interestingly, the OpenAI researchers also discovered that users preferred the responses of a 1.3-billion-parameter InstructGPT model to those of the 175-billion-parameter GPT-3 model. This suggests that AI researchers can use alignment to improve the performance of their language models without adding more parameters and training data.
InstructGPT is by no means the end of the discrepancies and biases seen in AI models. It still makes simple errors, and since it is trained to do what users instruct it to, the model will treat a falsehood as truth if given such a prompt. It could even produce more toxic or prejudiced output if directed to do so by its user.
While InstructGPT solves one of the issues that plagued GPT-3, a lot of work remains to be done in making AI a more equitable and unbiased technology.