EleutherAI aims to break open Microsoft's monopoly over transformer-based language models with its new GPT-J model.
EleutherAI’s GPT-J vs OpenAI’s GPT-3
EleutherAI claims new NLP model approaches GPT-3-level performance
Can’t Access GPT-3? Here’s GPT-J — Its Open-Source Cousin
Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B
GPT-3 Generated Summary:
- EleutherAI is a research group focused on AI alignment, scaling and open-source AI research
- The group released two GPT-Neo models with 1.3 billion and 2.7 billion parameters respectively
- Microsoft has exclusive access to GPT-3’s source code as part of a larger agreement with OpenAI
- EleutherAI is supported by Google and CoreWeave (cloud computing providers)
- EleutherAI has built an 825-gigabyte (GB) language-modelling dataset called The Pile, curated from sources including arXiv, GitHub, Wikipedia, StackExchange and HackerNews
- Now, it has launched GPT-J, one of the largest models EleutherAI has released to date
- GPT-J is a 6-billion-parameter model trained on The Pile
- GPT-J is comparable in performance to the GPT-3 variant of similar size (6.7 billion parameters)
- Because GPT-J was trained on GitHub (7 percent) and StackExchange (5 percent) data, it is better than GPT-3 175B at writing code. However, it is significantly worse at most other tasks
- GPT-J is a fully end-to-end trainable Transformer LM on the same scale as the 6.7B GPT-3, and it achieves state-of-the-art zero-shot performance among publicly available models
- The model was trained on 400 billion tokens from The Pile, which contains over 800 GB of text
- Efficient attention (such as linear, local or sliding-window attention) was not used, for simplicity, as it would not have significantly improved throughput at this scale
- The dimension of each attention head was set to 256, twice that of a GPT-3 model of comparable size. “This noticeably improved the ‘throughput’ with minimal performance degradation,” said Aran Komatsuzaki
- The team made two minor architectural improvements to GPT-J: rotary position embeddings for slightly better performance, and placing the attention layer and the feed-forward layer in parallel for decreased communication overhead (a code sketch of this block follows below)
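The following is a minimal, illustrative PyTorch sketch of such a block: rotary position embeddings applied to the queries and keys, and the attention and feed-forward sub-layers computed in parallel from a single layer norm and summed into the residual stream. The dimensions mirror GPT-J's reported configuration (16 heads of dimension 256, model dimension 4096), but the class and function names are invented for illustration; this is not EleutherAI's actual Mesh Transformer JAX implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rotary(x, base=10000):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by a
    # position-dependent angle so attention scores depend on relative position.
    b, h, s, d = x.shape
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.outer(torch.arange(s, dtype=torch.float32), inv_freq)  # (s, d/2)
    sin, cos = angles.sin(), angles.cos()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class ParallelBlock(nn.Module):
    """GPT-J-style block: attention and feed-forward branches share one
    layer norm, run in parallel, and are both added to the residual."""
    def __init__(self, d_model=4096, n_heads=16, head_dim=256, d_ff=16384):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.ln = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * n_heads * head_dim, bias=False)
        self.out = nn.Linear(n_heads * head_dim, d_model, bias=False)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))

    def forward(self, x):
        b, s, _ = x.shape
        h = self.ln(x)  # single shared layer norm for both branches
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        q, k = apply_rotary(q), apply_rotary(k)  # rotary position embeddings
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, s, -1)
        # Parallel formulation: x + Attn(LN(x)) + FF(LN(x)), versus the
        # sequential x + FF(LN(x + Attn(LN(x)))) used in GPT-3.
        return x + self.out(attn) + self.ff(h)
```

Because both branches read from the same normalised input, their matrix multiplications can be fused or overlapped, which is what reduces communication when a layer is sharded across many devices.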
Advantages of GPT-J:
- It is open source and hence can be used by anyone, including commercial organisations like Microsoft.
- It is built on EleutherAI’s Mesh Transformer JAX codebase and can be used for both training and inference.
- It has been trained on GitHub and StackExchange data, which makes it better at writing code than the GPT-3 175B model, but worse at other tasks like reading comprehension or translation.
- GPT-J has been built to be compatible with both TPUs and GPUs (via CoreWeave); see the usage example below.
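Because the weights are openly available, GPT-J can be run by anyone with a suitable GPU. Below is a minimal sketch using the Hugging Face transformers library; it assumes the EleutherAI/gpt-j-6B checkpoint on the Hugging Face Hub, and the prompt and sampling settings are arbitrary examples.

```python
# Generate text with GPT-J-6B via the Hugging Face transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,  # half precision so the 6B weights fit on a single ~16 GB GPU
).to("cuda")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=64,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same checkpoint can also be run on TPUs through the original Mesh Transformer JAX codebase, which is what the model was trained with.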