Apple collaborates with NVIDIA to research faster LLM performance
In a blog post today, Apple engineers have shared new details on a collaboration with NVIDIA to implement faster text generation performance with large language models.
Apple published and open sourced its Recurrent Drafter (ReDrafter) technique earlier this year. It represents a new method for generating text with LLMs that is significantly faster and “achieves state of the art performance.” It combines two techniques: beam search (to explore multiple possibilities) and dynamic tree attention (to efficiently handle choices).