Meet Medusa: An Efficient Machine Learning Framework for Accelerating Large Language Models (LLMs) Inference with Multiple Decoding Heads

Meet Medusa: An Efficient Machine Learning Framework for Accelerating Large Language Models (LLMs) Inference with Multiple Decoding Heads