Snap interview question

What is Multi-head Attention? How does it work?
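One way to approach the question is with a small sketch. Multi-head attention runs scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, several times in parallel on different slices of the projected queries, keys, and values. It then concatenates the per-head results and applies a final output projection. The pure-Python sketch below assumes self-attention over a single unbatched sequence, with no masking, bias, or dropout. The names `multi_head_attention`, `attention`, `matmul`, `random_matrix` and the weight matrices `w_q`, `w_k`, `w_v`, `w_o` are illustrative choices, not any library's API. Real implementations typically use a tensor library and batched operations.

```python
import math
import random


def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    # Scaled dot-product attention for one head.
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)  # (seq_len x seq_len) similarity scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)  # weighted average of the value vectors


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len x d_model); every weight matrix: (d_model x d_model).
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    # Project once, then slice the columns into per-head chunks.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        heads.append(attention(q_h, k_h, v_h))

    # Concatenate the heads token by token, then mix them with w_o.
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, w_o)


def random_matrix(rows, cols):
    # Hypothetical helper that stands in for learned weights in the demo.
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, num_heads = 3, 4, 2
x = random_matrix(seq_len, d_model)
w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(len(out), len(out[0]))  # output keeps the input shape: seq_len x d_model
```

Each head only sees a `d_head`-wide slice of the projections, so different heads can learn to attend to different relationships between tokens. Slicing one large projection, as done here, is equivalent to giving each head its own smaller projection matrices.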