What is Transfer Learning? What are the approaches to achieving Transfer Learning? Briefly state the high-level algorithm of Transfer Learning. What are the benefits of Transfer Learning over training a model from scratch? What is an embedding, and why is it needed for text processing? Draw the architecture of a Transformer and briefly explain the functionality of its blocks. [Note: directly copy-pasting from the internet is not allowed.]