Implementing Self-Attention from Scratch in PyTorch
Another Medium article: The Easiest Way to Understand Self-Attention with Code.
Attention Is All You Need
Introduces the Transformer, a neural network architecture for sequence transduction based solely on attention mechanisms, improving machine translation quality, training speed, and parallelization over recurrent and convolutional models.
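The scaled dot-product self-attention these resources cover can be sketched minimally in PyTorch. This is a single-head version with illustrative dimensions, not any one article's implementation; the function name and the projection-matrix shapes are assumptions for the sketch:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (minimal sketch).

    x:              (seq_len, d_model) input embeddings
    w_q, w_k, w_v:  (d_model, d_k) learned projection matrices
    """
    q = x @ w_q                                   # queries:  (seq_len, d_k)
    k = x @ w_k                                   # keys:     (seq_len, d_k)
    v = x @ w_v                                   # values:   (seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k**0.5   # scaled dot products: (seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)           # each row is a distribution over positions
    return weights @ v                            # weighted sum of values: (seq_len, d_k)

torch.manual_seed(0)
x = torch.randn(5, 16)                            # 5 tokens, model width 16
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)            # shape (5, 8)
```

In practice the projections are `nn.Linear` layers and multiple heads run in parallel, but the core computation is exactly these three matrix products plus a row-wise softmax.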