## Question for Q22 in the exam 2020

Hi,

I have a question about question 22 in the 2020 exam. Why is it true? In my understanding, SignSGD has far lower computational complexity. Thank you very much.

For deep learning applications, SignSGD first computes the full gradient (the same computational cost as SGD) and then takes the sign of each gradient coordinate. This can reduce the communication complexity in distributed learning, because each worker shares only one bit per coordinate instead of a full-precision value, but the computation required is about the same: the cost is dominated by computing the gradient itself.
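A minimal sketch of this point, using a synthetic least-squares problem (the setup is assumed for illustration): both updates below compute the same full gradient, SignSGD merely applies an elementwise sign afterwards, which is a cheap O(d) pass. The communication saving is that each coordinate of the sign vector needs 1 bit instead of a 64-bit float.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 10))
x_true = rng.standard_normal(10)
b = A @ x_true

def gradient(x):
    # Full gradient of f(x) = 0.5 * ||A x - b||^2 -- the dominant cost,
    # identical for SGD and SignSGD.
    return A.T @ (A @ x - b)

lr = 0.01
x = np.zeros(10)
g = gradient(x)

sgd_step = x - lr * g               # plain SGD update
signsgd_step = x - lr * np.sign(g)  # SignSGD: same gradient, then sign

# Communication view: a float64 coordinate costs 64 bits to transmit,
# while its sign costs 1 bit -- a 64x compression per coordinate.
bits_full = g.size * 64
bits_sign = g.size * 1
print(bits_full // bits_sign)  # 64
```

So the per-step compute is essentially unchanged, while the bits each worker must send shrink by the float width (32x or 64x).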
