
How To Set Parameters Of The Adadelta Algorithm In Tensorflow Correctly?

I've been using TensorFlow for regression purposes. My neural net is very small, with 10 input neurons, 12 hidden neurons in a single layer, and 5 output neurons. The activation functio…

Solution 1:

Short answer: don't use Adadelta

Very few people use it today; you should instead stick to one of the following (see the sketch after this list):

  • tf.train.MomentumOptimizer with momentum 0.9 is very standard and works well. The drawback is that you have to find the best learning rate yourself.
  • tf.train.RMSPropOptimizer: the results are less dependent on a good learning rate. This algorithm is very similar to Adadelta, but in my opinion it performs better.
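
For concreteness, here is a minimal sketch of how you would instantiate either one (the learning rates are illustrative placeholders you would tune, not recommendations):

import tensorflow as tf

# Hypothetical learning rates for illustration -- tune them for your problem.
opt_momentum = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
opt_rmsprop = tf.train.RMSPropOptimizer(learning_rate=0.001)

# Either one plugs into the usual training loop:
# train_op = opt_momentum.minimize(loss)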

If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A bigger epsilon will help at the start, but be prepared to wait a bit longer than with other optimizers to see convergence.

Note that in the paper, they don't even use a learning rate, which is the same as keeping it equal to 1.


Long answer

Adadelta has a very slow start. The full algorithm from the paper is:

[Algorithm 1, ADADELTA, from Zeiler (2012); image not reproduced]
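
For reference, here is my transcription of the paper's per-step updates (ρ is the decay rate, ε the small conditioning constant; any notation slips are mine):

\begin{aligned}
E[g^2]_t &= \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2 \\
\Delta x_t &= -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \; g_t \\
E[\Delta x^2]_t &= \rho \, E[\Delta x^2]_{t-1} + (1 - \rho) \, \Delta x_t^2 \\
x_{t+1} &= x_t + \Delta x_t
\end{aligned}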

The issue is that Adadelta accumulates the squared updates:

  • At step 0, the running average of these updates is zero, so the first update will be very small.
  • Because the first update is very small, the running average of the updates stays very small, which acts as a vicious circle at the start (a quick back-of-envelope check follows this list).
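
The check below is a sketch assuming the update rule above with both accumulators starting at zero; exact constants can differ slightly between implementations:

import math

rho, eps = 0.95, 1e-6
g = 20.                            # gradient of v*v at v = 10 is 2*v = 20
acc_g = (1 - rho) * g * g          # E[g^2] after one step = 20.0
delta = math.sqrt(0. + eps) / math.sqrt(acc_g + eps) * g
print(delta)                       # ~0.0045: tiny, however large the gradient is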

I think Adadelta performs better on networks bigger than yours, and after some iterations it should match the performance of RMSProp and Adam.


Here is my code to play a bit with the Adadelta optimizer:

import tensorflow as tf

# Toy problem: minimize v^2 starting from v = 10.
v = tf.Variable(10.)
loss = v * v

# Parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6.
optimizer = tf.train.AdadeltaOptimizer(learning_rate=1., rho=0.95, epsilon=1e-6)
train_op = optimizer.minimize(loss)

accum = optimizer.get_slot(v, "accum")                # running average of the squared gradients
accum_update = optimizer.get_slot(v, "accum_update")  # running average of the squared updates

sess = tf.Session()
sess.run(tf.global_variables_initializer())  # initialize_all_variables() is deprecated

for i in range(100):
    sess.run(train_op)
    print("%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update])))

The first 10 lines of output:

  v        accum      accum_update
9.994    20.000      0.000001
9.988    38.975      0.000002
9.983    56.979      0.000003
9.978    74.061      0.000004
9.973    90.270      0.000005
9.968    105.648     0.000006
9.963    120.237     0.000006
9.958    134.077     0.000007
9.953    147.205     0.000008
9.948    159.658     0.000009
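
Note how accum quickly grows to the scale of the squared gradients while accum_update stays near ε: the ratio of their square roots is what keeps each step tiny, which is the slow start described above.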
