How To Set Parameters Of The Adadelta Algorithm In Tensorflow Correctly?
I've been using TensorFlow for regression purposes. My neural net is very small, with 10 input neurons, 12 hidden neurons in a single layer, and 5 output neurons. The activation functio…
Solution 1:
Short answer: don't use Adadelta
Very few people use it today. You should instead stick to one of the following:
- tf.train.MomentumOptimizer with momentum 0.9 is very standard and works well. The drawback is that you have to find the best learning rate yourself.
- tf.train.RMSPropOptimizer: the results are less dependent on a good learning rate. This algorithm is very similar to Adadelta, but in my opinion it performs better.
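To make the comparison concrete, here is a minimal, dependency-free sketch of the two update rules (classical momentum and RMSProp) applied to a toy quadratic loss. This is an illustration of the math, not TensorFlow's implementation, and the learning rates are arbitrary values chosen for the demo, not recommendations:

```python
import math

# Hand-rolled versions of the two update rules on the toy loss v**2
# (gradient 2*v). Illustrative sketch only; the learning rates below
# are arbitrary choices for this demo.

def momentum_minimize(v, lr=0.05, momentum=0.9, steps=100):
    vel = 0.0
    for _ in range(steps):
        g = 2.0 * v                    # gradient of v**2
        vel = momentum * vel - lr * g  # velocity accumulates past gradients
        v += vel
    return v

def rmsprop_minimize(v, lr=0.3, rho=0.9, eps=1e-8, steps=100):
    acc = 0.0
    for _ in range(steps):
        g = 2.0 * v
        acc = rho * acc + (1 - rho) * g * g   # running average of g**2
        v -= lr * g / (math.sqrt(acc) + eps)  # step rescaled per parameter
    return v

print(momentum_minimize(10.0))
print(rmsprop_minimize(10.0))
```

Both runs end up close to 0. Note the difference in sensitivity: momentum multiplies the raw gradient by the learning rate, so the step size depends directly on the gradient scale, while RMSProp divides by the running root-mean-square of the gradients, which normalizes the step size.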
If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A bigger epsilon will help at the start, but be prepared to wait a bit longer than with other optimizers to see convergence. Note that in the paper they don't even use a learning rate, which is the same as keeping it equal to 1.
Long answer
Adadelta has a very slow start. The full algorithm from the paper is:
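The algorithm figure from the original post is not reproduced here; for reference, the per-step update rules from Zeiler's Adadelta paper, with decay rate $\rho$ and constant $\epsilon$, are:

$$
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
\Delta x_t &= -\frac{\sqrt{E[\Delta x^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t \\
E[\Delta x^2]_t &= \rho\, E[\Delta x^2]_{t-1} + (1-\rho)\, \Delta x_t^2 \\
x_{t+1} &= x_t + \Delta x_t
\end{aligned}
$$

Both running averages start at zero, which is what causes the slow start.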
The issue is that they accumulate the square of the updates:
- At step 0, the running average of these updates is zero, so the first update will be very small.
- Because the first update is very small, the running average of the updates stays very small at the beginning, which is something of a vicious circle.
I think Adadelta performs better with bigger networks than yours, and after some iterations it should equal the performance of RMSProp or Adam.
Here is my code to play a bit with the Adadelta optimizer:
import tensorflow as tf

v = tf.Variable(10.)
loss = v * v

# Parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6
optimizer = tf.train.AdadeltaOptimizer(1., 0.95, 1e-6)
train_op = optimizer.minimize(loss)

accum = optimizer.get_slot(v, "accum")  # accumulator of the squared gradients
accum_update = optimizer.get_slot(v, "accum_update")  # accumulator of the squared updates

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(100):
    sess.run(train_op)
    print("%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update])))
The first 10 lines of output:
v accum accum_update
9.994 20.000 0.000001
9.988 38.975 0.000002
9.983 56.979 0.000003
9.978 74.061 0.000004
9.973 90.270 0.000005
9.968 105.648 0.000006
9.963 120.237 0.000006
9.958 134.077 0.000007
9.953 147.205 0.000008
9.948 159.658 0.000009
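The slow start is easy to reproduce without TensorFlow at all. Below is a dependency-free Python sketch of the same toy problem (loss = v², gradient 2v) with the paper's parameters; the exact numbers may differ slightly from TensorFlow's output depending on where epsilon is added, but the dynamics are the same:

```python
import math

# Dependency-free sketch of Adadelta on loss = v * v (gradient 2 * v),
# with the parameters from the paper: learning_rate=1., rho=0.95,
# epsilon=1e-6. Not TensorFlow's implementation; numbers may differ
# slightly from the table above.

def adadelta(v, lr=1.0, rho=0.95, eps=1e-6, steps=100):
    accum = 0.0         # running average of the squared gradients
    accum_update = 0.0  # running average of the squared updates
    trace = []
    for _ in range(steps):
        g = 2.0 * v
        accum = rho * accum + (1 - rho) * g * g
        update = math.sqrt(accum_update + eps) / math.sqrt(accum + eps) * g
        accum_update = rho * accum_update + (1 - rho) * update * update
        v -= lr * update
        trace.append((v, accum, accum_update))
    return trace

for v, accum, accum_update in adadelta(10.0)[:10]:
    print("%.3f \t %.3f \t %.6f" % (v, accum, accum_update))
```

Note how accum jumps to about 20 on the very first step (it tracks the squared gradient 20² scaled by 1 − ρ), while accum_update crawls up from zero a few millionths at a time: that is exactly the slow start described above.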