Institut für Grundlagen der Informationsverarbeitung (708)
Lecturer:
O.Univ.Prof. Dr. Wolfgang Maass
Office hours: by appointment (via email)
Email: maass@igi.tugraz.at
Homepage: https://igiweb.tugraz.at/people/maass/
Assoc. Prof. Dr. Robert Legenstein
Office hours: by appointment (via email)
Email: robert.legenstein@igi.tugraz.at
Homepage: www.igi.tugraz.at/legi/
"To illustrate the utility of learning to learn,
it is worthwhile to compare machine learning to human learning.
Humans encounter a continual stream of learning tasks. They do
not just learn concepts or motor skills, they also learn bias,
i.e., they learn how to generalize. As a result, humans are
often able to generalize correctly from extremely few examples:
often just a single example suffices to teach us a new thing."
[Thrun, S., & Pratt, L. (Eds.). (2012). Learning to Learn.]
In this seminar, we will discuss novel work on
"learning to learn". This area of machine learning deals with
the following question: How can one train algorithms such that
they acquire the ability to learn?
Papers:
[This is a tentative list]
(1) Kingma, D., & Ba, J. (2014). Adam: A method for
stochastic optimization. arXiv preprint
arXiv:1412.6980.
Introduces the Adam optimizer, one of the most frequently
used stochastic gradient descent methods today.
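The core update is compact. A minimal NumPy sketch of one Adam step (hyperparameter defaults are those from the paper; the function name and the toy usage loop are our own illustration, not from the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Algorithm 1 of the paper), applied elementwise."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage: minimize f(theta) = theta^2, starting from theta = 1
theta = np.array([1.0])
m, v = np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    grad = 2 * theta                          # analytic gradient of f
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
```

Note how the per-coordinate step size is roughly bounded by the learning rate, regardless of the raw gradient magnitude; this invariance to gradient rescaling is one of the properties the paper emphasizes.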
(2)
Mnih, V., Badia, A. P., Mirza, M., Graves, A.,
Lillicrap, T. P., Harley, T., ... & Kavukcuoglu, K. (2016,
February). Asynchronous methods for deep reinforcement
learning. In International Conference on Machine Learning.
http://www.jmlr.org/proceedings/papers/v48/mniha16.pdf
Describes the
Asynchronous Advantage Actor-Critic (A3C) algorithm, used in
several papers below (e.g., (3) and (10)).
(3)
Mirowski, P., Pascanu, R., Viola,
F., Soyer, H., Ballard, A., Banino, A., ... & Kumaran, D.
(2016). Learning to navigate in complex environments. arXiv preprint
arXiv:1611.03673.
(4)
Hochreiter, S., Younger, A. S.,
& Conwell, P. R. (2001, August). Learning to learn using
gradient descent. In International Conference on
Artificial Neural Networks (pp. 87-94). Springer Berlin
Heidelberg. https://www.researchgate.net/publication/225182080_Learning_To_Learn_Using_Gradient_Descent
(http://link.springer.com/chapter/10.1007/3-540-44668-0_13)
Introduces
the main idea used in (5) and (7).
(5)
Wang, J. X., Kurth-Nelson, Z.,
Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... &
Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint
arXiv:1611.05763.
(6)
Chung, J., Gulcehre, C., Cho, K.,
& Bengio, Y. (2014). Empirical evaluation of gated
recurrent neural networks on sequence modeling. arXiv preprint
arXiv:1412.3555.
Describes
Gated Recurrent Units used in (5). Possible additional
reading: [1, 2].
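For orientation before the talk: a GRU computes an update gate, a reset gate, and a candidate state per time step. A minimal NumPy sketch (variable names are ours; bias terms are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (bias terms omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)               # update gate: how much to renew h
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate: how much past to expose
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate activation
    return (1 - z) * h + z * h_tilde           # interpolate old and candidate state

# Usage: run a randomly initialized GRU over a short input sequence
rng = np.random.default_rng(0)
d_in, d_h = 3, 5
Wz, Wr, Wh = (rng.normal(scale=0.1, size=(d_h, d_in)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(3))
h = np.zeros(d_h)
for x in rng.normal(size=(4, d_in)):           # sequence of 4 input vectors
    h = gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
```

The gating structure is what the paper compares against the LSTM: the GRU merges the LSTM's input and forget gates into the single update gate z and has no separate output gate.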
(7) Duan, Y., Schulman,
J., Chen, X., Bartlett, P. L., Sutskever, I., &
Abbeel, P. (2016).
RL$^2$: Fast Reinforcement
Learning via Slow Reinforcement Learning. arXiv preprint
arXiv:1611.02779.
Possible
additional topic: Trust Region Policy Optimization (TRPO) [3],
since it is used here (but it is quite technical).
(8)
Andrychowicz, M., Denil, M., Gomez, S., Hoffman,
M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016).
Learning
to learn by gradient descent by gradient descent. In Advances
in Neural Information Processing Systems (pp.
3981-3989).
http://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent
Uses a
recurrent neural network to propose parameter updates for
another neural network.
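The idea can be caricatured in a few lines: a small coordinatewise RNN maps the optimizee's gradient to a parameter update. The sketch below stands in for the paper's trained LSTM optimizer; since we skip meta-training entirely, the RNN weights are simply hand-initialized near plain gradient descent (all names and scales are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 4                                   # hidden size of the optimizer RNN (toy choice)

# The optimizer network is shared across coordinates of the optimizee,
# but each coordinate carries its own hidden state (as in the paper).
W_in = rng.normal(scale=0.1, size=(H, 1))
W_h = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.01, size=(1, H))

def optimizer_step(grad, h):
    """The RNN proposes an update from the optimizee's gradient.
    The -0.1 * grad term initializes the rule near gradient descent,
    replacing the meta-training that would normally shape its behavior."""
    h = np.tanh(W_in @ grad.reshape(1, -1) + W_h @ h)
    update = (W_out @ h).ravel() - 0.1 * grad
    return update, h

# Inner loop: the RNN optimizes f(theta) = ||theta||^2
theta = np.array([2.0, -1.5])           # optimizee parameters
h = np.zeros((H, theta.size))           # one hidden state per coordinate
for _ in range(100):
    grad = 2 * theta
    update, h = optimizer_step(grad, h)
    theta = theta + update
```

In the paper the update rule itself is then trained by gradient descent on the optimizee's loss over many inner loops, hence the title.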
(9) Dosovitskiy, A., & Koltun, V. (2016). Learning
to act by predicting the future. arXiv preprint
arXiv:1611.01779.
Reinforcement
learning by prediction of measurements from a
high-dimensional sensory stream. Also considers
generalization across environments and modification of
goals.
(10)
Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul,
T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2016).
Reinforcement learning with
unsupervised auxiliary tasks. arXiv preprint
arXiv:1611.05397.
Besides the
standard reinforcement learning objective, the deep RL agent
has to learn a number of general-purpose tasks that help to
produce better input representations. Needs A3C.
(11)
Sadtler, P. T., Quick, K. M.,
Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C.,
... & Batista, A. P. (2014). Neural constraints on
learning. Nature, 512(7515), 423-426.
http://www.nature.com/nature/journal/v512/n7515/abs/nature13665.html
This one and
(12) could be presented together. They present findings
about learning in the cortex and are probably most
interesting for students interested in brain-computer
interfaces (BCI).
(12) Martinez,
C. A., & Wang, C. (2015). Structural constraints on
learning in the neural network. Journal of neurophysiology,
114(5), 2555-2557.
http://jn.physiology.org/content/114/5/2555.full.pdf+html
(13) Lake, B. M., Ullman, T. D.,
Tenenbaum, J. B., & Gershman, S. J. (2016). Building
Machines that learn and think like people. arXiv preprint
arXiv:1604.00289. https://cbmm.mit.edu/sites/default/files/publications/machines_that_think.pdf
Only parts of it should be discussed, e.g.,
parts of Sections 4 and 5. Section 4 also contains an
introduction to learning to learn.
(14) Tsividis, P. A., Pouncy, T., Xu,
J. L., Tenenbaum, J. B., & Gershman, S. J. (2017). Human
Learning in Atari. http://gershmanlab.webfactional.com/pubs/Tsividis17.pdf
Studies in a systematic way how humans
learn to play Atari games.
(15) Jara-Ettinger, J., Gweon, H., Schulz, L. E.,
& Tenenbaum, J. B. (2016). The naïve utility calculus:
computational principles underlying commonsense psychology.
Trends in Cognitive Sciences, 20(8), 589-604.
http://jjara.scripts.mit.edu/cdl/docs/JaraEttingerGweonShulzTenenbaum_TiCS.pdf
Less technical than most other papers.
Further Refs:
[1] Cho, K., Van Merriënboer, B., Bahdanau, D.,
& Bengio, Y. (2014). On the properties of neural machine
translation: Encoder-decoder approaches. arXiv preprint
arXiv:1409.1259.
[2] Cho, K., Van Merriënboer, B., Gulcehre, C.,
Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y.
(2014). Learning phrase representations using RNN
encoder-decoder for statistical machine translation. arXiv preprint
arXiv:1406.1078.
[3]
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., &
Moritz, P. (2015, February). Trust Region Policy Optimization.
In ICML (pp. 1889-1897). http://www.jmlr.org/proceedings/papers/v37/schulman15.pdf
Talks should not be longer than 35 minutes, and should
be clear, interesting, and informative, rather than a reprint of
the material. Select which parts of the material you want to
present, and which not, and then present the selected material
well (including definitions not given in the material: look them
up on the web, or, if that is not successful, ask the seminar
organizers). Often diagrams or figures are useful for a talk. On
the other hand, giving in the talk numbers of references that
are listed at the end is a no-no (a talk is an online process,
not meant to be read). For the same reason, you may also quickly
repeat earlier definitions if you suspect that the audience
does not remember them.
Talks will be assigned at the first seminar meeting on March
13th, 16:15-18:00. Students are requested to have a quick
glance at the papers prior to this meeting in order to
determine their preferences. Note that the number of
participants for this seminar will be limited. Preference will
be given to students who
Participation
in the seminar meetings is obligatory. We also request your
courtesy and attention for the seminar speaker: no
smartphones, laptops, etc. during a talk. Furthermore, your
active attention, questions, and discussion contributions are
expected.
3.4.2017, 15:30
Talks 1, 2: Absenger, Loidl, Eder, Steger
Paper 14: Tsividis, P. A., Pouncy, T., Xu, J. L., Tenenbaum, J. B., & Gershman, S. J. (2017). Human Learning in Atari.
Sec. 3 of paper 13: Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2016). Building Machines that learn and think like people.
Sec. 4.2 of paper 13: Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2016). Building Machines that learn and think like people.
Papers 11, 12: Sadtler, P. T., Quick, K. M., Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C., ... & Batista, A. P. (2014). Neural constraints on learning; Martinez, C. A., & Wang, C. (2015). Structural constraints on learning in the neural network.
SLIDES (by Absenger/Loidl)
15.5.2017, 15:30
Talk 1: Brkic, Jambrecic
Paper 1 (check also the RMSProp slides by Hinton): Kingma, D., & Ba, J. (2014). Adam: A method for stochastic optimization. SLIDES
Talk 2: Harb, Micorek
Paper 2: Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., ... & Kavukcuoglu, K. (2016, February). Asynchronous methods for deep reinforcement learning. SLIDES
22.5.2017, 15:30
Talk: Hasler, Hopfgartner
Paper 6 + LSTM: Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. SLIDES
29.5.2017, 15:30
Talks 1 + 2: Bohnstingl, Scherr, Gabler
Papers 5 and 7: Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn; Duan, Y., Schulman, J., Chen, X., Bartlett, P. L., Sutskever, I., & Abbeel, P. (2016). RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning.
SLIDES
12.6.2017, 15:30
Talk 1: Salaj, Stekovic
Paper 8: Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. SLIDES
Talk 2: Ainetter, Jantscher
Paper 3: Mirowski, P., Pascanu, R., Viola, F., Soyer, H., Ballard, A., Banino, A., ... & Kumaran, D. (2016). Learning to navigate in complex environments. SLIDES

19.6.2017, 15:30
Talk 1: Müller, Reisinger
Paper 9: Dosovitskiy, A., & Koltun, V. (2016). Learning to act by predicting the future. SLIDES
Talk 2: Lindner, Narnhofer
Paper 10: Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul, T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2016). Reinforcement learning with unsupervised auxiliary tasks. SLIDES