Institut für Grundlagen der Informationsverarbeitung (708)
Lecturers:
O.Univ.-Prof. Dr. Wolfgang Maass
Office hours: by appointment (via e-mail)
E-mail: maass@igi.tugraz.at
Homepage: https://igi-web.tugraz.at/people/maass/
Assoc. Prof. Dr. Robert Legenstein
Office hours: by appointment (via e-mail)
E-mail: robert.legenstein@igi.tugraz.at
Homepage: www.igi.tugraz.at/legi/
"To illustrate the utility of learning to learn,
it is worthwhile to compare machine learning to human learning. Humans encounter a continual stream of learning tasks. They do not just learn concepts or motor skills, they also learn bias, i.e., they learn how to generalize. As a result, humans are often able to generalize correctly from extremely few examples - often just a single example suffices to teach us a new thing."
[Thrun, S., & Pratt, L. (Eds.) (2012). Learning to Learn.]
In this seminar, we will discuss novel work on
"learning to learn". This area of machine learning deals with
the following question: How can one train algorithms such that
they acquire the ability to learn?
Papers:
[This is a tentative list]
(1) Kingma, D., & Ba, J. (2014). Adam: A method for
stochastic optimization. arXiv preprint
arXiv:1412.6980.
Introduces the Adam optimizer, one of the most widely used stochastic gradient-based optimization methods.
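For orientation (a brief summary of the update rule, in the paper's notation up to minor differences): Adam keeps exponential moving averages of the gradient $g_t$ and its elementwise square,
$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$,
computes bias-corrected estimates $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$,
and updates the parameters as $\theta_t = \theta_{t-1} - \alpha\, \hat{m}_t/(\sqrt{\hat{v}_t} + \epsilon)$.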
(2)
Mnih, V., Badia, A. P., Mirza, M., Graves, A.,
Lillicrap, T. P., Harley, T., ... & Kavukcuoglu, K. (2016,
February). Asynchronous methods for deep reinforcement
learning. In International Conference on Machine Learning.
http://www.jmlr.org/proceedings/papers/v48/mniha16.pdf
Describes the Asynchronous Advantage Actor-Critic (A3C) algorithm used in several of the papers below, e.g. (5) and (10).
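As a rough sketch (our paraphrase, not the paper's full formulation): each asynchronous worker updates the policy with the gradient
$\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\, A(s_t, a_t)$, where the advantage is estimated from an $n$-step return,
$A(s_t, a_t) = \sum_{i=0}^{n-1} \gamma^i r_{t+i} + \gamma^n V(s_{t+n}; \theta_v) - V(s_t; \theta_v)$,
plus an entropy regularization term; the value function $V$ is trained by regression on the same return.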
(3)
Mirowski, P., Pascanu, R., Viola,
F., Soyer, H., Ballard, A., Banino, A., ... & Kumaran, D.
(2016). Learning to navigate in complex environments. arXiv preprint
arXiv:1611.03673.
(4)
Hochreiter, S., Younger, A. S.,
& Conwell, P. R. (2001, August). Learning to learn using
gradient descent. In International Conference on
Artificial Neural Networks (pp. 87-94). Springer Berlin
Heidelberg. https://www.researchgate.net/publication/225182080_Learning_To_Learn_Using_Gradient_Descent
(http://link.springer.com/chapter/10.1007/3-540-44668-0_13)
Introduces
the main idea used in (5) and (7).
(5)
Wang, J. X., Kurth-Nelson, Z.,
Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... &
Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint
arXiv:1611.05763.
(6)
Chung, J., Gulcehre, C., Cho, K.,
& Bengio, Y. (2014). Empirical evaluation of gated
recurrent neural networks on sequence modeling. arXiv preprint
arXiv:1412.3555.
Describes
Gated Recurrent Units used in (5). Possible additional
reading: [1, 2].
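For orientation, a GRU cell computes (in the notation of [2], up to minor differences)
$z_t = \sigma(W_z x_t + U_z h_{t-1})$, $r_t = \sigma(W_r x_t + U_r h_{t-1})$,
$\tilde{h}_t = \tanh(W x_t + U(r_t \odot h_{t-1}))$, and $h_t = (1-z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$,
i.e., the update gate $z_t$ interpolates between the previous state and a candidate state gated by $r_t$.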
(7) Duan, Y., Schulman,
J., Chen, X., Bartlett, P. L., Sutskever, I., &
Abbeel, P. (2016).
RL$^2$: Fast Reinforcement
Learning via Slow Reinforcement Learning. arXiv preprint
arXiv:1611.02779.
Possible additional topic: Trust Region Policy Optimization (TRPO) [3], since it is used here (but it is quite technical).
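In a nutshell (our paraphrase of [3]): TRPO maximizes the surrogate objective $\mathbb{E}\big[\tfrac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} A^{\pi_{\theta_{\text{old}}}}(s,a)\big]$ subject to the trust-region constraint $\mathbb{E}\big[D_{\mathrm{KL}}\big(\pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big)\big] \le \delta$.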
(8)
Andrychowicz, M., Denil, M., Gomez, S., Hoffman,
M. W., Pfau, D., Schaul, T., & de Freitas, N. (2016).
Learning
to learn by gradient descent by gradient descent. In Advances
in Neural Information Processing Systems (pp.
3981-3989).
http://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent
Uses a recurrent neural network to propose the parameter updates of another neural network.
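Schematically (our summary of the setup): the optimizee parameters $\theta$ are updated as $\theta_{t+1} = \theta_t + g_t$, where $[g_t, h_{t+1}] = m(\nabla_\theta f(\theta_t), h_t; \phi)$ is a recurrent optimizer network with state $h_t$ and its own parameters $\phi$, and $\phi$ is trained by gradient descent on the accumulated optimizee loss $\sum_t w_t f(\theta_t)$.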
(9) Dosovitskiy, A., & Koltun, V. (2016). Learning
to act by predicting the future. arXiv preprint
arXiv:1611.01779.
Reinforcement
learning by prediction of measurements from a
high-dimensional sensory stream. Also considers
generalization across environments and modification of
goals.
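Very roughly (our paraphrase): the agent learns a predictor of future measurements conditioned on the current observation, a candidate action $a$, and a goal vector $g$, and then acts greedily with respect to the goal, $a_t = \arg\max_a \, g^\top p(o_t, a, g; \theta)$, where $p$ denotes the predicted future measurements.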
(10)
Jaderberg, M., Mnih, V., Czarnecki, W. M., Schaul,
T., Leibo, J. Z., Silver, D., & Kavukcuoglu, K. (2016).
Reinforcement learning with
unsupervised auxiliary tasks. arXiv preprint
arXiv:1611.05397.
Besides the standard reinforcement learning objective, the deep RL agent has to learn a number of general-purpose auxiliary tasks that help it to produce better input representations. Requires A3C (see (2)).
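Schematically (our shorthand for the paper's objective): the total loss combines the A3C loss with weighted auxiliary losses, $\mathcal{L} \approx \mathcal{L}_{\mathrm{A3C}} + \lambda_{\mathrm{PC}} \mathcal{L}_{\mathrm{PC}} + \lambda_{\mathrm{RP}} \mathcal{L}_{\mathrm{RP}} + \lambda_{\mathrm{VR}} \mathcal{L}_{\mathrm{VR}}$, for pixel control, reward prediction, and value-function replay.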
(11)
Sadtler, P. T., Quick, K. M.,
Golub, M. D., Chase, S. M., Ryu, S. I., Tyler-Kabara, E. C.,
... & Batista, A. P. (2014). Neural constraints on
learning. Nature, 512(7515),
423-426.
http://www.nature.com/nature/journal/v512/n7515/abs/nature13665.html
This one and (12) could be presented together; they present experimental findings about learning in cortex. These papers are probably of particular interest to students interested in BCI.
(12) Martinez,
C. A., & Wang, C. (2015). Structural constraints on
learning in the neural network. Journal of neurophysiology,
114(5), 2555-2557.
http://jn.physiology.org/content/114/5/2555.full.pdf+html
(13) Lake, B. M., Ullman, T. D.,
Tenenbaum, J. B., & Gershman, S. J. (2016). Building
Machines that learn and think like people. arXiv preprint
arXiv:1604.00289. https://cbmm.mit.edu/sites/default/files/publications/machines_that_think.pdf
Only parts of it should be discussed, e.g. parts of Sections 4 and 5. Section 4 also contains an introduction to learning to learn.
(14) Tsividis, P. A., Pouncy, T., Xu,
J. L., Tenenbaum, J. B., & Gershman, S. J. (2017). Human
Learning in Atari. http://gershmanlab.webfactional.com/pubs/Tsividis17.pdf
Studies in a systematic way how humans
learn to play Atari games.
(15) Jara-Ettinger, J., Gweon, H., Schulz, L. E.,
& Tenenbaum, J. B. (2016). The naïve utility calculus:
computational principles underlying commonsense psychology.
Trends in Cognitive Sciences, 20(8), 589-604.
http://jjara.scripts.mit.edu/cdl/docs/Jara-EttingerGweonShulzTenenbaum_TiCS.pdf
Less technical than most other papers.
Further Refs:
[1] Cho, K., Van Merriënboer, B., Bahdanau, D.,
& Bengio, Y. (2014). On the properties of neural machine
translation: Encoder-decoder approaches. arXiv preprint
arXiv:1409.1259.
[2] Cho, K., Van Merriënboer, B., Gulcehre, C.,
Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y.
(2014). Learning phrase representations using RNN
encoder-decoder for statistical machine translation. arXiv preprint
arXiv:1406.1078.
[3]
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., &
Moritz, P. (2015, February). Trust Region Policy Optimization.
In ICML (pp. 1889-1897). http://www.jmlr.org/proceedings/papers/v37/schulman15.pdf
Talks should not be longer than 35 minutes, and they should be clear, interesting, and informative rather than a reprint of the material. Select which parts of the material you want to present and which not, and then present the selected material well (including definitions that are not given in the material: look them up on the web, or if that is not successful, ask the seminar organizers). Diagrams or figures are often useful for a talk. On the other hand, referring in the talk to the numbers of references that are listed at the end is a no-no (a talk is an online process, not meant to be read). For the same reason you can also quickly repeat earlier definitions if you suspect that the audience may not remember them.
Talks will be assigned at the first seminar meeting on March
13th 16:15-18:00. Students are requested to have a quick
glance at the papers prior to this meeting in order to
determine their preferences. Note that the number of
participants for this seminar will be limited. Preference will
be given to students who
Participation in the seminar meetings is obligatory. We also request your courtesy and attention for the seminar speaker: no smartphones, laptops, etc. during a talk. Furthermore, your active attention, questions, and discussion contributions are expected.
Schedule:
Date, Time | Talk: Speakers | Paper(s) | Slides
3.4.2017, 15:30 | Talks 1, 2: Absenger, Loidl, Eder, Steger | paper 14 (Tsividis et al. 2017, Human Learning in Atari); Sec. 3 and Sec. 4.2 of paper 13 (Lake et al. 2016, Building Machines that learn and think like people); papers 11 and 12 (Sadtler et al. 2014, Neural constraints on learning; Martinez & Wang 2015, Structural constraints on learning in the neural network) | SLIDES (by Absenger/Loidl)
15.5.2017, 15:30 | Talk 1: Brkic, Jambrecic | paper 1 (Kingma & Ba 2014, Adam: A method for stochastic optimization; check also the RMSProp slides by Hinton) | SLIDES
15.5.2017, 15:30 | Talk 2: Harb, Micorek | paper 2 (Mnih et al. 2016, Asynchronous methods for deep reinforcement learning) | SLIDES
22.5.2017, 15:30 | Talk: Hasler, Hopfgartner | paper 6 + LSTM (Chung et al. 2014, Empirical evaluation of gated recurrent neural networks on sequence modeling) | SLIDES
29.5.2017, 15:30 | Talks 1 + 2: Bohnstingl, Scherr, Gabler | papers 5 and 7 (Wang et al. 2016, Learning to reinforcement learn; Duan et al. 2016, RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning) | SLIDES
12.6.2017, 15:30 | Talk 1: Salaj, Stekovic | paper 8 (Andrychowicz et al. 2016, Learning to learn by gradient descent by gradient descent) | SLIDES
12.6.2017, 15:30 | Talk 2: Ainetter, Jantscher | paper 3 (Mirowski et al. 2016, Learning to navigate in complex environments) | SLIDES
19.6.2017, 15:30 | Talk 1: Müller, Reisinger | paper 9 (Dosovitskiy & Koltun 2016, Learning to act by predicting the future) | SLIDES
19.6.2017, 15:30 | Talk 2: Lindner, Narnhofer | paper 10 (Jaderberg et al. 2016, Reinforcement learning with unsupervised auxiliary tasks) | SLIDES