iandanforth 5 years ago

Some of his points I want to emphasize, and some I'd like to discuss further.

1. The method of training in current use won't lead to general systems.

It doesn't matter how large your image set is; you'll never (realistically speaking) learn all of language and motor behavior from it.

Multi-modal learning, where an agent interacts with an environment, will be critical for general systems, because there is a common expectation that general systems will be able to explore and problem-solve. That requirement to explore and problem-solve needs to be built into training.

I go a bit further and believe that you need a fairly high-fidelity world in which physically modeled agents are given progressively more difficult tasks if you want to achieve this generalization.

As he notes at the end of the talk, some variant of reinforcement learning will be key for this.

2. Disentangling representations will be useful for both generality and preventing catastrophic forgetting.

Current algorithms update all parameters by default, which means that all knowledge is vulnerable to loss. If you can disentangle representations (and I disagree that this can't be done in pixel space), then you can selectively prevent updates to some parameters and better guard against accidental forgetting.
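A minimal sketch of the selective-update idea (PyTorch assumed; the mask here is random purely for illustration, whereas in practice it would come from whatever procedure identifies which parameters encode knowledge worth protecting):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# Hypothetical masks: 1 = free to update, 0 = protected parameter.
protect = {name: (torch.rand_like(p) > 0.5).float()
           for name, p in model.named_parameters()}

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(x, y):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # Zero the gradients of protected parameters so the knowledge they
    # encode is not overwritten while training on the new task.
    for name, p in model.named_parameters():
        p.grad.mul_(protect[name])
    optimizer.step()
    return loss.item()
```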

Concretely, mimicking the sparsity of activation seen in the brain by implementing lateral inhibition (and further exploring top-down attentional sparsification mechanisms) should be useful in this area. (There were some good papers on this in 2018!)
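One simple software stand-in for lateral inhibition is a k-winners-take-all activation, where only the k strongest units in a layer stay active and the rest are suppressed. A rough sketch (PyTorch assumed; not from the talk or any particular paper):

```python
import torch
import torch.nn as nn

class KWinnersTakeAll(nn.Module):
    """Keep only the k largest activations per example and zero out the rest,
    as a crude analogue of lateral inhibition."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):
        # x: (batch, features)
        topk_vals, _ = x.topk(self.k, dim=1)
        threshold = topk_vals[:, -1:]          # k-th largest value per example
        return torch.where(x >= threshold, x, torch.zeros_like(x))

sparse_layer = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), KWinnersTakeAll(k=16))
```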

3. The ability for agents to learn from teachers and the ability to teach agents interactively will improve generality and learning speed.

Agents today are largely trained from scratch. Some are fine-tuned from other agents, but relatively few are instructed in the way you would a dog or a child. Cultural transmission of information underlies a lot of what we think of as modern human intelligence so it's likely that systems that can benefit from this method of skill acquisition will have an advantage.

A point not mentioned was the crucial distinction between knowledge and abilities which are innate to biological creatures (developed over evolutionary timescales) and knowledge gained during a single lifetime.

Right now we try to teach agents everything and don't give them the benefit of starting with evolved bodies, reflexes, and adapted perceptual abilities. So even before we get to the cultural transmission of knowledge through teaching, I think we need to (in parallel) be developing base agents that are "evolved" with solutions to lower-level problems built in, so that the networks don't have to re-learn how to catch themselves when falling every single time.

  • yazr 5 years ago

    Could you give your opinion on Starcraft2 mastery as the next level?

    So this includes opponents, long-term goals, short-term combinatorial optimizations, etc. But it is still a very simple, discrete, well-defined world with clear rewards.

    The consensus is that brute-force DRL is insufficient.

    My question is: if we add some currently known modules such as relational reasoning, memory, attention, and hierarchical attention (maybe even human demonstrations), plus DeepMind-scale computing, can we expect to see a super-human SC2 agent soon?

    Or are we really waiting for a more radical mechanism?

    • iandanforth 5 years ago

      Given the performance of the DOTA2 bots, I don't see any reason why PPO plus a fancy network architecture won't solve this as well. I expect OpenAI to publish on this in 2019, if not sooner.
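      For reference, the heart of PPO is just a clipped surrogate objective on top of a policy network. A rough sketch of that loss (PyTorch assumed; this is an illustration, not anyone's actual training code):

      ```python
      import torch

      def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
          """Clipped surrogate objective from PPO, returned as a loss to minimize."""
          ratio = torch.exp(new_logp - old_logp)   # pi_new(a|s) / pi_old(a|s)
          unclipped = ratio * advantages
          clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
          return -torch.min(unclipped, clipped).mean()
      ```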

  • Aduket 5 years ago

    Isn't this called "transfer learning", and isn't it quite common in the ML field now?

    • iandanforth 5 years ago

      Good question. Transfer learning is related, but not quite what I'm getting at. Your spinal reflexes are a neural network developed on an evolutionary timescale; they are somewhat flexible but also pretty specialized.

      If you come at this from the engineering perspective, your goal would be to write controllers using a NN substrate and then make them updatable through experience.

      If you come at it from a pre-trained deep net perspective, your goal is to find a smaller (probably shallower) network which captures low-level motor patterns. You might do this by distilling a truncated larger net.

      Either way you're looking for a separate component that can be plugged into a larger agent system.
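      As a rough sketch of that second route (everything here is hypothetical and PyTorch is assumed): distill the low-level part of a large pretrained policy into a small "reflex" network, freeze it, and treat it as a plug-in component.

      ```python
      import torch
      import torch.nn as nn

      # Hypothetical truncated "teacher": early layers of a large pretrained policy
      # that already map proprioceptive input to low-level motor commands.
      teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 12)).eval()

      # Small, shallow "reflex" student meant to capture the same motor patterns.
      student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 12))

      opt = torch.optim.Adam(student.parameters(), lr=1e-3)
      for _ in range(1000):
          obs = torch.randn(128, 64)          # stand-in for proprioceptive input
          with torch.no_grad():
              target = teacher(obs)           # teacher's low-level motor output
          loss = nn.functional.mse_loss(student(obs), target)
          opt.zero_grad()
          loss.backward()
          opt.step()

      # Freeze the distilled controller before plugging it into a larger agent.
      for p in student.parameters():
          p.requires_grad = False
      ```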

joe_the_user 5 years ago

So my impression of the way face recognition works is that the neural-net part of the system detects the faces in a photo, recognizes the features on those faces, and provides a map of the features to the rest of the system. The system takes these parameters and uses them to store and recover a given face. The ability to recognize individual faces comes about because, conveniently, these parameters don't change much (and the whole scheme clearly hinges on this). Of course, there are a lot of small enhancements to this process, and these evolve, but this is still the basic outline.
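In other words, the network boils each face down to a feature vector, and recognizing an individual is just a nearest-neighbor comparison on those vectors. A toy illustration of that outer loop (all names and dimensions are made up; the embedding network itself is elided):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stored "maps of features" for known people; in a real system
# these would come from the neural-net part, not from random numbers.
known_faces = {"alice": np.random.randn(128), "bob": np.random.randn(128)}

def identify(query_embedding, threshold=0.6):
    """Return the stored identity whose feature vector is closest to the query,
    provided the similarity clears a threshold; otherwise report unknown."""
    name, score = max(((n, cosine_similarity(query_embedding, e))
                       for n, e in known_faces.items()), key=lambda t: t[1])
    return name if score >= threshold else "unknown"
```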

Which is to say that convolutional neural networks, the most common and successful neural networks, fundamentally only operate by recognizing a broad category (face, nose-on-face, mouth-on-face). This limitation seems as inherent in NNs operating with a black-box/training-set/testing-set cycle as it is in any architectural consideration. If all you can "tell" the NN is "this is X, this is not X", then it's plausible that's all it will give back. Moreover, "X/not-X" testing allows a LOT of training to happen in an unambiguous fashion, which seems necessary when what you're doing is gradually pushing a giant bunch of semi-arbitrary detectors into place.

Machine learning theorists can do a lot with nothing but this category-recognition tool - like having a Go program recognize "good moves" and "good positions". Note that the more elaborate "neural net within neural net" approaches involve vast levels of computing power - you're brute-forcing brute force. AlphaGoZero probably involved the largest amount of computing power ever harnessed for a single-use app. AlphaGoZero involved compute levels close to those estimated for the human brain, though such estimates are always debatable (perhaps that's why we didn't see many games by it).

The thing is, there are plenty of ordinary computing algorithms that do "more" than neural nets - perform logical operations, or sort and search by vague similarity, or classify on a spectrum of multiple qualities (rather than the binary X or not-X of a neural network). It is just that these break down at an "industrial scale", whereas neural nets shine at that scale.

However, despite these virtues, it seems likely we'll see neural nets approaching their particular limits reasonably soon, if they haven't (in at least a sense) done so already.

And it seems like any "real" extension/alternative will have to be able to use big data but use it in a more sophisticated fashion than neural nets.

Oh, this relates to the video mostly in that, when Bengio talks about how to extend neural networks, he's largely talking about extensions in terms of adding abstract qualities like "causation". One can certainly see crude, simplistic things in computation that could be called "working with causation", but you don't have that yet when you go to the level of a large-scale neural network.

  • mgurlitz 5 years ago

    NNs are not binary -- they're fundamentally analog, so for classification problems, they produce probabilities that a test case is in each possible class. A binary "X/not-X" test often comes from applying a threshold to the NN's output.
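    To make that concrete (toy numbers, not from any real model): the raw outputs are turned into per-class probabilities, and the hard "X/not-X" answer only appears once you threshold them.

    ```python
    import numpy as np

    def softmax(logits):
        e = np.exp(logits - np.max(logits))
        return e / e.sum()

    logits = np.array([2.1, 0.3, -1.0])   # raw network outputs for three classes
    probs = softmax(logits)               # per-class probabilities summing to 1
    is_class_0 = probs[0] > 0.5           # binary answer only after thresholding
    ```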

    Quoting from the Wikipedia article on the universal approximation theorem, "neural networks can represent a wide variety of interesting functions." While they may be much better at pattern recognition, it's possible to produce almost anything with one, including Go moves, if you can devise a method to interpret the outputs.

    • joe_the_user 5 years ago

      > NNs are not binary -- they're fundamentally analog, so for classification problems, they produce probabilities that a test case is in each possible class.

      I think that's "true and false". Yes, NNs produce a "chance of being X", but no, NNs don't naturally produce a human-intuitive "degree of being X". If the NN is looking for a "red fire engine", it's not necessarily going to produce "degree of redness" as its output.

      As to neural nets representing many functions: neural nets, Taylor series, Fourier series, and so forth can represent/approximate "any function". True, but effectively irrelevant. What matters is what function a given methodology can actually be trained, programmed, or otherwise induced into representing.

  • jasonmar 5 years ago

    See this example of how a neural network can be used to "sort and search by vague similarity and classify on a spectrum of multiple qualities": youtu.be/5PNnPagENxQ?t=1540

    Descartes Labs uses a pre-trained ResNet-50 and removes the final layer, which does the classification. What's left is a layer that provides the image features needed for classification. These features can be used to sort images by similarity and to search for similar images.
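    Roughly, that trick looks like this (a sketch assuming torchvision's pretrained ResNet-50; the input batch here is a random stand-in for preprocessed imagery):

    ```python
    import torch
    import torchvision.models as models

    # Pretrained ResNet-50 with the final classification layer replaced by an
    # identity, so the forward pass returns 2048-d feature vectors instead of
    # class scores.
    backbone = models.resnet50(pretrained=True)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    images = torch.randn(4, 3, 224, 224)    # stand-in for a preprocessed batch
    with torch.no_grad():
        feats = backbone(images)
        feats = torch.nn.functional.normalize(feats, dim=1)
    similarity = feats @ feats.T             # pairwise cosine similarity
    ```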

    • joe_the_user 5 years ago

      Indeed. Good and cool, but I still claim this is not quite "it". I may not have specified "it" fully, but I feel like what "it" means can be intuitively obvious.

      They get a vector of approximate features and can use it to match against other images.

      BUT there's still the "this means nothing" problem. The vectors, as far as I can tell and by the logic of just doing autoencoding, don't have any significance except to the system. It can find image X and say it's like image Y.

      But it doesn't help at all with finding specified things. You can't say "find me a corn field" or "find me a nuclear power plant". You can show it a picture of a nuclear power plant and it can show you mountains with a similar layout.

  • glass_of_water 5 years ago

    > Note, that the more elaborate "neural net within neural net" approaches involve vast levels of computing power...

    This is one thing I worry about, that even if we were to get closer to AGI, the algorithms/system might not feasibly run with our current computing paradigm.

    • joe_the_user 5 years ago

      Well,

      We have human beings who seem to function normally in situations where their brains have actually shrunk to a fraction of their skull size. And we have animals that can do things robots can't do, with very small brains indeed. So something like an intelligent algorithm seems possible with "small" "hardware".

      Conversely, it seems like, hypothetically, one could pretty easily specify an "intelligent algorithm" if one had a computer of limitless speed and memory.

      So approaching AGI would seem to involve producing things of sufficient efficiency, and there are reasons to think that's possible.

      • rubatuga 5 years ago

        Also, think of whales, with massive brains but significantly less intelligence than humans. And check out how smart crows are, with a vastly different brain structure than mammals and only a fraction of our brain volume.