by jsomers on 11/26/24, 12:08 PM with 21 comments
by m_ke on 11/26/24, 8:50 PM
In this case it's "VLA" as in Vision-Language-Action models, where a multimodal decoder predicts action tokens. "Behavior cloning" is a fancy made-up term for supervised learning, used because the RL people can't bring themselves to admit that supervised learning works far better than reinforcement learning in the real world.
Proper imitation learning, where a robot learns from a third-person view of humans doing things, does not work yet, but some people in the field like to pretend that teleoperation plus "behavior cloning" is a form of imitation learning.
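To make the terminology concrete: the objection above is that the "behavior cloning" objective is ordinary supervised learning on (observation, action) pairs collected by teleoperation. Below is a minimal sketch of one such training step for a toy model that predicts discretized action tokens; the names, sizes, and architecture are illustrative, not taken from any particular VLA system.

    import torch
    import torch.nn as nn

    # Toy stand-in for a VLA-style multimodal decoder: it maps encoded
    # observations (images + instruction, already embedded) to logits over
    # a short sequence of discretized action tokens.
    class ToyVLAPolicy(nn.Module):
        def __init__(self, obs_dim=512, vocab_size=256, horizon=8):
            super().__init__()
            self.vocab_size, self.horizon = vocab_size, horizon
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 1024), nn.ReLU(),
                nn.Linear(1024, horizon * vocab_size),
            )

        def forward(self, obs):
            # (batch, horizon, vocab_size) logits over action tokens
            return self.net(obs).view(-1, self.horizon, self.vocab_size)

    policy = ToyVLAPolicy()
    optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

    # "Behavior cloning" step: plain cross-entropy against the teleoperator's
    # action tokens. No reward, no environment interaction, no RL machinery.
    obs = torch.randn(32, 512)                # encoded observations
    expert = torch.randint(0, 256, (32, 8))   # expert action tokens

    logits = policy(obs)
    loss = nn.functional.cross_entropy(logits.reshape(-1, 256), expert.reshape(-1))
    loss.backward()
    optimizer.step()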
by drcwpl on 11/27/24, 2:31 PM
by x11antiek on 11/26/24, 1:51 PM
by ratedgene on 11/26/24, 10:25 PM
Maybe it's like:
1. Intention, context
2. Attention scanning for components
3. Attention network discovery
4. Rescan for missing components
5. If no relevant context exists or is found
6. Learned parameters are initially greedy
7. Storage of parameters gets reduced over time by other contributors
I guess this relies on the tough parts being there: induction, deduction, and abductive reasoning.
Can we fake reasoning to test hypotheses that alter the weights of whatever model we use for reasoning?
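Reading the numbered steps literally, one purely illustrative way to turn them into a loop is sketched below. Every name, data structure, and scoring rule here is a hypothetical interpretation of the comment, not an established algorithm.

    # Toy interpretation of the seven steps above; all names and the greedy
    # scoring are invented only to make the loop runnable.
    def reason(intention: str, context: dict, memory: dict) -> list:
        # 1. Start from an intention and whatever context is at hand.
        # 2. "Attention scanning": keep context components relevant to the intention.
        components = {k: v for k, v in context.items() if k in intention}
        # 3. "Network discovery": merge components with anything already in memory.
        network = {k: memory.get(k, 0.0) + v for k, v in components.items()}
        # 4. Rescan for components the first pass missed (memory-only items).
        network.update({k: v for k, v in memory.items()
                        if k not in network and k in intention})
        # 5. No relevant context exists or was found.
        if not network:
            return []
        # 6. Learned parameters start out greedy: rank purely by raw score.
        ranked = sorted(network, key=network.get, reverse=True)
        # 7. Storage shrinks over time: weaker entries are dropped, stronger ones kept.
        for k in ranked[len(ranked) // 2:]:
            memory.pop(k, None)
        memory.update({k: network[k] for k in ranked[:len(ranked) // 2 or 1]})
        return ranked

    print(reason("pick up the red block", {"red": 0.9, "block": 0.8, "cup": 0.1}, {}))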
by Animats on 11/26/24, 7:15 PM
Is there something which shows what the tokens they use look like?
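For what it's worth, a common scheme in this family of models (RT-1/RT-2 style, if I remember the papers correctly) is to clip each continuous action dimension and discretize it into a fixed number of bins, so one control step becomes a handful of integer token IDs the decoder emits like any other text. A hedged sketch; the dimension names, ranges, and bin count below are illustrative.

    import numpy as np

    # One common way VLA-style models tokenize actions: clip each continuous
    # action dimension to a range and discretize it into a fixed number of bins.
    N_BINS = 256
    ACTION_DIMS = ["dx", "dy", "dz", "droll", "dpitch", "dyaw", "gripper"]

    def action_to_tokens(action, low=-1.0, high=1.0, n_bins=N_BINS):
        """Map a continuous action vector to per-dimension bin indices."""
        a = np.clip(np.asarray(action, dtype=np.float64), low, high)
        bins = np.floor((a - low) / (high - low) * (n_bins - 1) + 0.5)
        return bins.astype(int).tolist()

    def tokens_to_action(tokens, low=-1.0, high=1.0, n_bins=N_BINS):
        """Invert the binning back to approximate continuous values."""
        t = np.asarray(tokens, dtype=np.float64)
        return (t / (n_bins - 1) * (high - low) + low).tolist()

    # A 7-DoF end-effector delta plus gripper command becomes 7 small integers,
    # which a language-model decoder can emit as ordinary tokens.
    tokens = action_to_tokens([0.02, -0.10, 0.00, 0.0, 0.0, 0.3, 1.0])
    print(dict(zip(ACTION_DIMS, tokens)))   # e.g. {'dx': 130, 'dy': 115, ...}
    print(tokens_to_action(tokens))         # approximately recovers the input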
by josefritzishere on 11/26/24, 10:02 PM
by codr7 on 11/26/24, 8:21 PM
by nobodywillobsrv on 11/27/24, 8:16 AM