by thesephist on 9/14/22, 8:24 PM with 77 comments
by visarga on 9/14/22, 10:56 PM
https://mobile.twitter.com/sergeykarayev/status/156937788144...
by frenchie4111 on 9/15/22, 12:21 AM
Also this is very, very cool. I love Copilot, and I hope I get to use this thing very soon.
by tasdfqwer0897 on 9/14/22, 8:34 PM
by leetrout on 9/15/22, 3:01 AM
by mrits on 9/14/22, 9:37 PM
by codekansas on 9/14/22, 9:08 PM
by joaquincabezas on 9/15/22, 8:26 AM
“I want to travel from Seville to Berlin next October, avoiding weekends, for a two or three nights stay in a hotel by the river. Direct flights preferred.”
by blind666 on 9/14/22, 11:01 PM
by colemannugent on 9/14/22, 10:00 PM
>Anyone who can articulate their ideas in language can implement them
I'd be shocked if even 10% of the users who can't navigate a GUI could accurately describe what they want the software to do. To the user who doesn't know they can use Ctrl-Z to undo, the first half dozen times the AI mangles their inherited spreadsheet might be enough to put them off the idea.
by holoduke on 9/14/22, 9:14 PM
by anigbrowl on 9/14/22, 10:15 PM
- OK here's my email
- Please select all pictures of taxis to prove you are not a robot
ಥ_ಥ
Seriously though, the potential is good. I see several things they're doing right that have the potential to distinguish them from competing offerings.
by bluecoconut on 9/14/22, 10:25 PM
A few questions:
1. I'm curious if you're representing the task-operations using RL techniques (as many personal assistant systems seem to be) or if this is entirely a seq2seq transformer style model for predicting actions?
2. Assumption: given how transformers scale, I assume this is not working directly on the image data of a screen and is instead working off DOM trees. (2a) Is this the case? (2b) If so, are you using purely linear tokenization of the tree, or something closer to Evoformer (AlphaFold style) to combine graph neural nets and transformers?
3. Have you noticed that learning actions and representations of one application transfers well to new applications? Or is the quality of the model heavily dependent on app domain?
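To make question 2b concrete, here is a minimal sketch of what "purely linear tokenization" of a DOM tree could look like: a depth-first walk that flattens tags, a few attributes, and text into one sequence. This is only an illustration using Python's standard `html.parser`; the token format, the `DomLinearizer` and `linearize` names, and the choice of which attributes to keep are all assumptions, not anything Adept has described.

```python
from html.parser import HTMLParser


class DomLinearizer(HTMLParser):
    """Flattens HTML into a depth-first token sequence (a hypothetical scheme)."""

    # Void elements never get a closing tag, so they emit only an open token.
    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")
        # Keep only attributes an agent might act on; this choice is an assumption.
        for name, value in attrs:
            if name in ("id", "type", "href"):
                self.tokens.append(f"{name}={value}")

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS:
            self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Split visible text on whitespace; a real system would use a subword tokenizer.
        self.tokens.extend(data.split())


def linearize(html):
    """Return the flat token list for an HTML snippet."""
    parser = DomLinearizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


tokens = linearize('<form id="login"><input type="email"><button>Sign in</button></form>')
print(tokens)
```

The tree structure survives only implicitly, through matching open and close tokens, which is the trade-off the question is getting at: a graph-aware model would see parent/child edges directly instead of having to infer them from position.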
I noticed multiple references to data applications (Excel, Tableau, etc.). My challenge is that large language models and AI systems in general are about to hit a wall in the data domain because they fundamentally don't understand data [1] [2], which will ultimately limit the quality of these capabilities.
I am personally tackling this problem directly. I'm trying to prove more coherent data-aware operations in these systems by building a "foundation model" for tabular data that connects to LLMs (think RETRO-style lookups of embeddings that represent columns of data). I have been prototyping conversational AI systems (mostly Q/A oriented), and have recently been moving towards task-oriented operations (right now, transparently, just SQL executors).
There seem to be good representations of DOM tree/visual-object models that you all are working with to take reasonable action. However, I assume these are limited in scale (N^2 and all), so I am wondering if you have any opinions on how to extend these systems for data, especially as the windowed context grows (e.g., an Excel file with 100k+ rows)?
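The "N^2" concern can be made concrete with rough arithmetic. The sketch below assumes full self-attention over every serialized cell; the column count and the average of 2 tokens per cell are made-up illustrative numbers, and `sheet_tokens` and `attention_pairs` are hypothetical helper names.

```python
def sheet_tokens(rows, cols, tokens_per_cell):
    """Token count if every cell is serialized; tokens_per_cell is an assumed average."""
    return rows * cols * tokens_per_cell


def attention_pairs(num_tokens):
    """Number of query-key pairs in full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


# Illustrative sizes only: 10 columns, 2 tokens per cell (both assumptions).
small = sheet_tokens(rows=100, cols=10, tokens_per_cell=2)
large = sheet_tokens(rows=100_000, cols=10, tokens_per_cell=2)

# Rows grow 1,000x, but attention pairs grow by the square of that.
ratio = attention_pairs(large) // attention_pairs(small)
print(small, large, ratio)
```

Under these assumptions, going from 100 rows to 100k rows multiplies the token count by a thousand and the attention cost by a million, which is why retrieval-style lookups over column embeddings are attractive compared with feeding the whole sheet into the context.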
[1] https://arxiv.org/abs/2106.03253 "Tabular Data: Deep Learning is Not All You Need"
[2] https://arxiv.org/abs/2110.01889 "In summary, we think that a fundamental reorientation of the domain may be necessary. For now, the question of whether the use of current deep learning techniques is beneficial for tabular data can generally be answered in the negative"
by atemerev on 9/15/22, 8:03 AM
by skybrian on 9/14/22, 10:38 PM
by rajnathani on 9/17/22, 6:08 AM
by d--b on 9/15/22, 6:56 AM
by lee101 on 9/15/22, 12:30 AM
I feel like some of this could one day be built using a shared model that understands HTML, JavaScript, etc., given a few example prompts. Or maybe something that understands intent plus a browser automation language like Selenium; if not, then some custom input/output language plus training, as Adept alludes to.
If you're interested in building something like this, also check out https://text-generator.io, which already pulls down links and images to analyse in order to generate better text, so it has a lot of the required parts.
by FeepingCreature on 9/15/22, 1:13 AM
by i_am_toaster on 9/14/22, 10:14 PM
by midislack on 9/14/22, 11:49 PM