from Hacker News

ResearchGPT: Automated Data Analysis and Interpretation

by wyem on 4/25/23, 4:08 PM with 24 comments

by photochemsyn on 4/25/23, 4:36 PM
This seems like a fairly useful tool, but I'd be a bit cautious: the tradition of poring over a carefully collected and curated data set, using tools whose strengths and weaknesses you understand, shouldn't be lightly tossed aside. That process can help researchers spot unusual anomalies that lead to novel discoveries, while an automated tool might just discard all outliers.
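A small illustration of that caution, assuming Tukey's 1.5 × IQR fences as the outlier rule: the sketch below flags unusual points for a human to look at instead of silently dropping them. `flag_outliers` and the readings are hypothetical, not part of ResearchGPT.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return (value, is_outlier) pairs using Tukey's IQR fences.

    Points are flagged for review, not removed, so anomalies stay visible.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(v, not (low <= v <= high)) for v in values]


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 14.7, 10.0]
# Every reading is kept; the flagged ones just go on a list for a closer look.
to_review = [v for v, is_outlier in flag_outliers(readings) if is_outlier]
```

Whether 14.7 is a glitch or a discovery is then the researcher's call, not the tool's.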
Incidentally, a far more concerning issue is the use of approaches like this to generate data, which opens the door to a plague of hard-to-detect scientific fraud. In the past, many high-visibility frauds have been detected because the fraudsters duplicated data (or reversibly processed old data in some manner), which was then spotted by others in the field, e.g.
https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal
Often these fraudulent productions are inspired by the desire to be first to publish, a situation in which everyone thinks they know how a system works but they're all rushing to get credit (and hence Nobel Prizes and patents etc.) by generating the data from a 'successful experiment' before anyone else can.
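The duplication checks mentioned above can be partly automated. A minimal sketch, assuming the simplest kind of reversible processing (a linear rescale, b = slope * a + intercept): two independently measured series should essentially never satisfy that relation exactly, so an exact fit is a red flag worth a closer look. `is_linear_copy` and the sample numbers are hypothetical.

```python
def is_linear_copy(a, b, tol=1e-9):
    """Return True if b == slope * a + intercept at every point (within tol).

    Independent measurements carry independent noise and essentially never
    match exactly, so a hit suggests reused or rescaled data.
    """
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two series of equal length >= 3")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxx = sum((x - mean_a) ** 2 for x in a)
    if sxx == 0:
        return False  # constant reference series: no slope to fit
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    slope = sxy / sxx  # least-squares fit of b against a
    intercept = mean_b - slope * mean_a
    # Relative tolerance absorbs floating-point rounding only, not real noise.
    return all(abs(y - (slope * x + intercept)) <= tol * max(1.0, abs(y))
               for x, y in zip(a, b))


original = [0.12, 0.57, 0.33, 0.91, 0.48]
rescaled = [2.0 * x + 0.5 for x in original]   # "new" data derived from old
independent = [0.30, 0.10, 0.80, 0.20, 0.60]   # unrelated measurements
```

Real screening would need to cover other transforms (shifts along the x-axis, reordering, nonlinear rescaling); this catches only the linear case.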
by cl42 on 4/25/23, 4:13 PM
Thanks for posting this!
I'm the creator of ResearchGPT. A few things folks here might appreciate:
(1) You don't have to share data with the LLM provider to use this; it only shares metadata about your data set
(2) The demo uses Anthropic's Claude, rather than OpenAI's ChatGPT, but you can use our library to swap out any LLM
(3) It's open source!! Woo!
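Points (1) and (2) can be sketched together: build a prompt from a summary of the data set (here, the row count plus each column's name and Python value types) and send it through a minimal interface that any backend can implement. This is an illustration under those assumptions; `ChatModel`, `describe_dataset`, `ask_about_data`, and `EchoModel` are hypothetical names, not the API of the library linked in this thread, and exactly which metadata ResearchGPT shares isn't specified here.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Any object with complete(prompt) -> str can serve as the backend."""

    def complete(self, prompt: str) -> str: ...


def describe_dataset(rows: list[dict]) -> str:
    """Summarize column names and value types only; no cell values are included."""
    columns: dict[str, set[str]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, set()).add(type(value).__name__)
    lines = [f"- {name}: {', '.join(sorted(types))}" for name, types in columns.items()]
    return f"The dataset has {len(rows)} rows and these columns:\n" + "\n".join(lines)


def ask_about_data(model: ChatModel, rows: list[dict], question: str) -> str:
    """Send only the metadata summary plus the question to the chosen model."""
    prompt = describe_dataset(rows) + f"\n\nWrite Python code to answer: {question}"
    return model.complete(prompt)


class EchoModel:
    """Stand-in backend that returns the prompt; a real one would call an LLM API."""

    def complete(self, prompt: str) -> str:
        return prompt


rows = [{"age": 34, "city": "Oslo"}, {"age": 29, "city": "Lima"}]
sent = ask_about_data(EchoModel(), rows, "What is the mean age?")
```

Swapping providers then means writing another class with a `complete` method; the data-handling code doesn't change.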
by cube2222 on 4/25/23, 4:43 PM
It's a really cool area: putting AI in a feedback loop (langchain-like) with its own tools, which I think is where the magic happens and where we'll see much more activity in the future. This should really supercharge engineers working in areas they're not super-comfortable in, but comfortable enough to verify the AI isn't doing anything stupid.
I made something vaguely similar for your local terminal[0] and other locally-available tools.
The idea is to give you a chat with an assistant that can use these local tools. Here it's Python for data analysis; in my case it's more "give it access to your terminal, so it can answer questions / do tasks on your local machine", which is something web-based options can't do right now.
E.g. ask it about your system details (processes, wifi) or to do things (configure something). Have it automatically run the relevant commands, analyze the output, and respond either in natural language or with, e.g., a plotted chart.
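The loop described above can be sketched as: the model proposes a command, the program runs it, and the output is appended to the transcript the model sees next, until the model answers. This is a hypothetical sketch, not how cuttlefish or ResearchGPT work; `propose` stands in for an LLM call, and `scripted_model` is a fake that always asks for the Python version.

```python
import subprocess
import sys


def run_command(argv: list[str], timeout: float = 10.0) -> str:
    """Run one command without a shell and return stdout plus stderr."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr


def tool_loop(propose, question: str, max_steps: int = 3) -> str:
    """Alternate between model decisions and command execution.

    `propose(transcript)` (hypothetical) returns ("run", argv) to execute a
    command or ("answer", text) to finish.
    """
    transcript = [f"user: {question}"]
    for _ in range(max_steps):
        action, payload = propose(transcript)
        if action == "answer":
            return payload
        output = run_command(payload)
        transcript.append(f"ran {payload!r}, output:\n{output}")  # fed back next turn
    return "gave up after max_steps"


def scripted_model(transcript: list[str]):
    """Fake LLM: first asks to run `python --version`, then reports the output."""
    if len(transcript) == 1:
        return ("run", [sys.executable, "--version"])
    return ("answer", "Tool output was: " + transcript[-1].split("\n", 1)[1].strip())


answer = tool_loop(scripted_model, "Which Python is installed?")
```

Passing an argument list to `subprocess.run` instead of a shell string avoids shell-injection surprises; a real assistant would presumably also ask before running anything that changes the system.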
AutoGPT[1] is another very interesting project in this area.
[0]: https://github.com/cube2222/cuttlefish
[1]: https://github.com/Significant-Gravitas/Auto-GPT
by Imnimo on 4/25/23, 6:49 PM
>https://github.com/wgryc/phasellm/blob/main/demos-and-produc...
Asking the LLM if it "understands" and only proceeding if it says yes feels very weird to me. Do we really expect the LLM to be able to introspect in that way and give a meaningful answer?
by davidktr on 4/25/23, 8:11 PM
I'm not sure about this approach. From what I have seen, most researchers have no idea how to get their data in a format which can be efficiently analysed.
Once you have that, it's trivial to do any kind of statistical analysis. In R, a regression is simply lm(y ~ x1 + x2 + ... + xn).
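For readers who don't use R, a dependency-free sketch of what `lm(y ~ x1 + x2 + ... + xn)` estimates: ordinary least squares with an intercept. `ols` is a hypothetical helper that solves the normal equations (XᵀX)b = Xᵀy; R's `lm` uses a QR decomposition instead, which is numerically more stable, so treat this as an illustration rather than a replacement.

```python
def ols(y, *predictors):
    """Fit y = b0 + b1*x1 + ... + bn*xn by ordinary least squares.

    Returns [b0, b1, ..., bn], solving (X'X) b = X'y by Gauss-Jordan elimination.
    """
    n = len(y)
    rows = [[1.0] + [x[i] for x in predictors] for i in range(n)]  # design matrix X
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:  # clear this column in every other row
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]  # exact relationship, no noise
coefficients = ols(y, x1, x2)  # corresponds to lm(y ~ x1 + x2) in R
```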
You can always look up how an API works, but not knowing how to think about data in terms of structures is what hinders effective analysis in most cases.
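One common example of that structural step is reshaping a spreadsheet with one column per measurement (wide format) into one row per observation (long, or "tidy", format), which is the shape most modelling functions expect. A minimal sketch with hypothetical field names; pandas offers `DataFrame.melt` for the same job.

```python
def wide_to_long(rows, id_field, var_name, value_name):
    """Turn one-column-per-measurement rows into one-row-per-observation records."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row instead
            long_rows.append({id_field: row[id_field], var_name: key, value_name: value})
    return long_rows


# Hypothetical wide table: one row per site, one column per year.
wide = [
    {"site": "A", "2021": 4.1, "2022": 4.6},
    {"site": "B", "2021": 3.8, "2022": 3.9},
]
tidy = wide_to_long(wide, "site", "year", "yield")
```

Once the data looks like `tidy`, "yield by year and site" is a single model formula away.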
by mnky9800n on 4/25/23, 10:41 PM
I did something similar to this but got stuck because the generated code would sometimes work and sometimes not for identical prompts. I also found that, as an expert in the topics, it was easy to write a prompt that would generally build a reasonable data pipeline, but I couldn't imagine doing the same if I just had some data and not the expertise. How do you account for these issues?
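On the first issue (identical prompts yielding code that only sometimes works), one common mitigation is to execute the generated code and feed any error back for another attempt. A minimal sketch, assuming a hypothetical `generate(feedback)` LLM call and the convention that generated code stores its answer in `result`; it does nothing for the second issue, since a retry loop can't supply missing domain expertise.

```python
def run_with_retries(generate, max_attempts=3):
    """Execute generated code, sending any error back to the generator.

    `generate(feedback)` (hypothetical) returns Python source; feedback is
    None on the first attempt and the previous error message afterwards.
    Returns (result, attempt_number) for the first attempt that succeeds.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)
        namespace = {}
        try:
            exec(source, namespace)
            return namespace["result"], attempt  # KeyError if no result is set
        except Exception as exc:  # broad on purpose: any failure is reported back
            feedback = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"no working code after {max_attempts} attempts; last error: {feedback}")


def flaky_generator(feedback):
    """Fake LLM: the first draft references an undefined name, the retry fixes it."""
    if feedback is None:
        return "result = sum(values) / len(values)"
    return "values = [2, 4, 9]\nresult = sum(values) / len(values)"


answer, attempts = run_with_retries(flaky_generator)
```

`exec` runs the code in the current process, so anything beyond a toy should use a sandbox. Lowering the sampling temperature, where the provider allows it, also tends to make outputs more repeatable, though usually not fully deterministic.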
by cuuupid on 4/25/23, 4:36 PM
Reminds me of this: https://www.palantir.com/platforms/aip/
I think there’s a lot of value here in empowering business users or more operational folks to use data without needing familiarity with a tool or language meant for data science.
by arthurcolle on 4/25/23, 5:15 PM
Typo: "There are muliptle prompts "