by alt-glitch on 3/28/24, 2:44 PM with 59 comments
by xfalcox on 3/28/24, 4:17 PM
> Allowing a Q&A interface using these embeddings over the post contents could speed up research over the community posts (if you know the right questions to ask :P). Let's view some posts similar to this one complaining about function calling
That's indeed a great thing to surface, and that's exactly how the OpenAI forum selects the "Related Topics" to show at the end of every topic. We use embeddings for this feature, and the entire thing is open-source: https://github.com/discourse/discourse-ai/blob/main/lib/embe...
We also use embeddings for suggesting tags, categories, HyDE search and more. It's by far my favorite tech of this new AI/ML gen so far in terms of applicability.
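To make the "Related Topics" idea concrete, here is a minimal sketch of ranking topics by cosine similarity between their embedding vectors. It is not the Discourse implementation (that lives in the linked repo). The toy vectors, the `embeddings` dict and the `related_topics` helper are hypothetical; a real setup would get vectors from an embedding model and run the nearest-neighbour search over many more topics and dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_topics(target_id: int, embeddings: dict[int, list[float]], k: int = 5) -> list[int]:
    """Return up to k topic ids whose embeddings are most similar to target_id's, best first."""
    target = embeddings[target_id]
    scored = [
        (cosine_similarity(target, vec), topic_id)
        for topic_id, vec in embeddings.items()
        if topic_id != target_id  # never suggest a topic as related to itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [topic_id for _, topic_id in scored[:k]]


# Hypothetical toy 3-dimensional vectors; real embedding models output far more dimensions.
embeddings = {
    1: [0.9, 0.1, 0.0],
    2: [0.8, 0.2, 0.1],
    3: [0.0, 0.1, 0.9],
    4: [0.7, 0.0, 0.3],
}
print(related_topics(1, embeddings, k=2))  # → [2, 4]
```

Topic 3 points in a very different direction from topic 1, so it drops out of the top two even though all four vectors live in the same space.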
> Using Twitter-roBERTa-base for sentiment analysis, we generated a post_sentiment label (negative, positive, neutral) and post_sentiment_score confidence score for each post.
We do the same, with even the same model, and conveniently show that information on the admin interface of the forum. Again all open source: https://github.com/discourse/discourse-ai/tree/main/lib/sent...
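The write-up doesn't spell out how the label and confidence score were derived from the model. A common approach, assumed here, is to apply a softmax to the model's three raw logits, take the most probable class as the label and its probability as the score. The label order below is an assumption (check the model's `id2label` config), the helper names are hypothetical, and in practice a library such as Hugging Face `transformers` would run the model and do this step for you.

```python
import math

# Assumed class order for a 3-way Twitter-roBERTa sentiment head; verify against id2label.
LABELS = ("negative", "neutral", "positive")


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_from_logits(logits: list[float]) -> tuple[str, float]:
    """Map 3-class logits to (post_sentiment, post_sentiment_score)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the most probable class
    return LABELS[best], probs[best]


label, score = sentiment_from_logits([-1.2, 0.3, 2.5])
print(label, round(score, 3))
```

Using the top-class probability as the "confidence" is a convention, not a calibrated measure, so scores are best compared relative to each other rather than read as true probabilities.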
Disclaimer: I'm the tech lead on the AI parts of Discourse, the open source software that powers OpenAI's community forum.
by wavyknife on 3/28/24, 4:12 PM
Discourse has an AI plugin that admins can run on their community to generate their own sentiment analysis (among other things), though it's not quite as thorough as this write-up! https://meta.discourse.org/t/discourse-ai-plugin/259214
We're always interested to see how public data can be used like this. It's something that can be a lot more difficult on closed platforms.
by SunlitCat on 3/28/24, 3:51 PM
Maybe I'm not looking thoroughly enough, so I may be wrong, tho!
by miduil on 3/28/24, 3:44 PM
by klooney on 3/29/24, 1:20 AM
by fzysingularity on 3/28/24, 5:21 PM
by alright2565 on 3/29/24, 3:00 PM
> Every Discourse Discussion returns data in JSON if you append .json to the URL.
then this:
> Raw data was gathered into a single JSONL file by automating a browser using Playwright.
Kinda seems to me like having a whole browser instance for this isn't necessary? I would have been surprised if this .json pattern didn't continue for all pages, and it turns out that it does in fact also work for the topic list: https://community.openai.com/latest.json
The other place I've seen this sort of API pattern is reddit. For example, https://www.reddit.com/r/all.json or (randomly chosen) https://www.reddit.com/r/mildlyinfuriating/comments/1bqn3c0/...
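A minimal sketch of the browser-free approach, assuming Discourse's `.json` convention described above and using only the standard library. `json_url`, `fetch_json` and `topics_to_jsonl` are hypothetical helpers, and the `topic_list`/`topics` field names are an assumption about the `latest.json` payload rather than a documented contract; paging and rate limiting are left out.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def json_url(page_url: str) -> str:
    """Append .json to a page URL's path, keeping any query string and dropping the fragment."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def fetch_json(page_url: str) -> dict:
    """Fetch the JSON view of a page with a plain HTTP GET (no browser needed)."""
    req = urllib.request.Request(
        json_url(page_url),
        headers={"User-Agent": "forum-archiver-sketch"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def topics_to_jsonl(latest: dict) -> list[str]:
    """One JSON line per topic, assuming the payload nests topics under topic_list.topics."""
    topics = latest.get("topic_list", {}).get("topics", [])
    return [json.dumps({"id": t.get("id"), "title": t.get("title")}) for t in topics]
```

Something like `topics_to_jsonl(fetch_json("https://community.openai.com/latest"))` would then give one JSON line per topic on the first page; individual topics should work the same way by passing a `/t/...` URL to `fetch_json`.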
by velid0 on 3/28/24, 3:53 PM
by garyiskidding on 3/29/24, 8:33 AM
by xandrius on 3/28/24, 3:30 PM
by dorkwood on 3/28/24, 4:17 PM
OpenAI has taught me that no one gives a shit. Scrape the entire internet if you want, and use the data for whatever you feel like.
by enonimal on 3/28/24, 3:56 PM
> # 1 Result: Python Packaging
Checks out
by throwaway98797 on 3/28/24, 3:50 PM
/s