from Hacker News

Twitter's Recommendation Algorithm

by jonknee on 3/31/23, 6:42 PM with 1185 comments

by jonathanmayer on 3/31/23, 7:07 PM
Context: I teach at Princeton and study social media and recommendation systems.
From a very quick skim of the repositories, this appears to be quite limited transparency. The documentation gives a decent high-level overview of how Tweet recommendation works (no surprises) and the code tracks that roadmap. Those are meaningful positive steps. But the underlying policies and models are almost entirely missing (there are a couple of valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."
[1] https://github.com/twitter/the-algorithm-ml
by corbulo on 3/31/23, 7:48 PM
It's disappointing the comments are so obsessed with the political angle to this that there's a total lack of appreciation (or discussion) of opening up the most influential social media platform in the world.
by phailhaus on 3/31/23, 7:39 PM
Great! But nothing is going to change until people realize that the problem is the feedback loop. It's not the recommendation engine itself, it's the fact that there's no way "out" of the feed that the engine produces. It recommends you stuff, you have little choice but to engage with it, and then it trains on that information.
This is the problem with most of social media today. It is a very well known problem in ML [1], but nobody is willing to do anything about it because it's a fundamental UX change. Facebook, Twitter, YouTube, TikTok, they have defined themselves by their recommendation engines.
[1] https://towardsdatascience.com/dangerous-feedback-loops-in-m...
by PenguinRevolver on 3/31/23, 7:02 PM
Great pull request here which improves the algorithm: https://github.com/twitter/the-algorithm/pull/17
by jillesvangurp on 4/1/23, 1:54 AM
The irony is that I prefer Mastodon's "sort by time and don't try to be clever" approach to this expensive and futile attempt to feed me an endless stream of clickbait. I objectively spend more time on Mastodon than on Twitter at this point. It's more engaging for me. It's how Twitter used to work when it was still nice to use.
If Twitter wants to put a stop to the user exodus and save lots of money in the process, here's what they could do:
1) Add an off switch to the For You feed. I'll click it right away and never turn it on again. Stop wasting minutes of CPU time on my behalf. I never asked for it. It doesn't do anything for me that I need or want.
2) Sort by time, filter by hashtag. Twitter used to be about real time information. I don't care about things that happened days or weeks ago. I don't need to see all of it. This is the core feature that made Twitter popular. Mastodon has it and it is absorbing users from Twitter by the millions. It still works. Restore this feature and make it the default.
3) Join the fediverse. That's where a lot of the former hardcore users went. They still exist. They still post messages. They still engage with each other. They just don't use Twitter anymore. Allow people to follow Mastodon users. Allow Mastodon users to follow Twitter users. Not that hard to implement and probably would do wonders for user engagement.
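Points 1 and 2 need very little logic compared with a ranking pipeline. A minimal sketch in Scala, assuming a hypothetical `Post` type and in-memory data (none of these names come from Twitter's or Mastodon's code):

```scala
import java.time.Instant

// Hypothetical post model, for illustration only.
final case class Post(authorId: Long, text: String, createdAt: Instant, hashtags: Set[String])

object ChronologicalFeed {
  // Posts from followed accounts only, newest first,
  // optionally restricted to a single hashtag.
  def timeline(posts: Seq[Post], following: Set[Long], hashtag: Option[String] = None): Seq[Post] =
    posts
      .filter(p => following.contains(p.authorId))
      .filter(p => hashtag.forall(p.hashtags.contains))
      .sortBy(_.createdAt.toEpochMilli)(Ordering[Long].reverse)
}
```

Passing `hashtag = None` gives the plain reverse-chronological feed; `Some("scala")` keeps only posts carrying that tag.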

by HellsMaddy on 3/31/23, 8:14 PM

Interesting:

    // we only keep unfollows in the past 90 days due to the huge size of this dataset,
    // and to prevent permanent "shadow-banning" in the event of accidental unfollows.
    // we treat unfollows as less critical than above 4 negative signals, since it deals more with
    // interest than health typically, which might change over time.
    val unfollows: SCollection[InteractionGraphRawInput] =
      GraphUtil
        .getSocialGraphFeatures(
          readSnapshot(SocialgraphUnfollowsScalaDataset, sc),
          FeatureName.NumUnfollows,
          endTs)
        .filter(_.age < 90)

https://github.com/twitter/the-algorithm/blob/main/src/scala...
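The effect of the `.filter(_.age < 90)` step can be shown without the data pipeline around it. A minimal sketch on plain collections, assuming a hypothetical `Unfollow` record whose `ageDays` field stands in for `age` in the excerpt (that the unit is days is inferred from the comment, not confirmed by the code shown):

```scala
// Hypothetical stand-in for the real input type; `ageDays` mirrors `_.age` in the excerpt.
final case class Unfollow(srcUserId: Long, dstUserId: Long, ageDays: Int)

object UnfollowWindow {
  val MaxAgeDays = 90

  // Keep only unfollows newer than the window, so an old (possibly accidental)
  // unfollow stops counting as a negative signal once it ages out.
  def recent(unfollows: Seq[Unfollow]): Seq[Unfollow] =
    unfollows.filter(_.ageDays < MaxAgeDays)
}
```

Note that the comparison is strict, so an unfollow exactly 90 units old is already dropped.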

by tric on 3/31/23, 6:56 PM

From https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...

    (
      "author_is_elon",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None).contains(candidate.getOrElse(DDGStatsElonFeature, 0L))),
    (
      "author_is_power_user",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsVitsFeature, Set.empty[Long]).contains)),
    (
      "author_is_democrat",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsDemocratsFeature, Set.empty[Long]).contains)),
    (
      "author_is_republican",
      candidate =>
        candidate
          .getOrElse(AuthorIdFeature, None)
          .exists(candidate.getOrElse(DDGStatsRepublicansFeature, Set.empty[Long]).contains)),
    )
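The excerpt pairs each label with a predicate over an optional author ID. A self-contained sketch of the same idiom, assuming made-up IDs and a plain `Option[Long]` in place of the feature map (the real lists and the `candidate` type are not part of the excerpt):

```scala
object AuthorLabels {
  // Hypothetical IDs; the real values are not shown in the excerpt.
  val elonId: Long = 1L
  val powerUsers: Set[Long] = Set(2L, 3L)

  // Each label pairs a name with a predicate over the (optional) author ID.
  val labels: Seq[(String, Option[Long] => Boolean)] = Seq(
    ("author_is_elon", (authorId: Option[Long]) => authorId.contains(elonId)),
    ("author_is_power_user", (authorId: Option[Long]) => authorId.exists(powerUsers.contains))
  )

  // Names of all labels whose predicate holds for this author.
  def labelsFor(authorId: Option[Long]): Seq[String] =
    labels.collect { case (name, predicate) if predicate(authorId) => name }
}
```

`Option.contains` compares against a single value, while `Option.exists(set.contains)` tests set membership; both return `false` when the author ID is missing.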

by koolba on 3/31/23, 7:20 PM
It's reassuring to know that billion dollar tech companies write CI exactly like I do:
https://github.com/twitter/the-algorithm/blob/main/ci/ci.sh
Permalink: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by sillysaurusx on 3/31/23, 7:24 PM
Say what you will about Elon, but this wouldn't have happened without him. Thanks!
And thank you to everyone at Twitter who helped organize this release. Open sourcing something like this is no small effort.
by tech234a on 3/31/23, 7:19 PM
I wonder what the "author_is_elon", "author_is_power_user", "author_is_democrat", and "author_is_republican" labels are for [1].
[1]: https://github.com/twitter/the-algorithm/blob/main/home-mixe...
by summarity on 3/31/23, 6:53 PM
Main repos:
- https://github.com/twitter/the-algorithm
- https://github.com/twitter/the-algorithm-ml
Blogs:
- Eng: https://blog.twitter.com/engineering/en_us/topics/open-sourc...
- Biz: https://blog.twitter.com/en_us/topics/company/2023/a-new-era...
by rogerallen on 3/31/23, 7:05 PM
"Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user."
I have spent significant effort creating a network and there you go choosing to ignore my efforts by putting in 50% of crap-I-don't-want-to-see.
That is why I despise your algorithm.
by anderspitman on 3/31/23, 7:00 PM
I'm not opposed to social media feeds having complex recommendation algorithms. I just wish they allowed you to opt in to a reverse chronological feed of only people you follow, like RSS.
by danso on 3/31/23, 7:16 PM
> Twitter has several Candidate Sources that we use to retrieve recent and relevant Tweets for a user. For each request, we attempt to extract the best 1500 Tweets from a pool of hundreds of millions through these sources. We find candidates from people you follow (In-Network) and from people you don’t follow (Out-of-Network).
> Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.
It would’ve been interesting to see what changes were made since Musk’s takeover. As someone who followed 5,000+ users, I know I never saw a tweet that wasn’t either from or retweeted by someone I followed; e.g., I never saw those “[user you follow] liked [someone you don’t follow] tweet”
50%/50% in FYP seems to reflect my experience today — which is much worse, to the point that I’ll regularly switch to viewing by List b/c I miss seeing people who I want to read.
I wonder how much testing and analysis went into deciding on the 50/50 ratio — e.g. how does it impact user engagement and behavior. Because it sounds like an easy round value that you’d land on when thinking “users should be pushed out of their bubbles”
by d_sc on 4/1/23, 10:01 PM
I think they have a bug here: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
Code:

    ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),

Should be:

    ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 10000))),
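Whichever of the two (label or threshold) is the intended one, this kind of mismatch can be avoided by deriving both from a single value. A hedged sketch, assuming a hypothetical helper `hasAtLeast` and a plain `Option[Long]` fav count rather than the real `EarlybirdFeature` type:

```scala
object FavCountLabels {
  // Render 1000 as "1k", 10000 as "10k"; other values are printed as-is.
  private def short(n: Long): String =
    if (n >= 1000 && n % 1000 == 0) s"${n / 1000}k" else n.toString

  // Build the label and the predicate from the same threshold,
  // so the name cannot drift away from the number it describes.
  def hasAtLeast(threshold: Long): (String, Option[Long] => Boolean) =
    (s"has_gte_${short(threshold)}_favs", (favCount: Option[Long]) => favCount.exists(_ >= threshold))
}
```

With this shape, `hasAtLeast(10000L)` yields the label `has_gte_10k_favs` and a predicate that rejects a tweet with 1,000 favs.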

by roddylindsay on 3/31/23, 7:55 PM

  For ranking the candidates these predictions are combined into a score by
  weighting them:

  "recap.engagement.is_favorited": 0.5
  "recap.engagement.is_good_clicked_convo_desc_favorited_or_replied": 11* (the maximum prediction from these two "good click" features is used and weighted by 11, the other prediction is ignored).
  "recap.engagement.is_good_clicked_convo_desc_v2": 11*
  "recap.engagement.is_negative_feedback_v2": -74
  "recap.engagement.is_profile_clicked_and_profile_engaged": 12
  "recap.engagement.is_replied": 27
  "recap.engagement.is_replied_reply_engaged_by_author": 75
  "recap.engagement.is_report_tweet_clicked": -369
  "recap.engagement.is_retweeted": 1
  "recap.engagement.is_video_playback_50": 0.005

Who set those weights, and why were they chosen?
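For reference, the combination described in that quote can be sketched as a weighted sum. This is one reading of the quoted documentation, not the production code: the object and method names are made up, the `recap.engagement.` prefix is dropped for brevity, and treating a missing prediction as 0.0 is an assumption.

```scala
object EngagementScore {
  // Weights copied from the quoted list.
  val weights: Map[String, Double] = Map(
    "is_favorited" -> 0.5,
    "is_negative_feedback_v2" -> -74.0,
    "is_profile_clicked_and_profile_engaged" -> 12.0,
    "is_replied" -> 27.0,
    "is_replied_reply_engaged_by_author" -> 75.0,
    "is_report_tweet_clicked" -> -369.0,
    "is_retweeted" -> 1.0,
    "is_video_playback_50" -> 0.005
  )

  // The two "good click" predictions share one weight; only the larger one counts.
  val goodClickKeys: Seq[String] = Seq(
    "is_good_clicked_convo_desc_favorited_or_replied",
    "is_good_clicked_convo_desc_v2"
  )
  val goodClickWeight = 11.0

  // `predictions` maps a label to a predicted probability in [0, 1].
  def score(predictions: Map[String, Double]): Double = {
    val linear = weights.map { case (key, w) => w * predictions.getOrElse(key, 0.0) }.sum
    val goodClick = goodClickKeys.map(k => predictions.getOrElse(k, 0.0)).max
    linear + goodClickWeight * goodClick
  }
}
```

Under these weights a predicted report outweighs everything else: a report probability of 0.01 costs 3.69 points, more than a favorite probability of 1.0 can earn (0.5).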

by ryzvonusef on 3/31/23, 8:11 PM

https://twitter.com/jarokrolewski/status/1641892148084629504

    > the main neural network part of @Twitter recsys algo is based on 2021 work of #SinaWeibo - Chinese clone of Twitter

interesting claim

by varjag on 3/31/23, 6:58 PM
> Rank each Tweet using a machine learning model.
This does a lot of heavy lifting here.
by simonsarris on 3/31/23, 7:13 PM
This is pretty limited. I picked a term used in the diagram to see what I could find out about it. But there seems to be next to nothing in the released code about the mentioned "author diversity". No real code or description.
by crop_rotation on 3/31/23, 6:55 PM
Wouldn't any such system depend on 10 other internal systems and 20 databases directly or indirectly, each affecting the behaviour of the recommendation engine? That makes me doubtful that studying such a recommendation engine is any better than a purely academic exercise.
by jonknee on 3/31/23, 6:52 PM
projects/home/recap/FEATURES.md has some interesting stuff:
https://github.com/twitter/the-algorithm-ml/blob/main/projec...
In realgraph you can see some of the things they keep track of, which include what you have in your address book, total time spent "dwelling" and a few other interesting nuggets.
by paxys on 3/31/23, 7:25 PM
Since this is what most people are going to want to see:
> We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.
by paxys on 3/31/23, 7:22 PM
While open sourcing code is always great, and kudos to them for doing so, let's be real: most people didn't care about the internal plumbing of how their recommendation system runs. It's going to be a mess of decades-old code, microservices and ML pipelines just like one would expect. If you want to dig deeper to check for biases (the reason they claimed to be open sourcing it in the first place), you will however run into:
> We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.
which is a shame.
by etc_passwd on 3/31/23, 7:54 PM
Democrats / Republicans looks like it was added outside of SDLC [1]. This order without those features is sorted, likely by a linter, suggesting Elon and Vits are properly implemented, and Democrats/Republicans was just inserted alongside the Elon feature, perhaps just for this extract. Sorting it now results in a different order than the commit.
[1]: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by tric on 3/31/23, 6:45 PM
GitHub repo: https://github.com/twitter/the-algorithm/
by jongjong on 4/1/23, 3:32 AM
WTF is AuthorIsEligibleForConnectBoostFeature? I guess this may explain why some people seem to accumulate a lot of followers very quickly while all those trying to grow organically seem to struggle. You can imagine if a lot of people benefit from this Connect Boost feature, it would make it impossible for others to be noticed through the noise created by all of these boosted individuals. That's essentially what Twitter feels like ATM. Recently, I manually unfollowed anyone who I suspect may have received a special boost from the algorithms.
by Me1000 on 3/31/23, 7:44 PM
Squashing the commit history before releasing it was an interesting (and completely predictable) decision.
by sudo_navendu on 4/1/23, 3:59 AM
Weights on different metrics. From https://github.com/twitter/the-algorithm/blob/ec83d01dcaebf3...

    private def getLinearRankingParams: ThriftRankingParams = {
      ThriftRankingParams(
        `type` = Some(ThriftScoringFunctionType.Linear),
        minScore = -1.0e100,
        retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
        replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
        reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
        luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
        textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
        urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
        isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
        favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
        langEnglishUIBoost = 0.5,
        langEnglishTweetBoost = 0.2,
        langDefaultBoost = 0.02,
        unknownLanguageBoost = 0.05,
        offensiveBoost = 0.1,
        inTrustedCircleBoost = 3.0,
        multipleHashtagsOrTrendsBoost = 0.6,
        inDirectFollowBoost = 4.0,
        tweetHasTrendBoost = 1.1,
        selfTweetBoost = 2.0,
        tweetHasImageUrlBoost = 2.0,
        tweetHasVideoUrlBoost = 2.0,
        useUserLanguageInfo = true,
        ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
      )
    }
by evantahler on 4/1/23, 1:39 AM
So uh... they use BigQuery and here's the dataset https://github.com/twitter/the-algorithm/blob/main/ann/src/m...
by rco8786 on 3/31/23, 7:25 PM
So as expected, there is exactly nothing that favors posters from one side of the political spectrum. I don't expect that this article will do anything to calm down those who are convinced otherwise though.
Well written article, from an engineer's perspective.
by Egoist on 3/31/23, 7:01 PM
Aaaand the issues turned into a shitpost
by quotemstr on 3/31/23, 7:12 PM
Typically, we expect to be able to run "open source" software ourselves. If you open-source your C compiler, I can compile a C program with it. In a few recent high-profile cases though, companies have "open sourced" ML systems without releasing the model weights. This practice is just like releasing the build scripts for your C compiler, but not the compiler itself. While more transparency from social media will be enlightening, calling a release like this (or LLaMA) "open source" feels like equivocation. I'd love to see more full releases, weights included.
by robopsychology on 3/31/23, 7:27 PM
Why are there two spaces instead of four in this Python code, it hurts my soul
by Laaas on 3/31/23, 7:27 PM
Praise where praise is due. Wasn't completely sure whether they would in fact release it or keep posturing.
by sroussey on 3/31/23, 7:03 PM
Does it show the part where it recommends Elon more than anyone else?
by dang on 3/31/23, 8:44 PM
Url changed from https://github.com/twitter/the-algorithm-ml, which points to this.
by abalaji on 3/31/23, 6:57 PM
huh, legit open source too with 'Affero-GPL'
by junto on 4/1/23, 8:39 AM
Did anyone else notice this below? I can’t even begin to imagine how many CPUs that would require and what the cost must be… just for a recommendation engine.
> The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150x the latency you perceive on the app.
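A back-of-the-envelope calculation from the quoted figures, taking them at face value and assuming the load is spread evenly over the day (real traffic is not, so peak demand would be higher):

```scala
object PipelineCost {
  val runsPerDay = 5e9          // "approximately 5 billion times per day"
  val cpuSecondsPerRun = 220.0  // "220 seconds of CPU time"
  val latencySeconds = 1.5      // "under 1.5 seconds on average"
  val secondsPerDay = 86400.0

  // Total CPU-seconds per day divided by the seconds in a day gives the
  // average number of cores that would have to be busy around the clock.
  val averageBusyCores: Double = runsPerDay * cpuSecondsPerRun / secondsPerDay

  // Ratio of CPU time to wall-clock latency for a single run.
  val cpuToLatencyRatio: Double = cpuSecondsPerRun / latencySeconds
}
```

Under these assumptions the quoted numbers imply about 12.7 million cores busy on average, and 220 / 1.5 ≈ 147 matches the "nearly 150x" in the quote.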
by sho_hn on 3/31/23, 7:35 PM
My main questions: Will these repositories be used in production by Twitter? Is this now the mainline, not a semi-regularly-synced mirror?
by Weidenwalker on 4/2/23, 5:03 PM
I visualized this codebase here: https://codeatlas.dev/github/codeatlasHQ/the-algorithm/main
Maybe this is helpful to anyone for navigating what's in there!
by froggychairs on 3/31/23, 8:07 PM
Why is nobody pointing out that this is likely an April Fools joke? We just deployed our April Fools joke into production today too.
by thumbsup-_- on 3/31/23, 8:19 PM
The barebones ReadMe makes me feel this repository was open-sourced against the wishes of the engineers and by a top-down directive
by matesz on 3/31/23, 8:48 PM
It is really nice to see how bazel is used in the wild. It looks so clean. Why aren't we using it for everything?
by ryanisnan on 3/31/23, 9:36 PM
I want to go back to a world where there isn't an algorithm feeding me what someone "thinks" I want to read.
I want to see a chronological list of things sources I follow have posted.
Yes, I understand you can do this on Twitter still, but I would guess most people are more influenced by "the algorithm".
by stusmall on 3/31/23, 7:59 PM
I thought it was an april fools joke when I saw this: https://github.com/twitter/the-algorithm/blob/main/ci/ci.sh
Like a dig at the code quality.
by bagels on 3/31/23, 10:43 PM
"Written by the Twitter Team"
I found it interesting that there is no attribution. Most other companies list the authors on engineering blogs (e.g. Facebook, Uber, etc.)
This topic seems to draw the attention of unhinged people, so I suppose I wouldn't want my name on it either.
by NicoJuicy on 4/1/23, 5:31 AM
Are they measuring getting more Republican posts? Because I'm getting a ton of those, which I constantly need to mute and ban (mostly dumb remarks).
And I don't even live in the US.
It would explain why they are tracking it, to increase visibility.
by AlbertCory on 3/31/23, 7:04 PM
I haven't read the "algorithm" and this observation might be seriously out of date, but:
for Google Ads, you couldn't easily know what ads would be shown for a given query, without a whole lot of data that's not contained in any code: the experiment settings in the server, for one thing. And the user who's doing the query, for another.
An "experiment" could apply to 100% of the traffic, so it's not really an experiment anymore. And even if you think X has been put into production, there is still a "holdback" experiment, where some part of the traffic does not get X applied to it.
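To make the holdback idea concrete, here is a minimal sketch of deterministic bucketing. It is not Google's or Twitter's system; the object, the method names, and the hash-based assignment are all assumptions chosen for illustration.

```scala
object Experiments {
  // Deterministic bucket in [0, 100) derived from the experiment name and user ID.
  def bucket(experiment: String, userId: Long): Int =
    Math.floorMod(s"$experiment:$userId".hashCode, 100)

  // A change "launched to 100%" can still skip a holdback slice of users:
  // users whose bucket falls below `holdbackPercent` never get it.
  def isEnabled(experiment: String, userId: Long, holdbackPercent: Int): Boolean =
    bucket(experiment, userId) >= holdbackPercent
}
```

Because the holdback percentage lives in server configuration rather than in the code, reading the code alone cannot tell you which behavior a given user actually sees.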
by rblion on 4/2/23, 7:55 AM
First thing I would like to see gone is business bros sharing 'guides' after you follow them, threatening to start charging real soon. Go fuck yourself, get a real job.
by oulu2006 on 4/2/23, 12:55 AM
<tongue-in-cheek> didn't Twitter already open source their code?
https://www.databreachtoday.com/twitter-says-source-code-lea...
by cmckn on 3/31/23, 8:07 PM
Including the search engine itself in “the algorithm” repo is an interesting choice. Obviously it’s a major player in what gets returned to clients, but the details of that infrastructure aren’t really relevant and are a notable portion of their secret sauce.
https://github.com/twitter/the-algorithm/tree/main/src/java/...
by vonwoodson on 4/1/23, 9:12 PM
Folks talk about media bias: Twitter popularity is a media bias. It’s the laziest journalism to be able to write a “news” article about what Kim, or Don, or Elon’s PR team tweeted. But as far as how “social” this media is: Twitter is a one-way street. There’s no one actually responding or interacting with Tweets. It’s just a comment section to flame bait.
Maybe we’ll all get lucky and Elon will cause Twitter to go away forever.

by amq on 4/1/23, 7:50 PM

Surprised no one mentioned this:

    s.SpaceSafetyLabelType.MedicalMisinfo -> MedicalMisinfo,
    s.SpaceSafetyLabelType.GenericMisinfo -> GenericMisinfo,
    s.SpaceSafetyLabelType.DmcaWithheld -> DmcaWithheld,
    s.SpaceSafetyLabelType.HatefulHighRecall -> HatefulHighRecall,
    ...
    s.SpaceSafetyLabelType.UkraineCrisisTopic -> UkraineCrisisTopic,

https://github.com/twitter/the-algorithm/blob/ec83d01dcaebf3...

by belter on 4/1/23, 10:29 PM
Unless a trusted third party forensically audits Twitter, there is no guarantee the published code corresponds to the actual live code in production. Also, multiple parts are not present, as stated in the blog.
This should be seen as a possible snapshot of some code, that might have run, might run in the future, or is possibly running in some parts of the production infrastructure at Twitter.
by frob on 3/31/23, 9:15 PM
Well that was a giant nothing-burger. This seems to be your standard ranking stack. We find candidates based on who you follow, who they follow, who is trending, and what we think you like. We then rank them based on how likely you are to engage with them and continue to come back and give us money via our subscription service and ad views. We then try to remove spam and other negative experiences.
Where's the beef?
by Reason077 on 4/1/23, 2:26 AM
One flaw I've noticed in Twitter's recommendations recently is the tendency to send notifications for "BREAKING NEWS"-type Tweets. Great, except they're usually for news that happened in the past - typically 12-24 hours ago!
The algorithm really needs to recognise when tweets are time-sensitive and not recommend them just because they got a lot of engagement the previous day!
by pram on 3/31/23, 7:08 PM
I wonder what determines 'cred' for this part:
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by beebmam on 3/31/23, 7:54 PM
I don't use Twitter, but this is awesome. I hope this will help more people realize how complex it is to build and operate web services.
by kossTKR on 3/31/23, 10:34 PM
I've pretty much ignored all of the superficial political theatre but noticed the actual algo worsening over the last 6 months.
I get way too much random crap now: promoted tweets, "things that might interest me", users that seem to never get on my feed, etc.
Twitter seems to go in the direction of all other social media, feeds that are 100% digital crack with no way to control your media diet.
by HAL3000 on 3/31/23, 10:18 PM
Expect to see A LOT more spam on Twitter after this release. It's like giving SEO spammers access to google search ranking algorithm.
by muratsu on 3/31/23, 7:28 PM
Given the complex relationship between advertisers, platform, and users I don't know if any meaningful contribution can be made to the algorithm without pissing anyone off. The following tab already gave people who're not interested in algo recommendations a way out. I don't quite understand the reasoning behind open sourcing the algorithm. Any thoughts?
by WhereIsTheTruth on 3/31/23, 7:28 PM
Is it even what they use in production?
There is code that favors Elon's tweets, so I'd say yes, that's probably what they use
by anshumankmr on 4/1/23, 1:35 AM
Oh god... The MR's opened today are the craziest ones ever. https://github.com/twitter/the-algorithm/pulls?q=is%3Apr+is%...
They made my morning
by ouraf on 4/2/23, 7:43 PM
Honestly, there's too much garbage in the code dump they made.
Maybe a UML graph or even a presentation or written guide on how they measure and apply each weight or group policy would make it easier to have some solid take on how it works
by perceptronas on 4/1/23, 5:09 AM
It seems most of the code in the repository is just simple Scala. Codebase is easy to read and understand.
I don't see any Typelevel stuff. This probably lets them hire and train engineers faster while still gaining most of the benefits
I hope this will encourage more companies to pick Scala.
by 13years on 4/2/23, 3:05 PM
A feature proposal to put you in control of the algorithm
https://github.com/twitter/the-algorithm/issues/1363
by jerrygoyal on 4/1/23, 5:40 AM
> The goal of our open source endeavor is to provide full transparency to you, our users, about how our systems work
The majority of users didn't ask for this, so I'm not sure what the exact motive behind their efforts is. It could be a PR stunt.
by wslh on 4/1/23, 8:28 PM
It's the data, stupid [1] (not the algorithm).
[1] https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid
by evntdrvn on 3/31/23, 7:31 PM
it would be super interesting if when logged in to Twitter, you could take a look at your current calculated scores/weights for all the params that are part of these algorithms. Similar to the Netflix "Stats for nerds" menu...
by WA on 3/31/23, 9:04 PM
Will this make it easier to game the algo or does it depend so heavily on individual user interaction that it’s close to impossible to game it? For example, by carefully crafting Tweets or by buying likes/retweets etc?
by tcmart14 on 3/31/23, 10:47 PM
Repo has 1.5% rust code and no
```
  author_is_uwu
```
That is the biggest problem.
by rvz on 3/31/23, 7:47 PM
If Twitter was 'dead' why on earth are we still talking so much about this blue bird site?
It looks like the lot who predicted that he wouldn't open source the algorithm are going to eat their words once again [0], just as they did after incorrectly predicting Twitter's immediate collapse [1], and they will look at the source code anyway and keep talking about "Twitter".
If Twitter can open-source their algorithm, why not TikTok? Either way, the bots are now going to have a very expensive time on Twitter.
[0] https://news.ycombinator.com/item?id=35213213
[1] https://news.ycombinator.com/item?id=33701371
by woolion on 4/2/23, 1:52 PM
So, the day after the headline that Twitter is artificially promoting polarizing political voices, Twitter open-sources their algorithm!
What does the commit history say? There are 3 commits, like a very very real programming project. The issues and pull requests show how much people are fooled by this very transparent move.
So this is an obvious attempt at a digital Potemkin village that, like the real one, poorly succeeds in hiding the truth. Elon does not want to upset the apple cart (political, economic, or ideological) but wants to make his followers believe in it, and so we get this. Great spectacle, if that's what you're interested in.
by hotpathdev on 4/1/23, 6:38 AM
The issue tracker and pull requests are being hit with very funny suggestions. Many people suspect this is an April Fools joke. It's possible this entire repo was generated by an LLM to appear plausible.
I especially like the suggestions to rewrite the algorithm in Rust [1] and this pull request which simplifies the algorithm to a single C file [2].
[1] https://github.com/twitter/the-algorithm/issues?q=is%3Aissue...
[2] https://github.com/twitter/the-algorithm/pull/712
by say_it_as_it_is on 4/1/23, 2:16 PM
And yet they require their software engineer applicants to be well versed in algorithms and data structures? These tech company managers know nothing about how the sausage is made.
by pictur on 3/31/23, 9:46 PM
It's a really scary codebase. Do you really need that much code for the world's crappiest recommendation algorithm? I think you can do more crap with less code. we trust you elon.
by mkl95 on 3/31/23, 10:32 PM
I couldn't care less about Twitter's high level abstractions. They were never renowned for those. Their database schemas and infrastructure on the other hand...
by javajosh on 3/31/23, 9:30 PM
Is there demand for a service that simply shows you the things the people you follow wrote? (It would be up to you not to follow so many people that you can't keep up.)
by Reptur on 3/31/23, 7:47 PM
They didn't open source the data that the abuse, toxicity, and NSFW censoring algorithms check against, so I'd call it a partial open-sourcing.
by cwkoss on 3/31/23, 7:24 PM
The twitter algorithm sucks balls and heavily overweights who's paid for a checkmark.
The default feed view has grown increasingly useless over the past ~6 months.
by pledess on 3/31/23, 7:43 PM
there may be a hint of which elections were of interest:
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by jml2 on 3/31/23, 9:03 PM
( "has_toxicity_score_above_threshold", _.getOrElse(EarlybirdFeature, None).exists(_.toxicityScore.exists(_ > 0.91)) )
by infamouscow on 4/1/23, 3:27 AM
I'm glad to see this is licensed AGPL. I hope this sets a precedent for everyone else in the space to do the same.
by abdnafees on 4/1/23, 2:52 PM
I think it's April fools. It's a joke at the expense of open source and should be taken down ASAP.
by jeffbee on 3/31/23, 7:20 PM
Why does anyone use "for you"?
by whalesalad on 3/31/23, 7:22 PM
Two space indent in .py? Provocative.
by m1117 on 3/31/23, 7:20 PM
As I understand, they open sourced only the abstraction, but still have a way to control anything.
by capableweb on 3/31/23, 7:02 PM
I'm no fan of either Twitter or Elon Musk, but this is a great move and I hope other companies follow what Twitter did here and start open sourcing more core parts like this. Maybe it's mostly useful for learning how it works, not for directly using it in your own product, but the amount of transparency it gives users cannot be overstated. As long as that actually is the code they run, but there would be no way for anyone but Twitter to verify that.
by firstSpeaker on 3/31/23, 8:30 PM
Will it be developed in the open as well, or will there be frequent merges from their internal repos?
by dools on 4/1/23, 12:41 AM
And yet my Twitter feed was always so boring.
Reminds me of the Sirius Cybernetics Nutri-matic drinks machine.
by systemvoltage on 3/31/23, 7:14 PM
Astounding amount of cynicism here, so I'll say something positive: Transparency is undoubtedly important, I'm glad we can see how all of this works and what sort of effort goes into building a social media system. It's licensed under GPL which is a bummer (would have preferred BSD) but it's better than nothing.
by bastardoperator on 3/31/23, 9:05 PM
My favorite is ci/ci.sh
```
  #!/bin/sh

  exit 0
```
by kilianinbox on 4/1/23, 10:18 AM
Summary so far:
• Code from Twitter's algorithm GitHub repository shared
• Algorithm checks for specific author types (e.g., Elon Musk, power users, Democrats, Republicans)
• Author ID lists used for metrics collection in A/B experimentation platform
• Metrics tracked in A/B tests to avoid negative impacts on specific groups
• VIPs like Musk, LeBron James, AOC used as indicators for algorithm's behavior
• Algorithm changes that negatively affect Musk unlikely to go live
• Speculation about code changes pre- and post-Elon's purchase of Twitter
• Discussion on the importance of measuring and testing for potential biases
• Debate on moral decisions in the context of Twitter's algorithm and content moderation
by inparen on 3/31/23, 7:31 PM
Issue list is growing rapidly for a repo created an hour ago.
by bluelightning2k on 4/1/23, 8:18 AM
Late to the party here so unlikely anyone sees this comment. But the double take for me was seeing the article end with "if this sounds interesting to you, come join us!"
by benatkin on 3/31/23, 7:09 PM
Party in the issues: https://github.com/twitter/the-algorithm/issues
by paulddraper on 3/31/23, 9:20 PM
> 1.4k forks
Wow, we're getting some collaboration going!
by Thaxll on 3/31/23, 6:59 PM
Let's dig into Twitter code quality.
by throwaway689236 on 4/2/23, 8:38 PM
It's better than nothing.
by drakonka on 4/1/23, 8:05 AM
Is this not an April Fools joke?
by diebeforei485 on 3/31/23, 9:05 PM
Kudos for open-sourcing this.
by voz_ on 3/31/23, 9:37 PM
hmmm https://github.com/search?q=repo%3Atwitter%2Fthe-algorithm-m...
Twitter hmu if you need help trying Pytorch 2.0 ;)
by elashri on 3/31/23, 8:21 PM
I wonder if it will one day be possible to know the values of `author_is_power_user`, `author_is_democrat` and `author_is_republican` for your account. Does GDPR help with that? Probably not, because maybe they do it only for people inside the US, so it is not related to the EU anyway.
by bilekas on 3/31/23, 7:43 PM
I'm supposed to be going out in 20 mins....
by throwayyy479087 on 3/31/23, 6:50 PM
You gotta hand it to Elon - he actually did it.
by distrill on 3/31/23, 7:07 PM
the-algorithm is such a pretentious name for a repo
by anoncow on 4/1/23, 8:51 AM
This is the latest comment.
by Patrickmi on 3/31/23, 8:27 PM
Didn’t Elon check the codebase before open sourcing it, like was he expecting everyone to be happy when seeing author_is_elon ?
by ericzawo on 4/1/23, 2:24 AM
It's really dismaying watching the space man light this website on fire.
https://twitter.com/alexblechman/status/1641905502043926530?...
by photochemsyn on 3/31/23, 7:49 PM
I generally have a very low opinion of social media platforms, but I did create a Twitter account for the first time after Musk bought the platform.
My conclusion is that it's basically entertainment, with very little of what I'd call high-quality useful information that deserves further examination (unlike a lot of HN posts, in contrast). I also notice something of a TikTok approach to video being implemented, which is not surprising given TikTok's success (and makes one wonder who exactly it is lobbying so hard for a TikTok ban, and whether it's just a commercial competition issue more than anything else).
As far as the recommendation algorithm, it appears to be a siloing setup - look at content of one particular flavor, it gives you more of that flavor. A 'flush settings' or 'forget browsing history' or 'reset to defaults' button would be useful, if probably not what advertisers want in terms of delivering to target audiences. I suppose setting up multiple accounts is something of a solution, although too much effort to be that interesting.
In terms of news reports, it's broader in scope than traditional corporate media outlets, so that's a plus in its favor. Reliability is perhaps similar (i.e. low).