by jonknee on 3/31/23, 6:42 PM with 1185 comments
by jonathanmayer on 3/31/23, 7:07 PM
From a very quick skim of the repositories, this appears to be quite limited transparency. The documentation gives a decent high-level overview of how Tweet recommendation works—no surprises—and the code tracks that roadmap. Those are meaningful positive steps. But the underlying policies and models are almost entirely missing (there are a couple valuable components in [1]). Without those, we can't evaluate the behavior and possible effects of "the algorithm."
by corbulo on 3/31/23, 7:48 PM
by phailhaus on 3/31/23, 7:39 PM
This is the problem with most of social media today. It is a very well known problem in ML [1], but nobody is willing to do anything about it because it's a fundamental UX change. Facebook, Twitter, YouTube, TikTok, they have defined themselves by their recommendation engines.
[1] https://towardsdatascience.com/dangerous-feedback-loops-in-m...
by PenguinRevolver on 3/31/23, 7:02 PM
by jillesvangurp on 4/1/23, 1:54 AM
If Twitter wants to put a stop to the user exodus and save lots of money in the process, here's what they could do:
1) Add an off switch to the For You feed. I'll click it right away and never turn it on again. Stop wasting minutes of CPU time on my behalf. I never asked for it. It doesn't do anything for me that I need or want.
2) Sort by time, filter by hashtag. Twitter used to be about real time information. I don't care about things that happened days or weeks ago. I don't need to see all of it. This is the core feature that made Twitter popular. Mastodon has it and it is absorbing users from Twitter by the millions. It still works. Restore this feature and make it the default.
3) Join the fediverse. That's where a lot of the former hardcore users went. They still exist. They still post messages. They still engage with each other. They just don't use Twitter anymore. Allow people to follow Mastodon users. Allow Mastodon users to follow Twitter users. Not that hard to implement and probably would do wonders for user engagement.
by HellsMaddy on 3/31/23, 8:14 PM
// we only keep unfollows in the past 90 days due to the huge size of this dataset,
// and to prevent permanent "shadow-banning" in the event of accidental unfollows.
// we treat unfollows as less critical than above 4 negative signals, since it deals more with
// interest than health typically, which might change over time.
val unfollows: SCollection[InteractionGraphRawInput] =
  GraphUtil
    .getSocialGraphFeatures(
      readSnapshot(SocialgraphUnfollowsScalaDataset, sc),
      FeatureName.NumUnfollows,
      endTs)
    .filter(_.age < 90)
https://github.com/twitter/the-algorithm/blob/main/src/scala...
by tric on 3/31/23, 6:56 PM
(
  "author_is_elon",
  candidate =>
    candidate
      .getOrElse(AuthorIdFeature, None).contains(candidate.getOrElse(DDGStatsElonFeature, 0L))),
(
  "author_is_power_user",
  candidate =>
    candidate
      .getOrElse(AuthorIdFeature, None)
      .exists(candidate.getOrElse(DDGStatsVitsFeature, Set.empty[Long]).contains)),
(
  "author_is_democrat",
  candidate =>
    candidate
      .getOrElse(AuthorIdFeature, None)
      .exists(candidate.getOrElse(DDGStatsDemocratsFeature, Set.empty[Long]).contains)),
(
  "author_is_republican",
  candidate =>
    candidate
      .getOrElse(AuthorIdFeature, None)
      .exists(candidate.getOrElse(DDGStatsRepublicansFeature, Set.empty[Long]).contains)),
)
by koolba on 3/31/23, 7:20 PM
https://github.com/twitter/the-algorithm/blob/main/ci/ci.sh
Permalink: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by sillysaurusx on 3/31/23, 7:24 PM
And thank you to everyone at Twitter who helped organize this release. Open sourcing something like this is no small effort.
by tech234a on 3/31/23, 7:19 PM
[1]: https://github.com/twitter/the-algorithm/blob/main/home-mixe...
by summarity on 3/31/23, 6:53 PM
by rogerallen on 3/31/23, 7:05 PM
I have spent significant effort creating a network and there you go choosing to ignore my efforts by putting in 50% of crap-I-don't-want-to-see.
That is why I despise your algorithm.
by anderspitman on 3/31/23, 7:00 PM
by danso on 3/31/23, 7:16 PM
> Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.
It would’ve been interesting to see what changes were made since Musk’s takeover. As someone who followed 5,000+ users, I know I never saw a tweet that wasn’t either from or retweeted by someone I followed, e.g. I never saw those “[user you follow] liked [someone you don’t follow]’s tweet” items.
50%/50% in FYP seems to reflect my experience today — which is much worse, to the point that I’ll regularly switch to viewing by List b/c I miss seeing people who I want to read.
I wonder how much testing and analysis went into deciding on the 50/50 ratio — e.g. how does it impact user engagement and behavior. Because it sounds like an easy round value that you’d land on when thinking “users should be pushed out of their bubbles”
by d_sc on 4/1/23, 10:01 PM
Code: ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 1000))),
Should be: ( "has_gte_10k_favs", _.getOrElse(EarlybirdFeature, None).exists(_.favCountV2.exists(_ >= 10000))),
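To make the mismatch concrete, here is a minimal sketch with a simplified stand-in for the Earlybird record. `EarlybirdLike`, `asQuoted` and `asProposed` are hypothetical names for illustration only; just the two predicate bodies are taken from the lines quoted above.

```scala
object FavThresholdSketch {
  // Hypothetical, simplified stand-in for the Earlybird feature record:
  // only the one field used by the predicate is modelled.
  final case class EarlybirdLike(favCountV2: Option[Long])

  // Predicate body as quoted: the label says 10k, the comparison uses 1000.
  val asQuoted: Option[EarlybirdLike] => Boolean =
    _.exists(_.favCountV2.exists(_ >= 1000))

  // Predicate body as proposed: the threshold matches the label.
  val asProposed: Option[EarlybirdLike] => Boolean =
    _.exists(_.favCountV2.exists(_ >= 10000))
}
```

With 5,000 favorites, `asQuoted` returns true while `asProposed` returns false, so any tweet between 1,000 and 9,999 favorites is where the two versions disagree.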
by roddylindsay on 3/31/23, 7:55 PM
For ranking the candidates these predictions are combined into a score by
weighting them:
"recap.engagement.is_favorited": 0.5
"recap.engagement.is_good_clicked_convo_desc_favorited_or_replied": 11* (the
maximum prediction from these two "good click" features is used and weighted by
11, the other prediction is ignored).
"recap.engagement.is_good_clicked_convo_desc_v2": 11*
"recap.engagement.is_negative_feedback_v2": -74
"recap.engagement.is_profile_clicked_and_profile_engaged": 12
"recap.engagement.is_replied": 27
"recap.engagement.is_replied_reply_engaged_by_author": 75
"recap.engagement.is_report_tweet_clicked": -369
"recap.engagement.is_retweeted": 1 "recap.engagement.is_video_playback_50": 0.005
Who set those weights, and why were they chosen?
by ryzvonusef on 3/31/23, 8:11 PM
> the main neural network part of @Twitter recsys algo is based on 2021 work of #SinaWeibo - Chinese clone of Twitter
interesting claim
by varjag on 3/31/23, 6:58 PM
This does a lot of heavy lifting here.
by simonsarris on 3/31/23, 7:13 PM
by crop_rotation on 3/31/23, 6:55 PM
by jonknee on 3/31/23, 6:52 PM
https://github.com/twitter/the-algorithm-ml/blob/main/projec...
In realgraph you can see some of the things they keep track of, which include what you have in your address book, total time spent "dwelling" and a few other interesting nuggets.
by paxys on 3/31/23, 7:25 PM
> We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.
by paxys on 3/31/23, 7:22 PM
> We also took additional steps to ensure that user safety and privacy would be protected, including our decision not to release training data or model weights associated with the Twitter algorithm at this point.
which is a shame.
by etc_passwd on 3/31/23, 7:54 PM
[1]: https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by tric on 3/31/23, 6:45 PM
by jongjong on 4/1/23, 3:32 AM
by Me1000 on 3/31/23, 7:44 PM
by sudo_navendu on 4/1/23, 3:59 AM
private def getLinearRankingParams: ThriftRankingParams = {
  ThriftRankingParams(
    `type` = Some(ThriftScoringFunctionType.Linear),
    minScore = -1.0e100,
    retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
    replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
    reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
    luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
    textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
    urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
    isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
    favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
    langEnglishUIBoost = 0.5,
    langEnglishTweetBoost = 0.2,
    langDefaultBoost = 0.02,
    unknownLanguageBoost = 0.05,
    offensiveBoost = 0.1,
    inTrustedCircleBoost = 3.0,
    multipleHashtagsOrTrendsBoost = 0.6,
    inDirectFollowBoost = 4.0,
    tweetHasTrendBoost = 1.1,
    selfTweetBoost = 2.0,
    tweetHasImageUrlBoost = 2.0,
    tweetHasVideoUrlBoost = 2.0,
    useUserLanguageInfo = true,
    ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
  )
}
by evantahler on 4/1/23, 1:39 AM
by rco8786 on 3/31/23, 7:25 PM
Well written article, from an engineer's perspective.
by Egoist on 3/31/23, 7:01 PM
by quotemstr on 3/31/23, 7:12 PM
by robopsychology on 3/31/23, 7:27 PM
by Laaas on 3/31/23, 7:27 PM
by sroussey on 3/31/23, 7:03 PM
by dang on 3/31/23, 8:44 PM
by abalaji on 3/31/23, 6:57 PM
by junto on 4/1/23, 8:39 AM
> The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150x the latency you perceive on the app.
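A quick back-of-the-envelope check of the quoted figures, taking them at face value. The object and value names are hypothetical, and the only assumption beyond the quote is that the load is spread evenly over a day.

```scala
object PipelineCostSketch {
  // Figures from the quoted passage.
  val runsPerDay: Double = 5e9
  val cpuSecondsPerRun: Double = 220.0
  val latencySeconds: Double = 1.5

  val secondsPerDay: Double = 86400.0

  // Ratio of CPU time to perceived latency: about 146.7, i.e. "nearly 150x".
  val cpuToLatencyRatio: Double = cpuSecondsPerRun / latencySeconds

  // Total CPU time per day implied by the two figures: 1.1e12 CPU-seconds.
  val cpuSecondsPerDay: Double = runsPerDay * cpuSecondsPerRun

  // Cores kept busy around the clock, assuming evenly spread load: about 12.7 million.
  val busyCores: Double = cpuSecondsPerDay / secondsPerDay
}
```

Read literally, the two numbers together imply roughly 12.7 million cores running flat out, which says nothing about the real infrastructure but does show how the quoted figures multiply out.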
by sho_hn on 3/31/23, 7:35 PM
by Weidenwalker on 4/2/23, 5:03 PM
Maybe this is helpful to anyone for navigating what's in there!
by froggychairs on 3/31/23, 8:07 PM
by thumbsup-_- on 3/31/23, 8:19 PM
by matesz on 3/31/23, 8:48 PM
by ryanisnan on 3/31/23, 9:36 PM
I want to see a chronological list of things sources I follow have posted.
Yes, I understand you can do this on Twitter still, but I would guess most people are more influenced by "the algorithm".
by stusmall on 3/31/23, 7:59 PM
Like a dig at the code quality.
by bagels on 3/31/23, 10:43 PM
I found it interesting that there is no attribution. Most other companies list the authors on engineering blogs (e.g. Facebook, Uber, etc.)
This topic seems to draw the attention of unhinged people, so I suppose I wouldn't want my name on it either.
by NicoJuicy on 4/1/23, 5:31 AM
And I don't even live in the US.
It would explain why they are tracking it, to increase visibility.
by AlbertCory on 3/31/23, 7:04 PM
for Google Ads, you couldn't easily know what ads would be shown for a given query, without a whole lot of data that's not contained in any code: the experiment settings in the server, for one thing. And the user who's doing the query, for another.
An "experiment" could apply to 100% of the traffic, so it's not really an experiment anymore. And even if you think X has been put into production, there is still a "holdback" experiment, where some part of the traffic does not get X applied to it.
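A toy sketch of the idea, not Google's or Twitter's actual system: behaviour depends on a server-side setting (here a holdback percentage) and on which user is asking, so reading the feature code alone does not tell you what a given user sees. All names, the hashing scheme and the bucket count are assumptions made for illustration.

```scala
object ExperimentBucketSketch {
  sealed trait Arm
  case object Treatment extends Arm
  case object Holdback extends Arm

  // Map a (user, experiment) pair to a stable bucket in [0, 100).
  // String.hashCode is deterministic, so a user always lands in the same bucket.
  def bucket(userId: Long, experiment: String): Int =
    Math.floorMod(s"$experiment:$userId".hashCode, 100)

  // holdbackPercent is the server-side setting: that share of traffic never
  // gets the feature, even when it is otherwise "launched" to everyone.
  def assign(userId: Long, experiment: String, holdbackPercent: Int): Arm =
    if (bucket(userId, experiment) < holdbackPercent) Holdback else Treatment
}
```

With `holdbackPercent = 0` every user gets the treatment, which matches the "experiment at 100% of traffic" case; any value above 0 keeps a slice of users on the old behaviour.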
by rblion on 4/2/23, 7:55 AM
by oulu2006 on 4/2/23, 12:55 AM
https://www.databreachtoday.com/twitter-says-source-code-lea...
by cmckn on 3/31/23, 8:07 PM
https://github.com/twitter/the-algorithm/tree/main/src/java/...
by vonwoodson on 4/1/23, 9:12 PM
Maybe we’ll all get lucky and Elon will cause Twitter to go away forever.
by amq on 4/1/23, 7:50 PM
s.SpaceSafetyLabelType.MedicalMisinfo -> MedicalMisinfo,
s.SpaceSafetyLabelType.GenericMisinfo -> GenericMisinfo,
s.SpaceSafetyLabelType.DmcaWithheld -> DmcaWithheld,
s.SpaceSafetyLabelType.HatefulHighRecall -> HatefulHighRecall,
...
s.SpaceSafetyLabelType.UkraineCrisisTopic -> UkraineCrisisTopic,
https://github.com/twitter/the-algorithm/blob/ec83d01dcaebf3...
by belter on 4/1/23, 10:29 PM
This should be seen as a possible snapshot of some code, that might have run, might run in the future, or is possibly running in some parts of the production infrastructure at Twitter.
by frob on 3/31/23, 9:15 PM
Where's the beef?
by Reason077 on 4/1/23, 2:26 AM
The algorithm really needs to recognise when tweets are time-sensitive and not recommend them just because they got a lot of engagement the previous day!
by pram on 3/31/23, 7:08 PM
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by beebmam on 3/31/23, 7:54 PM
by kossTKR on 3/31/23, 10:34 PM
I get way too much random crap now: promoted tweets, "things that might interest me", users that seem to never get on my feed, etc.
Twitter seems to go in the direction of all other social media, feeds that are 100% digital crack with no way to control your media diet.
by HAL3000 on 3/31/23, 10:18 PM
by muratsu on 3/31/23, 7:28 PM
by WhereIsTheTruth on 3/31/23, 7:28 PM
There is code that favors Elon's tweets, so I'd say yes, that's probably what they use.
by anshumankmr on 4/1/23, 1:35 AM
They made my morning
by ouraf on 4/2/23, 7:43 PM
Maybe a UML diagram, or even a presentation or written guide on how they measure and apply each weight or group policy, would make it easier to have a solid take on how it works.
by perceptronas on 4/1/23, 5:09 AM
I don't see any Typelevel stuff. This probably lets them hire and train engineers faster while still gaining most of the benefits
I hope this will encourage more companies to pick Scala.
by 13years on 4/2/23, 3:05 PM
by jerrygoyal on 4/1/23, 5:40 AM
The majority of users didn't ask for this, so I'm not sure what the exact motive behind their efforts is. It could be a PR stunt.
by wslh on 4/1/23, 8:28 PM
[1] https://en.wikipedia.org/wiki/It%27s_the_economy,_stupid
by evntdrvn on 3/31/23, 7:31 PM
by WA on 3/31/23, 9:04 PM
by tcmart14 on 3/31/23, 10:47 PM
author_is_uwu
That is the biggest problem.
by rvz on 3/31/23, 7:47 PM
It looks like, once again, the lot who predicted that he wouldn't open source the algorithm are going to eat their words [0], just as they did after incorrectly predicting Twitter's immediate collapse [1]. They will look at the source code anyway and continue to talk about "Twitter".
If Twitter can open-source their algorithm, why not TikTok? Either way, the bots are now going to have a very expensive time on Twitter.
by woolion on 4/2/23, 1:52 PM
What does the commit history say? There are 3 commits, like a very very real programming project. The issues and pull requests show how much people are fooled by this very transparent move.
So this is an obvious attempt at a digital Potemkin village that, like the real one, poorly succeeds in hiding the truth. Elon does not want to upset the apple cart (political, economic, or ideological) but wants his followers to believe in it, and so we get this. Great spectacle, if that's what you're interested in.
by hotpathdev on 4/1/23, 6:38 AM
I especially like the suggestions to rewrite the algorithm in Rust [1] and this pull request, which simplifies the algorithm to a single C file [2].
[1] https://github.com/twitter/the-algorithm/issues?q=is%3Aissue...
[2] https://github.com/twitter/the-algorithm/pull/712
by say_it_as_it_is on 4/1/23, 2:16 PM
by pictur on 3/31/23, 9:46 PM
by mkl95 on 3/31/23, 10:32 PM
by javajosh on 3/31/23, 9:30 PM
by Reptur on 3/31/23, 7:47 PM
by cwkoss on 3/31/23, 7:24 PM
The default feed view has grown increasingly useless over the past ~6 months.
by pledess on 3/31/23, 7:43 PM
https://github.com/twitter/the-algorithm/blob/7f90d0ca342b92...
by jml2 on 3/31/23, 9:03 PM
by infamouscow on 4/1/23, 3:27 AM
by abdnafees on 4/1/23, 2:52 PM
by jeffbee on 3/31/23, 7:20 PM
by whalesalad on 3/31/23, 7:22 PM
by m1117 on 3/31/23, 7:20 PM
by capableweb on 3/31/23, 7:02 PM
by firstSpeaker on 3/31/23, 8:30 PM
by dools on 4/1/23, 12:41 AM
Reminds me of the Sirius Cybernetics Nutri-matic drinks machine.
by systemvoltage on 3/31/23, 7:14 PM
by bastardoperator on 3/31/23, 9:05 PM
#!/bin/sh
exit 0
by kilianinbox on 4/1/23, 10:18 AM
by inparen on 3/31/23, 7:31 PM
by bluelightning2k on 4/1/23, 8:18 AM
by benatkin on 3/31/23, 7:09 PM
by paulddraper on 3/31/23, 9:20 PM
Wow, we're getting some collaboration going!
by Thaxll on 3/31/23, 6:59 PM
by throwaway689236 on 4/2/23, 8:38 PM
by drakonka on 4/1/23, 8:05 AM
by diebeforei485 on 3/31/23, 9:05 PM
by voz_ on 3/31/23, 9:37 PM
Twitter hmu if you need help trying Pytorch 2.0 ;)
by elashri on 3/31/23, 8:21 PM
by bilekas on 3/31/23, 7:43 PM
by throwayyy479087 on 3/31/23, 6:50 PM
by distrill on 3/31/23, 7:07 PM
by anoncow on 4/1/23, 8:51 AM
by Patrickmi on 3/31/23, 8:27 PM
by ericzawo on 4/1/23, 2:24 AM
https://twitter.com/alexblechman/status/1641905502043926530?...
by photochemsyn on 3/31/23, 7:49 PM
My conclusion is that it's basically entertainment, with very little of what I'd call high-quality useful information that deserves further examination (unlike a lot of HN posts, in contrast). I also notice something of a TikTok approach to video being implemented, which is not surprising given TikTok's success (and makes one wonder who exactly it is lobbying so hard for a TikTok ban, and whether it's just a commercial competition issue more than anything else).
As far as the recommendation algorithm, it appears to be a siloing setup - look at content of one particular flavor, it gives you more of that flavor. A 'flush settings' or 'forget browsing history' or 'reset to defaults' button would be useful, if probably not what advertisers want in terms of delivering to target audiences. I suppose setting up multiple accounts is something of a solution, although too much effort to be that interesting.
In terms of news reports, it's broader in scope than traditional corporate media outlets, so that's a plus in its favor. Reliability is perhaps similar (i.e. low).