Back in mid-March, Elon Musk promised to release Twitter’s source code for its recommendation algorithm — and now, he’s delivered on that and more.
Twitter will open source all code used to recommend tweets on March 31st
— Elon Musk (@elonmusk) March 17, 2023
In addition to sharing its code on GitHub, Twitter has also published a short note on why the team has released the code, with more detail on how the algorithm works shared on the platform’s engineering blog.
According to Twitter, the recommendation algorithms work by attempting to “answer important questions…such as, ‘What is the probability you will interact with another user in the future?’ or, ‘What are the communities on Twitter and what are trending Tweets within them?’”
Twitter uses the information it extracts from tweet, user, and engagement data to source tweets, rank them, and filter out content you’re less likely to enjoy.
“Okay,” you’re thinking, “I kind of assumed that. But how does it actually work?” Buckle up — we’re going to get into it.
Twitter calls the mechanism behind the For You timeline the “Home Mixer.” This is the process of sourcing, ranking, and filtering tweets that produces your recommended content.
Twitter starts by pulling tweets from people you follow (In-Network Sources) and people you don’t follow (Out-of-Network Sources).
In-Network tweets are ranked by a model called Real Graph that “predicts the likelihood of engagement between two users.” If Real Graph thinks you’re relatively likely to engage with a tweet’s author (and vice versa), you’ll see more of their tweets in your timeline.
Out-of-Network tweets are a little trickier to source since they require Twitter’s algorithms to make an educated guess about whether you’d find someone’s content engaging, even if you don’t follow them.
Twitter makes these predictions by using a social graph to ask questions like, “What tweets did the people you follow recently engage with?” and “Who likes the same (or similar) tweets as you, and what else have they recently liked?”
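To make the first of those questions concrete, here is a minimal Python sketch of graph-based sourcing. It is not Twitter's code: the function name, the dictionaries, and the tie-breaking rule are all made up for illustration. It simply collects tweets that accounts you follow have engaged with, and skips anything written by you or by someone you already follow.

```python
from collections import Counter


def out_of_network_candidates(user, follows, engagements, authors):
    """Return tweet IDs that accounts `user` follows engaged with,
    excluding tweets written by `user` or by accounts `user` follows.

    All argument shapes are hypothetical:
      follows:     {user: set of followed accounts}
      engagements: {account: list of tweet IDs it recently engaged with}
      authors:     {tweet ID: author}
    """
    followed = follows.get(user, set())
    counts = Counter()
    for account in followed:
        for tweet_id in engagements.get(account, []):
            author = authors[tweet_id]
            if author != user and author not in followed:
                counts[tweet_id] += 1
    # Most-engaged first; ties broken by tweet ID for a stable order.
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


follows = {"you": {"ana", "ben"}}
engagements = {"ana": ["t1", "t2"], "ben": ["t2", "t3"]}
authors = {"t1": "cara", "t2": "dev", "t3": "ana"}

candidates = out_of_network_candidates("you", follows, engagements, authors)
print(candidates)
```

Here "t2" comes first because two accounts you follow engaged with it, and "t3" is dropped because its author is already in your network.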
Out-of-Network tweets are also sourced using embedding space approaches, which represent your interests and each tweet’s content as numbers so Twitter can match you with tweets and users similar to your interests.
One of these embedding spaces, SimClusters, groups users into “communities” or topic categories anchored by influential users. You can belong to multiple communities at once, and they can range in size from niche friend circles to giant global groups.
If a tweet is popular within a particular community, it will be shown to more people in that community.
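As a rough illustration of the community idea, the Python sketch below scores tweets by how much their communities overlap with yours. The community names, the weights, and the use of a plain dot product are assumptions made for this example; the article does not say how SimClusters computes similarity.

```python
def affinity(user_interests, tweet_communities):
    """Dot product of two sparse {community: weight} vectors.

    Both vectors are hypothetical stand-ins for community embeddings.
    """
    return sum(
        weight * tweet_communities.get(community, 0.0)
        for community, weight in user_interests.items()
    )


# A user who belongs to two communities at once, with different strengths.
user = {"nba": 0.8, "cooking": 0.2}

# How strongly each tweet is associated with each community.
tweets = {
    "playoff_recap": {"nba": 0.9},
    "pasta_recipe": {"cooking": 0.7},
    "tax_tips": {"finance": 1.0},
}

ranked = sorted(tweets, key=lambda t: affinity(user, tweets[t]), reverse=True)
print(ranked)
```

A tweet that is popular in a community you barely belong to still gets a small score, while one from a community you have no tie to scores zero.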
Once Twitter has pulled ~1,500 potential tweets for your timeline using both In-Network and Out-of-Network sources, it must rank them.
Twitter was a little more mysterious about how, specifically, it ranks tweets, saying:
“Ranking is achieved with a ~48M parameter neural network that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies). This ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement. We rank the Tweets from these scores.”
However, people have already started digging into the code to find out how these signals are weighted.
Twitter algo 101
– Likes 30x
– Retweets 20x
– Twitter Blue 2-4x
– Trusted circle 3x
– Images/videos 2x
– Replies 1x
– URL only
– No text
– Report pic.twitter.com/mrCuGXB2gJ
— Peter Yang (@petergyang) April 1, 2023
Right now, it looks like we were right about one thing: URL-only tweets are downranked, while likes and retweets offer a major visibility boost.
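To see how multipliers like these could shape a ranking, here is a small Python sketch. It assumes the final score is a weighted sum of predicted engagement probabilities, which is a simplification, and it borrows the like, retweet, and reply multipliers from the tweet above rather than from any confirmed source. The label names and probabilities are invented.

```python
# Multipliers as reported in the tweet above; treat them as unverified.
WEIGHTS = {"like": 30.0, "retweet": 20.0, "reply": 1.0}


def score(probabilities):
    """Weighted sum of predicted engagement probabilities (assumed formula)."""
    return sum(
        WEIGHTS[label] * p
        for label, p in probabilities.items()
        if label in WEIGHTS
    )


# Hypothetical model outputs for two candidate tweets.
predicted = {
    "a": {"like": 0.10, "retweet": 0.02, "reply": 0.01},
    "b": {"like": 0.05, "retweet": 0.10, "reply": 0.00},
}

ranked = sorted(predicted, key=lambda t: score(predicted[t]), reverse=True)
print(ranked)
```

Tweet "b" edges out tweet "a" even though it is half as likely to be liked, because its higher retweet probability carries a large weight too.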
After ranking, Twitter’s algorithms start filtering out content based on things like who you’ve blocked or muted, who you’ve seen a lot of recently, and any Out-of-Network content that hasn’t been engaged with by someone you follow.
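The filtering step can be sketched in Python as a single pass over the ranked list. This is only an illustration of the three rules described above: the `Tweet` fields, the function name, and the cap of two tweets per author are hypothetical, not values from Twitter's code.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    author: str
    in_network: bool           # written by someone you follow
    engaged_by_followed: bool  # has someone you follow engaged with it?


def filter_timeline(ranked, blocked_or_muted, max_per_author=2):
    """Apply the three filters in ranked order and return the survivors."""
    kept = []
    seen_per_author = {}
    for tweet in ranked:
        # 1. Drop authors you have blocked or muted.
        if tweet.author in blocked_or_muted:
            continue
        # 2. Drop Out-of-Network tweets nobody you follow has engaged with.
        if not tweet.in_network and not tweet.engaged_by_followed:
            continue
        # 3. Limit how many tweets one author can contribute (assumed cap).
        if seen_per_author.get(tweet.author, 0) >= max_per_author:
            continue
        seen_per_author[tweet.author] = seen_per_author.get(tweet.author, 0) + 1
        kept.append(tweet)
    return kept


ranked = [
    Tweet("1", "ana", True, False),
    Tweet("2", "troll", False, True),
    Tweet("3", "ana", True, False),
    Tweet("4", "stranger", False, False),
    Tweet("5", "ana", True, False),
    Tweet("6", "dev", False, True),
]

kept_ids = [t.tweet_id for t in filter_timeline(ranked, {"troll"})]
print(kept_ids)
```

Because the filters run in ranked order, the tweets that survive keep their relative positions from the ranking step.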
After going through the Home Mixer, your recommended content is mixed with things like ads and follow recommendations to create your final timeline.
According to Twitter, the entire process takes under 1.5 seconds on average to complete and runs approximately 5 billion times per day.