On 31st March 2023, Twitter open-sourced large sections of its content-ranking algorithm. This provided refreshing insights into how large-scale social media applications rank content.
In this article I’m going to look at how the algorithm works and how content creators can use this new information to gain more reach and influence.
Twitter Algorithm Analysis
Two code repositories were open-sourced:
The Twitter recommendation algorithm is responsible for selecting the most relevant tweets for a user’s timeline. The recommendation system is composed of many interconnected services that extract latent information from tweet, user, and engagement data.
The Home Mixer is the service responsible for constructing and serving the most relevant tweets to a user’s “For You” timeline. This recommendation pipeline is made up of three stages:
- Candidate sourcing fetches 1500 potential tweets from different recommendation sources including the people you follow and relevant out-of-network sources
- Heavy Ranker is the neural network that scores each tweet’s relevance and likely engagement to calculate a timeline worthiness score.
- Heuristics & Filtering applies filters and manual adjustments to the model’s output to make the final timeline more palatable
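To make the three stages concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is not Twitter's code: the `Tweet` class, `build_timeline`, and the toy sources, ranker, and filter are all hypothetical names I made up for illustration. Only the three-stage shape and the 1500 candidate cap come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Tweet:
    # Hypothetical minimal tweet record, for illustration only.
    tweet_id: int
    author: str
    score: float = 0.0


def build_timeline(
    sources: Iterable[Callable[[], List[Tweet]]],
    rank: Callable[[Tweet], float],
    filters: Iterable[Callable[[List[Tweet]], List[Tweet]]],
    candidate_limit: int = 1500,
) -> List[Tweet]:
    # Stage 1: candidate sourcing, pooled across all sources and capped.
    candidates = [t for source in sources for t in source()][:candidate_limit]
    # Stage 2: ranking, where every candidate receives a score.
    for tweet in candidates:
        tweet.score = rank(tweet)
    ranked = sorted(candidates, key=lambda t: t.score, reverse=True)
    # Stage 3: heuristics and filtering, applied in order to the ranked list.
    for apply_filter in filters:
        ranked = apply_filter(ranked)
    return ranked


def in_network() -> List[Tweet]:
    return [Tweet(1, "alice"), Tweet(2, "bob")]


def out_of_network() -> List[Tweet]:
    return [Tweet(3, "carol")]


def toy_rank(tweet: Tweet) -> float:
    # Stand-in for the real model: higher id means higher score.
    return float(tweet.tweet_id)


def drop_bob(tweets: List[Tweet]) -> List[Tweet]:
    # Stand-in for a visibility filter.
    return [t for t in tweets if t.author != "bob"]


timeline = build_timeline([in_network, out_of_network], toy_rank, [drop_bob])
timeline_ids = [t.tweet_id for t in timeline]  # [3, 1]
```

Passing the sources, ranker, and filters in as plain functions keeps each stage swappable, which mirrors the way the article describes them as separate services.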
Candidate sourcing picks out 1500 recent and relevant tweets for a user from a pool of hundreds of millions of tweets. Twitter uses two main sources for candidate sourcing, and a user’s “For You” timeline is generally made up of about 50% in-network and 50% out-of-network tweets.
The In-Network source retrieves relevant tweets from users you follow. Twitter efficiently ranks tweets of those you follow based on their relevance using a logistic regression model. The most important component is Real Graph, a model that predicts the likelihood of engagement between two users. The higher the Real Graph score between you and the author of the tweet, the more of their tweets will be included.
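As a rough illustration of what "predicting the likelihood of engagement" with a logistic regression model means, here is a small Python sketch. The feature names, weights, and bias below are invented for the example; the real Real Graph features and weights are not given in this article.

```python
import math
from typing import Dict

# Hypothetical feature weights and bias, chosen only to illustrate the idea.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "likes_given": 0.8,
    "replies_given": 1.2,
    "profile_visits": 0.4,
}
ILLUSTRATIVE_BIAS = -3.0


def engagement_probability(features: Dict[str, float]) -> float:
    """Logistic regression: squash a weighted sum of features into (0, 1)."""
    z = ILLUSTRATIVE_BIAS + sum(
        weight * features.get(name, 0.0)
        for name, weight in ILLUSTRATIVE_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# A pair of users with no interaction history versus a pair with some.
stranger_score = engagement_probability({})
friend_score = engagement_probability({"likes_given": 2, "replies_given": 1})
```

In this toy version, the more the viewer has interacted with an author, the closer the score gets to 1, which is the behaviour the Real Graph description implies.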
The Out-of-Network source retrieves relevant tweets from users you don’t follow. An attempt is made to estimate relevance by analysing the engagements of people you follow or those with similar interests. A second method is to use embedding space approaches to generate numerical representations of users’ interests and tweets’ content. Twitter can then calculate the similarity between any two users, tweets, or user-tweet pairs in this embedding space.
One of Twitter’s most useful embedding spaces is SimClusters. SimClusters discover communities anchored by a cluster of influential users using a custom matrix factorization algorithm. Users and tweets are represented in the space of communities, and can belong to multiple communities.
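Here is a small Python sketch of what "represented in the space of communities" could look like in practice. It only covers the similarity step, not the matrix factorization that discovers the communities, and the community names and weights are made up for the example.

```python
import math
from typing import Dict

# A user or tweet as a sparse vector: community name -> strength of membership.
SparseVector = Dict[str, float]


def cosine_similarity(a: SparseVector, b: SparseVector) -> float:
    """Cosine similarity between two sparse community vectors."""
    dot = sum(weight * b.get(community, 0.0) for community, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty vector is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Hypothetical communities; a user can belong to several at once.
user = {"machine_learning": 0.9, "football": 0.1}
ml_tweet = {"machine_learning": 1.0}
pop_tweet = {"pop_music": 1.0}

ml_similarity = cosine_similarity(user, ml_tweet)
pop_similarity = cosine_similarity(user, pop_tweet)
```

Cosine similarity is my choice for the sketch because it is a common way to compare embeddings; the article does not say which similarity measure Twitter uses.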
The Heavy Ranker is a large neural network-based algorithm with approximately 48 million parameters that is used to score the candidate tweets that have passed through the candidate sourcing process. The goal of the Heavy Ranker is to predict the relevance and engagement of each candidate tweet based on thousands of features, and optimize for positive engagement, such as Likes, Retweets, and Replies.
The Heavy Ranker consumes many different numerical inputs, including user engagement history, content similarity, and recency. Interestingly, there are links in the source code to Facebook’s PyTorch-BigGraph: https://github.com/facebookresearch/PyTorch-BigGraph/
The Tweets are ranked based on the scores, and the top Tweets are selected for display on the user’s For You timeline. The Heavy Ranker is continuously trained on Tweet interactions to ensure that it is optimizing for positive engagement and providing the most relevant Tweets to users.
Heuristics & Filtering
The following filters are used to manually adjust the output of the Heavy Ranker.
- Visibility Filtering This filter removes Tweets from accounts you have blocked or muted. It’s a way to prevent unwanted or harmful content from appearing in your timeline.
- Author Diversity This filter ensures that you don’t see too many consecutive Tweets from a single author. It’s a way to provide a diverse range of content on your timeline.
- Content Balance This filter ensures that you see a fair balance of In-Network and Out-of-Network Tweets. In-Network Tweets are from people you follow, while Out-of-Network Tweets are from people you don’t follow.
- Feedback-based Fatigue This filter lowers the score of certain Tweets if you have provided negative feedback about them. It’s a way to prevent you from seeing Tweets that you may not be interested in or find annoying.
- Social Proof This filter ensures that you only see Out-of-Network Tweets that have been engaged with by someone you follow or who follows the Tweet’s author. It’s a way to provide quality control and ensure that you’re seeing content that has some level of social proof.
- Conversations This filter threads together a Reply with the original Tweet to provide more context. It’s a way to make it easier to follow conversations and understand the context of a particular Tweet.
- Edited Tweets This filter determines if the Tweets currently on your device are stale, and sends instructions to replace them with the edited versions. This is a way to ensure that you’re always seeing the most up-to-date content.
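Two of the filters above are simple enough to sketch in a few lines of Python. This is my own simplified reading, not the Home Mixer code: the function names, the `(author, text)` tuple format, and the rule of dropping a tweet once an author has appeared `max_consecutive` times in a row are all assumptions.

```python
from typing import List, Optional, Set, Tuple

# (author, text) pairs, assumed to be already sorted by ranking score.
RankedTweet = Tuple[str, str]


def visibility_filter(
    tweets: List[RankedTweet], hidden_authors: Set[str]
) -> List[RankedTweet]:
    """Drop tweets from authors the viewer has blocked or muted."""
    return [t for t in tweets if t[0] not in hidden_authors]


def author_diversity(
    tweets: List[RankedTweet], max_consecutive: int = 2
) -> List[RankedTweet]:
    """Drop a tweet if it would extend a run by one author past the limit."""
    kept: List[RankedTweet] = []
    run_author: Optional[str] = None
    run_length = 0
    for author, text in tweets:
        if author == run_author:
            if run_length >= max_consecutive:
                continue  # Too many in a row from this author; skip it.
            run_length += 1
        else:
            run_author = author
            run_length = 1
        kept.append((author, text))
    return kept


ranked = [
    ("alice", "1"), ("alice", "2"), ("alice", "3"),
    ("bob", "4"), ("spammer", "5"), ("alice", "6"),
]
visible = visibility_filter(ranked, hidden_authors={"spammer"})
diverse = author_diversity(visible, max_consecutive=2)
```

In this example the third consecutive tweet from alice is dropped, while her later tweet survives because bob's tweet breaks the run.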
Twitter Ranking Metrics
The Earlybird relevancy ranking parameters offer some perspective into how tweets are ranked within the heavy ranker.
Some interesting TL;DRs:
- Likes are more important than retweets
- English language tweets are given priority
- Images and videos are good, external links are bad
I’ve broken down these ranking metrics, ordered them by ranking weight, and theorised about what each one measures, given the available information.
- favCountParams (weight 30) counts the number of times a tweet has been liked.
- retweetCountParams (weight 20) counts the number of times a tweet has been retweeted.
- inDirectFollowBoost (weight 4) tweets from users who are followed indirectly by the viewer, i.e. if you follow someone who follows the writer, the tweet gets boosted.
- inTrustedCircleBoost (weight 3) tweets from users who are in the viewer’s trusted circle according to their social graph.
- urlParams (weight 2) feature that counts the number of URLs included in the tweet. These appear to count against the score because they direct users away from the social media application. Twitter wants to keep users on the app, so external links are negatively weighted.
- luceneScoreParams (weight 2) Lucene most likely refers to Apache Lucene, the open-source search library, so this is probably a text-search relevance score. It also might not be live, as further down the code there is “getLuceneScore = false” in the options.
- selfTweetBoost (weight 2) tweets that perhaps reply to your own tweet, such as threads.
- tweetHasImageUrlBoost (weight 2) tweets that contain at least one image URL.
- tweetHasVideoUrlBoost (weight 2) tweets that contain at least one video URL.
- tweetHasTrendBoost (weight 1.1) tweets that contain a trending topic.
- isReplyParams (weight 1) feature that indicates whether the tweet is a reply to another tweet.
- replyCountParams (weight 1) feature that counts the number of times a tweet has been replied to.
- multipleHashtagsOrTrendsBoost (weight 0.6) tweets that contain multiple hashtags or trends. This is likely a negative factor to prevent people spamming hashtags.
- langEnglishUIBoost (weight 0.5) tweets written in English, as determined by the user interface language.
- reputationParams (weight 0.2) feature that measures the reputation of the user who tweeted.
- langEnglishTweetBoost (weight 0.2) perhaps prioritising tweets written in English, as determined by the language of the user or the tweet itself.
- textScoreParams (weight 0.18) feature that measures the relevance of the tweet’s text.
- offensiveBoost (weight 0.1) tweets that have been marked as offensive by users. I am surprised this doesn’t have a higher weighting but perhaps it is filtered more efficiently in the heuristics & filtering module.
- unknownLanguageBoost (weight 0.05) tweets written in an unknown language.
- langDefaultBoost (weight 0.02) tweets written in a language other than English.
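To get a feel for what these weights imply, here is a toy Python scoring function. How the real code combines the parameters is not covered in this article, so the combination rule is an assumption on my part: count parameters are added as a weighted sum and boosts are applied as multipliers. Only the parameter names and weights are taken from the list above.

```python
from typing import Dict, Iterable

# Weights copied from the list above.
COUNT_WEIGHTS: Dict[str, float] = {
    "favCountParams": 30.0,
    "retweetCountParams": 20.0,
    "replyCountParams": 1.0,
}
BOOSTS: Dict[str, float] = {
    "tweetHasImageUrlBoost": 2.0,
    "tweetHasVideoUrlBoost": 2.0,
    "multipleHashtagsOrTrendsBoost": 0.6,
}


def toy_score(counts: Dict[str, float], active_boosts: Iterable[str] = ()) -> float:
    """Assumed rule: weighted sum of counts, then multiply by each active boost."""
    score = sum(
        weight * counts.get(name, 0.0) for name, weight in COUNT_WEIGHTS.items()
    )
    for name in active_boosts:
        score *= BOOSTS[name]
    return score


ten_likes = toy_score({"favCountParams": 10})          # 300.0
ten_retweets = toy_score({"retweetCountParams": 10})   # 200.0
with_image = toy_score(
    {"favCountParams": 10, "retweetCountParams": 5, "replyCountParams": 2},
    ["tweetHasImageUrlBoost"],
)                                                      # 804.0
hashtag_spam = toy_score({"favCountParams": 10}, ["multipleHashtagsOrTrendsBoost"])
```

Under this assumption, ten likes are worth more than ten retweets, and a boost below 1, such as multipleHashtagsOrTrendsBoost, acts as a penalty, which fits the reading that it is a negative factor.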
There are also some visibility filters here which provide insights into what gets filtered out and manually adjusted for.
- DoNotAmplify Filter out content that may amplify harmful or misleading information.
- CoordinatedHarmfulActivityHighRecall Filter out content that is part of a coordinated effort to spread harmful information, with a focus on high recall (i.e., minimizing false negatives).
- UntrustedUrl Filter out Tweets that contain URLs from untrusted sources.
- MisleadingHighRecall Filter out Tweets that are classified as misleading with a focus on high recall (i.e., minimizing false negatives).
- NsfwHighPrecision Filter out content that is not safe for work (NSFW) with a focus on high precision (i.e., minimizing false positives).
- NsfwHighRecall Filter out content that is not safe for work (NSFW) with a focus on high recall (i.e., minimizing false negatives).
- CivicIntegrityMisinfo Filter out content that contains misinformation related to civic integrity (e.g., election fraud, voter suppression).
- MedicalMisinfo Filter out content that contains misinformation related to medical topics (e.g., false health claims, fake cures).
- GenericMisinfo Filter out content that contains generic misinformation (i.e., not related to a specific topic).
- DmcaWithheld Filter out content that has been withheld due to a Digital Millennium Copyright Act (DMCA) takedown request.
- HatefulHighRecall Filter out content that is classified as hateful with a focus on high recall (i.e., minimizing false negatives).
- ViolenceHighRecall Filter out content that contains violent material with a focus on high recall (i.e., minimizing false negatives).
- HighToxicityModelScore Filter out content that has a high toxicity score according to a machine learning model.
- UkraineCrisisTopic Filter out content that is related to the Russian invasion of Ukraine. The ethical grounds for filtering out this content are highly questionable and concerning, but it’s clearly prevalent in Western social media applications.
- DoNotPublicPublish Content tagged as “Do not publish the content publicly” (i.e., keep it private or internal to the system).
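One plausible way labels like these could be applied is as a simple set check against each tweet's labels. The Python sketch below assumes exactly that. The data model (a dict with a `labels` set) and the function names are hypothetical, and the choice of which labels to filter is just a sample from the list above.

```python
from typing import Dict, FrozenSet, Iterable, List

# A sample of the labels listed above; a real configuration would differ per surface.
FILTERED_LABELS: FrozenSet[str] = frozenset(
    {"DoNotAmplify", "UntrustedUrl", "NsfwHighPrecision", "DmcaWithheld"}
)


def is_visible(labels: Iterable[str]) -> bool:
    """A tweet is visible only if it carries none of the filtered labels."""
    return FILTERED_LABELS.isdisjoint(labels)


def visible_tweets(tweets: List[Dict]) -> List[Dict]:
    # Each tweet is assumed to be a dict with a "labels" set (hypothetical format).
    return [t for t in tweets if is_visible(t["labels"])]


sample = [
    {"id": 1, "labels": set()},
    {"id": 2, "labels": {"UntrustedUrl"}},
    {"id": 3, "labels": {"HighToxicityModelScore"}},
]
kept_ids = [t["id"] for t in visible_tweets(sample)]  # [1, 3]
```

Tweet 3 passes only because HighToxicityModelScore is not in my sample set. The high recall and high precision variants in the list suggest different thresholds for the same classifier, which a set check like this would treat as separate labels.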
There is also this code which seems to rank tweets by age of the writer relative to age of the reader. This suggests that your timeline is likely to be made up of tweets that are from writers that are of a similar age to yourself.
Content Creator Takeaways
- Engagement with other users in your existing network will “connect” you in terms of the in-network Real Graph and increase the likelihood of your future tweets showing up on their timelines and theirs on yours.
- Likes and retweets are key, and they get a very high ranking boost. Twitter is optimizing for engagement, and by liking a tweet you reinforce and vote for that writer’s content.
- Images and videos are beneficial ranking factors, so rich media content will be prioritised. External links, which draw users off-site, and excessive use of hashtags are negative ranking factors.
- Readers are more likely to see tweets from people followed by people they follow. This makes it extremely valuable to get followed by popular accounts, because you are then more likely to get into their followers’ “For You” timelines through the second-tier connection.
- It’s been announced that from 15th April the “For You” timeline will be made up only of Twitter Blue verified members. This seems questionable, as users are still going to want to see tweets from unverified writers they follow. Twitter Blue writers already get a healthy boost in ranking, but this is going to become more of a factor. If you are creating content and trying to build an audience, I believe paying the monthly fee will become highly valuable.
I hope this analysis of the Twitter algorithm code has been of interest and has provided some helpful insights into how the social media app ranks content.