
500,000,000 - the number of posts made on X in ONE day.
1,500 - the number of posts that you are shown on the feed.
Talk about picking a needle from a haystack; that’s a 0.0003% filtering rate.
X, like all Social Media Intermediaries (SMIs), earns money from advertising. Advertisers, however, invest in X only when the company guarantees extended user engagement for their ads.
And that is the algorithm’s job: To keep you engaged as long as it can.
And for that, it has to show content ‘relevant’ to you.
The ‘Following’ tab is simpler: a chronological arrangement of posts from accounts you follow.
The ‘For you’ tab: The algorithm chooses posts that you’d probably like to see.
And that probability is the crux of the entire algorithm.

Objective: find 1500 tweets that the user might be interested in.
A Quick 5-min Tour: Demystifying the Complex in a Few Bites
Nerdy Nirvana: A Deep-Dive Into The Algorithm
Actionable Intelligence (Takeaways)
Scenario: Alex is a user on X. He follows a handful of users and is, in turn, followed by a few other users on X. He interacts with content by liking the posts, reposting them, replying and bookmarking his favourites.
Alex also comes across profiles & content that he doesn’t like or even finds annoying and misleading. He mutes them, unfollows some, clicks the “show fewer from” option, or even blocks them.
Problem Statement: Alex goes to the “For you” tab and needs to be shown 1500 posts that he might be interested in.
Now X has a fair understanding of Alex through his likes, reposts, replies, bookmarks, mutes, unfollows, blocks, and reports.
The key areas where much of the work happens are stage 2 and stage 3, i.e., Feature Formation and Ranking.
Objective: How can you, as a user, hack these two stages in order to maximize reach?
For that, we’ll look into these two stages more closely, considering two users: you, a post author who wants to maximize reach, and Alex, a random user on X.
Essentially, what happens here is:
Based on Alex's {favorites, reposts, quotes, followings, bookmarks, clicks, unfollows, spam reports, blocks, etc., } ------> Alex's relationship with other [users] and [posts] on X is formed (as a graphical representation)
Imagine you are a post’s author. Your intention here should be that both your profile and your posts have a strong relationship with Alex (in fact, not only Alex but with everyone on X) so that your posts are shown more often to him (and also to other users on X).
In this pursuit, your posts & profile are given preference in this order:
[ >1 = advantage and <1 = disadvantage ]
For a detailed metric along with relevant code snippets, refer to the detailed version.
From the relationship so formed, the top 1,500 posts are picked.
In the Ranking stage:
For each of the 1,500 posts, based on its current {favorites, reposts, quotes, bookmarks, clicks, time spent on it, links, images & videos, and even correctness of spelling} ------> 10 probabilities are formed (e.g., the probability that Alex will repost it, the probability that Alex will comment on it, the probability that the post author will reply to Alex's comment, etc.), each with a value between 0 and 1.
Notice that the same data is used to arrive at the probability values. However, X has not open-sourced how it maps the data to these values.
The twist is that the 10 probabilities are not treated equally: each carries a different weight.
[ +ve value = advantage and -ve value = disadvantage ]
Illustration: Let us assume you have made a post ‘ABC’ on X and, based on {your, ABC’s & Alex’s} data, the algorithm has assigned a probability value of 0.5 for the first 8 scenarios and a value of 0.001 for scenarios 9 and 10.
💡 Remember that ABC’s data and your data are also used, apart from Alex’s; so even if Alex has never reacted negatively to your profile or posts, negative reports against your profile or against ABC from other users will hurt your chances of being recommended to Alex as well.
Now, the score for the post ‘ABC’ will be:
(0.5 * 0.5) + (1 * 0.5) + (13.5 * 0.5) + (12 * 0.5) + (0.005 * 0.5) + (75 * 0.5) + (11 * 0.5) + (10 * 0.5) + (-74 * 0.001) + (-369 * 0.001) = 61.0595
For a detailed metric along with relevant code snippets, refer to the detailed version.
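To make the arithmetic concrete, here is a minimal Scala sketch of that weighted sum; the weights are the HeavyRanker weights listed further below, and the probabilities are the assumed values from this illustration (not values X would actually produce):

```scala
object AbcScoreIllustration extends App {
  // HeavyRanker engagement weights, in the order of the weight table below:
  // like, retweet, reply, profile click, video 50% watch,
  // reply engaged by author, good click, good click v2,
  // negative feedback, report
  val weights = Seq(0.5, 1.0, 13.5, 12.0, 0.005, 75.0, 11.0, 10.0, -74.0, -369.0)

  // Assumed probabilities for post 'ABC': 0.5 for the 8 positive engagements,
  // 0.001 for the 2 negative ones
  val probabilities = Seq.fill(8)(0.5) ++ Seq.fill(2)(0.001)

  // score = sum_i (weight_i * probability_i)
  val score = weights.zip(probabilities).map { case (w, p) => w * p }.sum
  println(f"Score for post ABC: $score%.4f") // 61.0595
}
```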
Based on Feature Formation and Ranking, a sorted list of 1,500 posts is made. These pass through the filtering & heuristics and mixing stages before finally being displayed on Alex’s “For you” timeline.
Problem Statement: Alex & Bella are two users. They might be friends, relatives, neighbours, or just two strangers from different continents.
How will X determine what their relationship is? And what is the relationship between them and their tweets?
Further, what is their relationship with other users and posts on X?
Before diving into the algorithm, it is vital to have an understanding of the following:
For the above question, X uses multiple features, grouped under three headers.
The data inputs include Alex’s & Bella’s:
| Tweet Engagement | Social graph | User Data |
| --- | --- | --- |
| Likes/Favourites | Following | Blocks |
| Reposts | Circles | Unfollows |
| Replies | Mutes | |
| Clicks | Spam Reports | |
| Profile Clicks | Abuse Reports | |
| Picture/Video | Geolocation | |
And many more are taken into consideration.
💡 There are 1000s of features on X. For a detailed list refer: Features.md
💡 The nodes in the above example, Alex & Bella, are users. In reality, they can be anything - from users, posts, clusters, media, and other things - making up billions of nodes.
X has a gamut of packages (programs/bots). The most essential ones worth knowing are:
| Package Name | Purpose | Source |
| --- | --- | --- |
| RealGraph | captures the relationship between users | RealGraph |
| GraphJet | captures the relationship between users and posts | GraphJet |
| TwHIN | captures the relationship between users, posts, and other entities | TwHIN |
| SimClusters | groups users & tweets into a ‘basket’ (similar clusters) and captures the relationship between clusters | SimClusters |
| HeavyRanker | gives a final ranking for the posts | HeavyRanker |
| Trust-and-safety-models | filters abusive and NSFW content | Trust-and-safety-models |
| Visibility-filters | filters content based on legal compliance, blocked accounts, etc | Visibility-filters |
| Home-mixer | mixes posts with ads and follow-recommendations to be displayed on the feed | Home-mixer |
💡 Bonus: these packages are also used by X for other functionalities such as returning search results, follow-recommendation-service, etc.
Four key packages, viz. RealGraph, GraphJet, TwHIN, and SimClusters, get to work here and do the following:
1- Creates a graph between users.
Here, the users are nodes and the edges connecting them are the interactions (a like, a repost, etc.). Each edge is directed (A→B is different from B→A) and carries a weight indicating relationship strength.

2- Creates a graph between users and posts

3- Associates users and posts into one cluster and creates a graph of clusters
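As a toy illustration of what these graphs look like (not X's actual data structures), here is a minimal Scala sketch of directed, weighted edges where a node can be a user, a post, or a cluster; the 0.56 weight mirrors the relationship-strength example referenced later:

```scala
// Toy sketch of the kind of graph RealGraph / GraphJet / TwHIN build:
// nodes are users, posts, or clusters; edges are directed and weighted.
sealed trait Node
case class User(id: Long) extends Node
case class Post(id: Long) extends Node
case class Cluster(id: Long) extends Node

// A -> B is a different edge from B -> A; the weight encodes relationship strength.
case class Edge(source: Node, target: Node, weight: Double)

object ToyGraph extends App {
  val alex  = User(1)
  val bella = User(2)
  val post  = Post(42)

  val edges = Seq(
    Edge(alex, bella, 0.56), // Alex interacts with Bella's content often
    Edge(bella, alex, 0.10), // Bella rarely interacts back
    Edge(alex, post, 0.80)   // Alex liked / reposted this post
  )

  // Relationship strength from Alex's point of view
  edges.filter(_.source == alex).foreach(println)
}
```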

To give a picture of how the features are used to create a graph, let us look at the Representation-Scorer package. It provides a scoring system for SimClusters.
👉 For the Representation-Scorer:
{favourites, reposts, followings, shares, replies, posts, etc } = positive signals
{ blocks, mutes, reports, see-fewer } = negative signals
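A highly simplified sketch of that idea follows: positive signals pull a relationship score up, negative signals pull it down. The weights and counts below are assumptions for illustration only; the real Representation-Scorer aggregates many per-signal statistics (see the parameter list in the detailed section):

```scala
object ToySignalScore extends App {
  // Assumed toy weights, not X's actual values
  val signalWeights = Map(
    "favourite" -> 1.0, "repost" -> 2.0, "follow" -> 3.0,
    "share" -> 1.5, "reply" -> 1.0,          // positive signals
    "block" -> -5.0, "mute" -> -3.0,
    "report" -> -10.0, "see-fewer" -> -2.0   // negative signals
  )

  // Assumed counts of Alex's interactions with one author
  val interactions = Map("favourite" -> 4, "reply" -> 2, "mute" -> 1)

  val score = interactions.map { case (signal, count) =>
    signalWeights.getOrElse(signal, 0.0) * count
  }.sum

  println(s"Toy relationship score: $score") // 4*1.0 + 2*1.0 + 1*(-3.0) = 3.0
}
```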

Now that you've understood the core tech behind it, let’s put them to use. X breaks down the whole process into three sequential jobs.
Trust-and-Safety works in parallel with all of the above.
Now that X has fetched 1,500 posts to be displayed, here comes the next challenge - in what order they are to be displayed to the user. X’s blog reads:
The ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement
This is done by the HeavyRanker package and the ranking formula is:
score = sum_i { (weight of engagement i) * (probability of engagement i) }
There are two values here: weights and probabilities.
The probability lies between 0 and 1 → it is arrived at by studying 1000s of features (refer to the Features section) of the user, the post, and the post author.
Based on these 1000s of features, the model outputs 10 discrete labels and gives a value between 0 and 1 for each label. The labels are:
💡 PS: The algorithm on how X computes this probability based on the features (i.e., features → probability) has not been open-sourced yet, and is a point of major criticism.
| The probability that the user will | Sentiment | Weight |
| --- | --- | --- |
| Like the post | Positive | 0.5 |
| Retweet the post | Positive | 1 |
| Reply to the post | Positive | 13.5 |
| Open the post author’s profile and like or reply to a post | Positive | 12 |
| [Video] will watch at least half of the video | Positive | 0.005 |
| Reply to the post and the tweet author will engage with the reply | Positive | 75 |
| Click into the conversation of the post and engage with a reply | Positive | 11 |
| Click into the conversation of the post and stay there for ≥ 2 mins | Positive | 10 |
| Request “show less often”/block/mute the post author | Negative | -74 |
| Report the Tweet | Negative | -369 |
💡 Note: The weights are applied to predicted probabilities, i.e., the score is computed before such actions are actually taken on a post, based on their likelihood.
Based on the above formula [ Σ ( weight × probability ) ], a score is computed for each of the 1,500 tweets, and they are ranked accordingly.
The posts then pass through filtering & heuristics; this is done to enhance product quality.
Ranked Posts + Ads + Follow-recommendations = "For you" feed
The Home-mixer package finally mixes the outcome of the above with Ads and the Follow-recommendation-service (which recommends whom to follow in the feed), placing them in between tweets, and serves you an exclusively curated “For you” feed.
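A toy sketch of that mixing step follows; the every-four-posts spacing and the data types are assumptions for illustration, since Home-mixer decides the actual insertion positions and ratios:

```scala
object ToyHomeMixer extends App {
  case class Item(kind: String, id: Int)

  val rankedPosts = (1 to 12).map(Item("post", _))
  val ads         = Iterator(Item("ad", 1), Item("ad", 2))
  val followRecs  = Iterator(Item("who-to-follow", 1))

  // Assumption for illustration: insert an ad or a follow recommendation
  // after every 4 ranked posts.
  val feed = rankedPosts.grouped(4).flatMap { block =>
    val extra =
      if (ads.hasNext) Some(ads.next())
      else if (followRecs.hasNext) Some(followRecs.next())
      else None
    block ++ extra
  }.toSeq

  feed.foreach(println)
}
```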
Based on the above process, there are four levels where you can take advantage as a post author to maximize reach.
While we take a look at each of them, assume the perspective of a post author.
They are:
We’ve seen that RealGraph, GraphJet, TwHIN, and SimClusters try to make a graph of users, posts, clusters, etc. as nodes and establish a relationship between each node.
You & your posts are some of the many billion nodes on Twitter.
Objective: make your nodes connect with as many nodes as possible and increase the relationship strength (edge weights - 0.56 in the graphing example).
The code for determining this is:
private def getLinearRankingParams: ThriftRankingParams = {
  ThriftRankingParams(
    `type` = Some(ThriftScoringFunctionType.Linear),
    minScore = -1.0e100,
    retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
    replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
    reputationParams = Some(ThriftLinearFeatureRankingParams(weight = 0.2)),
    luceneScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
    textScoreParams = Some(ThriftLinearFeatureRankingParams(weight = 0.18)),
    urlParams = Some(ThriftLinearFeatureRankingParams(weight = 2.0)),
    isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
    favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),
    langEnglishUIBoost = 0.5,
    langEnglishTweetBoost = 0.2,
    langDefaultBoost = 0.02,
    unknownLanguageBoost = 0.05,
    offensiveBoost = 0.1,
    inTrustedCircleBoost = 3.0,
    multipleHashtagsOrTrendsBoost = 0.6,
    inDirectFollowBoost = 4.0,
    tweetHasTrendBoost = 1.1,
    selfTweetBoost = 2.0,
    tweetHasImageUrlBoost = 2.0,
    tweetHasVideoUrlBoost = 2.0,
    useUserLanguageInfo = true,
    ageDecayParams = Some(ThriftAgeDecayRankingParams(slope = 0.005, base = 1.0))
  )
}

Source: GitHub
Since the graphing algorithms make use of data under three categories (Tweet Engagement, Follower Graph, and User Data), let us break the code into three parts and look at each one of them:
| Your post | is boosted by |
| --- | --- |
| to direct followers | 4x |
| to trusted circle | 3x |
NOTE: The TweepCred package weighted user credibility based on the followers-to-following ratio. It has since been deprecated.
| If your post | it is boosted by |
| --- | --- |
| gets a Favourite (Like + Bookmark) | 30x |
| gets a Repost | 20x |
| has an image | 2x |
| has a video | 2x |
| is in line with a current trend | 1.1x |
| gets a reply | 1x |
There are also severe de-boosts:
| If your post | it is boosted (deboosted) by |
| --- | --- |
| has unknown words/language | 0.05 (-20x) |
| has offensive words | 0.1 (-10x) |
| has multiple hashtags | 0.6 (-1.7x) |
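To see how such multipliers interact, here is a toy sketch that applies the boost factors from getLinearRankingParams to a base score, assuming for illustration that the boosts simply multiply together (the real scorer combines them with many more terms):

```scala
object ToyBoosts extends App {
  // Boost factors taken from getLinearRankingParams above
  val boosts = Map(
    "hasImage"         -> 2.0,
    "hasVideo"         -> 2.0,
    "hasTrend"         -> 1.1,
    "multipleHashtags" -> 0.6,
    "offensive"        -> 0.1,
    "unknownLanguage"  -> 0.05
  )

  // Assumption for illustration: the boosts multiply a base relevance score.
  def boosted(base: Double, applied: Seq[String]): Double =
    applied.foldLeft(base)((score, b) => score * boosts.getOrElse(b, 1.0))

  println(boosted(10.0, Seq("hasImage", "hasTrend")))         // 10 * 2.0 * 1.1 ≈ 22.0
  println(boosted(10.0, Seq("hasImage", "multipleHashtags"))) // 10 * 2.0 * 0.6 = 12.0
  println(boosted(10.0, Seq("offensive")))                    // 10 * 0.1 = 1.0
}
```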
While X collects information from a plethora of features, ranging from geolocation to business-partner data, the essential ones are:
val allEdgeFeatures: SCollection[Edge] =
  getEdgeFeature(SCollection.unionAll(
    Seq(blocks, mutes, abuseReports, spamReports, unfollows)))

Source: GitHub
Getting blocked, muted, reported for abuse or spam, or unfollowed hurts you for up to 90 days after any of these happen.
Unfollows are not penalized as heavily as the other four.
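A rough sketch of that idea: the 90-day window comes from the point above, while the per-signal penalty weights below are assumptions, not X's values.

```scala
import java.time.{Duration, Instant}

object ToyNegativeEdges extends App {
  case class NegativeEvent(kind: String, at: Instant)

  // Assumed penalty weights for illustration; unfollows weigh less than the rest
  val penalty = Map(
    "block" -> 5.0, "mute" -> 4.0, "abuseReport" -> 6.0,
    "spamReport" -> 6.0, "unfollow" -> 1.0
  )

  val now = Instant.now()
  val events = Seq(
    NegativeEvent("mute", now.minus(Duration.ofDays(10))),
    NegativeEvent("unfollow", now.minus(Duration.ofDays(30))),
    NegativeEvent("block", now.minus(Duration.ofDays(120))) // older than 90 days, ignored
  )

  // Only events from the last ~90 days count against the author
  val recentPenalty = events
    .filter(e => Duration.between(e.at, now).toDays <= 90)
    .map(e => penalty.getOrElse(e.kind, 0.0))
    .sum

  println(s"Total negative-edge penalty: $recentPenalty") // 4.0 + 1.0 = 5.0
}
```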
Also, if you are a Verified (Blue subscribed) user, you get a boost.
object BlueVerifiedAuthorInNetworkMultiplierParam
  extends FSBoundedParam[Double](
    name = "home_mixer_blue_verified_author_in_network_multiplier",
    default = 4.0,
    min = 0.6,
    max = 100.0
  )

object BlueVerifiedAuthorOutOfNetworkMultiplierParam
  extends FSBoundedParam[Double](
    name = "home_mixer_blue_verified_author_out_of_network_multiplier",
    default = 2.0,
    min = 0.6,
    max = 100.0
  )

Your posts get a minimum 2x boost across X, and for your followers, they get a 4x boost.
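A minimal sketch of how such a multiplier could be applied, assuming it simply scales a post's ranked score; the default values mirror the snippet above, but the function and the point at which Home-mixer applies it are assumptions:

```scala
object ToyBlueBoost extends App {
  val inNetworkMultiplier    = 4.0 // BlueVerifiedAuthorInNetworkMultiplierParam default
  val outOfNetworkMultiplier = 2.0 // BlueVerifiedAuthorOutOfNetworkMultiplierParam default

  // Assumption for illustration: the multiplier scales the post's ranked score
  def boostedScore(score: Double, authorIsBlueVerified: Boolean, viewerFollowsAuthor: Boolean): Double =
    if (!authorIsBlueVerified) score
    else if (viewerFollowsAuthor) score * inNetworkMultiplier
    else score * outOfNetworkMultiplier

  println(boostedScore(61.0595, authorIsBlueVerified = true, viewerFollowsAuthor = true))  // 244.238
  println(boostedScore(61.0595, authorIsBlueVerified = true, viewerFollowsAuthor = false)) // 122.119
}
```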
The exact keywords and topics in the filters are dynamic and keep changing over time. Until recently, posts on Ukraine were de-boosted.
Source: GitHub
As Yoel Roth, the former head of Trust & Safety, has said:
Mr. Musk empowered my team to move more aggressively to remove hate speech across the platform — censoring more content, not less.
Source: DailyMail
In short, there are 4 filters and you want to avoid each one of them:
pNSFWMedia: Model to detect tweets with NSFW images. This includes adult and porn content.
pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics.
pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter's terms of service.
pAbuse: Model to detect abusive content. This includes violations of Twitter's terms of service, including hate speech, targeted harassment and abusive behavior.

Source: GitHub
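A toy sketch of what filtering on those model outputs might look like; only the four label names come from the repository, while the threshold and the way scores are consumed are assumptions:

```scala
object ToySafetyFilter extends App {
  case class CandidatePost(id: Long, modelScores: Map[String, Double])

  // The four trust-and-safety labels named above
  val safetyLabels = Seq("pNSFWMedia", "pNSFWText", "pToxicity", "pAbuse")

  // Assumed threshold, for illustration only
  val threshold = 0.8

  def passesSafety(post: CandidatePost): Boolean =
    safetyLabels.forall(label => post.modelScores.getOrElse(label, 0.0) < threshold)

  val candidates = Seq(
    CandidatePost(1, Map("pToxicity" -> 0.2)),
    CandidatePost(2, Map("pNSFWMedia" -> 0.95)) // would be filtered out
  )

  candidates.filter(passesSafety).foreach(p => println(s"kept post ${p.id}"))
}
```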
a- Hacking the feature weight table
💡 not yet open-sourced by X
b- Hacking the probability weight table
The 10 probability scores as discussed above:
| The probability that the user will | Sentiment | Weight |
| --- | --- | --- |
| Like the post | Positive | 0.5 |
| Retweet the post | Positive | 1 |
| Reply to the post | Positive | 13.5 |
| Open the post author’s profile and like or reply to a post | Positive | 12 |
| [Video] will watch at least half of the video | Positive | 0.005 |
| Reply to the post and the tweet author will engage with the reply | Positive | 75 |
| Click into the conversation of the post and engage with a reply | Positive | 11 |
| Click into the conversation of the post and stay there for ≥ 2 mins | Positive | 10 |
| Request “show less often”/block/mute the post author | Negative | -74 |
| Report the Tweet | Negative | -369 |
This is the key and most definitive aspect of the entire algorithm.
scored_tweets_model_weight_fav: 0.5
scored_tweets_model_weight_retweet: 1.0
scored_tweets_model_weight_reply: 13.5
scored_tweets_model_weight_good_profile_click: 12.0
scored_tweets_model_weight_video_playback50: 0.005
scored_tweets_model_weight_reply_engaged_by_author: 75.0
scored_tweets_model_weight_good_click: 11.0
scored_tweets_model_weight_good_click_v2: 10.0
scored_tweets_model_weight_negative_feedback_v2: -74.0
scored_tweets_model_weight_report: -369.0

💡 Note: The weights are applied to predicted probabilities, i.e., the score is computed before such actions are actually taken on a post, based on their likelihood.
a- Timing your Posts:
Older posts are less relevant and are hence shown less often. Posts on X have a half-life of 360 minutes (6 hours); this means a post’s relevance decreases by 50% every 6 hours.
struct ThriftAgeDecayRankingParams {
  // the rate in which the score of older tweets decreases
  1: optional double slope = 6.003
  // the age, in minutes, where the age score of a tweet is half of the latest tweet
  2: optional double halflife = 360.0
  // the minimal age decay score a tweet will have
  3: optional double base = 0.6
}(persisted='true')

So, the first few engagements (likes, replies, and reposts) are critical. There is no hard & fast rule here, so experiment and time your posts for when your target audience is awake and likely to be active on X.
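A small sketch of what a 6-hour half-life means for a post's relevance multiplier, assuming a pure exponential decay for illustration (the exact functional form, and how base and slope enter it, are not shown here; base would put a floor under the multiplier):

```scala
object ToyAgeDecay extends App {
  val halflifeMinutes = 360.0 // 6 hours, from ThriftAgeDecayRankingParams

  // Assumed pure half-life decay: relevance halves every `halflifeMinutes`
  def decay(ageMinutes: Double): Double =
    math.pow(0.5, ageMinutes / halflifeMinutes)

  Seq(0.0, 360.0, 720.0, 1440.0).foreach { age =>
    println(f"age ${age / 60}%.0f h -> relevance multiplier ${decay(age)}%.3f")
  }
  // 0 h -> 1.000, 6 h -> 0.500, 12 h -> 0.250, 24 h -> 0.063
}
```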
b- Ensure that your posts do not fall in the extremities of the legal spectrum
Experiment with paid promotions and place your posts as Ads on X.
object BlueVerifiedAuthorInNetworkMultiplierParam
  extends FSBoundedParam[Double](
    name = "home_mixer_blue_verified_author_in_network_multiplier",
    default = 4.0,
    min = 0.6,
    max = 100.0
  )

object BlueVerifiedAuthorOutOfNetworkMultiplierParam
  extends FSBoundedParam[Double](
    name = "home_mixer_blue_verified_author_out_of_network_multiplier",
    default = 2.0,
    min = 0.6,
    max = 100.0
  )

multipleHashtagsOrTrendsBoost = 0.6,

pNSFWMedia: Model to detect tweets with NSFW images. This includes adult and porn content.
pNSFWText: Model to detect tweets with NSFW text, adult/sexual topics.
pToxicity: Model to detect toxic tweets. Toxicity includes marginal content like insults and certain types of harassment. Toxic content does not violate Twitter's terms of service.
pAbuse: Model to detect abusive content. This includes violations of Twitter's terms of service, including hate speech, targeted harassment and abusive behavior.

Offensive words also get deboosted by 10x (i.e., by a factor of 0.1):
offensiveBoost = 0.1,

inTrustedCircleBoost = 3.0,

// subtractive penalty applied after boosts for out-of-network replies.
120: optional double OutOfNetworkReplyPenalty = 10.0

tweetHasImageUrlBoost = 2.0,
tweetHasVideoUrlBoost = 2.0,

unknownLanguageBoost = 0.05,

"recap.engagement.is_good_clicked_convo_desc_v2": The probability the user will click into the conversation of this Tweet and stay there for at least 2 minutes.
scored_tweets_model_weight_good_click_v2: 10.0

"recap.engagement.is_replied_reply_engaged_by_author": The probability the user replies to the Tweet and this reply is engaged by the Tweet author.
scored_tweets_model_weight_reply_engaged_by_author: 75.0

"recap.engagement.is_profile_clicked_and_profile_engaged": The probability the user opens the Tweet author profile and Likes or replies to a Tweet.
scored_tweets_model_weight_good_profile_click: 12.0

retweetCountParams = Some(ThriftLinearFeatureRankingParams(weight = 20.0)),
replyCountParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
isReplyParams = Some(ThriftLinearFeatureRankingParams(weight = 1.0)),
favCountParams = Some(ThriftLinearFeatureRankingParams(weight = 30.0)),

"recap.engagement.is_favorited": The probability the user will favorite the Tweet.
"recap.engagement.is_favorited": 0.5

"recap.engagement.is_good_clicked_convo_desc_favorited_or_replied": The probability the user will click into the conversation of this Tweet and reply or Like a Tweet.
"recap.engagement.is_good_clicked_convo_desc_favorited_or_replied": 11*

"recap.engagement.is_replied": The probability the user replies to the Tweet.
"recap.engagement.is_replied": 27

"recap.engagement.is_retweeted": The probability the user will ReTweet the Tweet.
"recap.engagement.is_retweeted": 1

// parameters used by Representation-scorer
1: optional double fav1dLast10Max // max score from last 10 faves in the last 1 day
2: optional double fav1dLast10Avg // avg score from last 10 faves in the last 1 day
3: optional double fav7dLast10Max // max score from last 10 faves in the last 7 days
4: optional double fav7dLast10Avg // avg score from last 10 faves in the last 7 days
5: optional double retweet1dLast10Max // max score from last 10 retweets in the last 1 days
6: optional double retweet1dLast10Avg // avg score from last 10 retweets in the last 1 days
7: optional double retweet7dLast10Max // max score from last 10 retweets in the last 7 days
8: optional double retweet7dLast10Avg // avg score from last 10 retweets in the last 7 days
9: optional double follow7dLast10Max // max score from the last 10 follows in the last 7 days
10: optional double follow7dLast10Avg // avg score from the last 10 follows in the last 7 days
11: optional double follow30dLast10Max // max score from the last 10 follows in the last 30 days
12: optional double follow30dLast10Avg // avg score from the last 10 follows in the last 30 days
13: optional double share1dLast10Max // max score from last 10 shares in the last 1 day
14: optional double share1dLast10Avg // avg score from last 10 shares in the last 1 day
15: optional double share7dLast10Max // max score from last 10 shares in the last 7 days
16: optional double share7dLast10Avg // avg score from last 10 shares in the last 7 days
17: optional double reply1dLast10Max // max score from last 10 replies in the last 1 day
18: optional double reply1dLast10Avg // avg score from last 10 replies in the last 1 day
19: optional double reply7dLast10Max // max score from last 10 replies in the last 7 days
20: optional double reply7dLast10Avg // avg score from last 10 replies in the last 7 days

tweetHasTrendBoost = 1.1

struct ThriftAgeDecayRankingParams {
  // the rate in which the score of older tweets decreases
  1: optional double slope = 6.003
  // the age, in minutes, where the age score of a tweet is half of the latest tweet
  2: optional double halflife = 360.0
  // the minimal age decay score a tweet will have
  3: optional double base = 0.6
}(persisted='true')

Creating posts with unconventional opinions that run counter to public acceptance may attract significant attention in terms of clicks and replies. However, it can also result in numerous reports, mutes, and unfollows. Avoid getting muted, reported, unfollowed, and blocked.
val allEdgeFeatures: SCollection[Edge] =
  getEdgeFeature(SCollection.unionAll(
    Seq(blocks, mutes, abuseReports, spamReports, unfollows)))
val negativeFeatures: SCollection[KeyVal[Long, UserSession]] =
  allEdgeFeatures
    .keyBy(_.sourceId)
    .topByKey(maxDestinationIds)(Ordering.by(_.features.size))
    .map {
      case (srcId, pqEdges) =>
        val topKNeg =
          pqEdges.toSeq.flatMap(toRealGraphEdgeFeatures(hasNegativeFeatures))
        KeyVal(
          srcId,
          UserSession(
            userId = Some(srcId),
            realGraphFeaturesTest =
              Some(RealGraphFeaturesTest.V1(RealGraphFeaturesV1(topKNeg)))))
    }

// parameters from Representation-Scorer
// 2001 - 3000 Negative Signals
// Block Series
2001: optional double block1dLast10Avg
2002: optional double block1dLast10Max
2003: optional double block7dLast10Avg
2004: optional double block7dLast10Max
2005: optional double block30dLast10Avg
2006: optional double block30dLast10Max
// Mute Series
2101: optional double mute1dLast10Avg
2102: optional double mute1dLast10Max
2103: optional double mute7dLast10Avg
2104: optional double mute7dLast10Max
2105: optional double mute30dLast10Avg
2106: optional double mute30dLast10Max
// Report Series
2201: optional double report1dLast10Avg
2202: optional double report1dLast10Max
2203: optional double report7dLast10Avg
2204: optional double report7dLast10Max
2205: optional double report30dLast10Avg
2206: optional double report30dLast10Max
// Dontlike
2301: optional double dontlike1dLast10Avg
2302: optional double dontlike1dLast10Max
2303: optional double dontlike7dLast10Avg
2304: optional double dontlike7dLast10Max
2305: optional double dontlike30dLast10Avg
2306: optional double dontlike30dLast10Max
// SeeFewer
2401: optional double seeFewer1dLast10Avg
2402: optional double seeFewer1dLast10Max
2403: optional double seeFewer7dLast10Avg
2404: optional double seeFewer7dLast10Max
2405: optional double seeFewer30dLast10Avg
2406: optional double seeFewer30dLast10Max
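To illustrate what these aggregate features represent, here is a toy sketch computing block-style "last 10 events within a window" statistics from raw negative-signal scores; the scores, timestamps, and helper functions are assumptions for illustration only:

```scala
import java.time.{Duration, Instant}

object ToyNegativeAggregates extends App {
  // A negative interaction (e.g. a block) with a per-event score and a timestamp
  case class Signal(score: Double, at: Instant)

  val now = Instant.now()
  val blocks = Seq(
    Signal(0.9, now.minus(Duration.ofHours(5))),
    Signal(0.4, now.minus(Duration.ofDays(3))),
    Signal(0.7, now.minus(Duration.ofDays(20)))
  )

  // "last 10 events within the window", mirroring blockXdLast10Max / blockXdLast10Avg
  def last10(signals: Seq[Signal], windowDays: Long): Seq[Double] =
    signals
      .filter(s => Duration.between(s.at, now).toDays < windowDays)
      .sortBy(s => Duration.between(s.at, now).toMillis) // most recent first
      .take(10)
      .map(_.score)

  def maxAvg(xs: Seq[Double]): (Double, Double) =
    if (xs.isEmpty) (0.0, 0.0) else (xs.max, xs.sum / xs.size)

  println(s"block1dLast10 (max, avg)  = ${maxAvg(last10(blocks, 1))}")  // (0.9, 0.9)
  println(s"block7dLast10 (max, avg)  = ${maxAvg(last10(blocks, 7))}")  // (0.9, 0.65)
  println(s"block30dLast10 (max, avg) = ${maxAvg(last10(blocks, 30))}") // (0.9, ~0.67)
}
```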