### Get Help for Deploy Script

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Display the help message for the `deploy.sh` script, providing information on its usage and available options. This is useful for understanding the script's capabilities and parameters.

```bash
#!/bin/bash
./simclusters-ann/bin/deploy.sh --help
```

--------------------------------

### Feature Store Update Example (Feature Switches)

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Shows an example of a Product Change Management (PCM) entry for managing feature switches when rolling out new features to the feature store.

```Text
PCM-148654
```

--------------------------------

### Feature Store Update Example (Canarying)

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Demonstrates an example of a Product Change Management (PCM) entry for canarying changes when rolling out new features to the feature store.

```Text
PCM-145753
```

--------------------------------

### IntelliJ Setup for InteractionGraphLabels

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/ml/labels/README.md

Command to create a new FastPass project for the InteractionGraphLabels Dataflow Job within IntelliJ.

```bash
fastpass create --name rg_labels --intellij src/scala/com/twitter/interaction_graph/scio/ml/labels
```

--------------------------------

### Add Features from Feature Store to RTA (Step 2 Example)

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Provides an example of creating a new ReadableStore that utilizes a Feature Store Client to fetch features. It includes implementing a FeaturesAdapter to derive new features from the raw data obtained from the Feature Store.
```Scala
class UserFeaturesReadableStore extends ReadableStore[UserFeatures] with FeaturesAdapter {
  // ... implementation to read discrete features and convert them ...
}
```

--------------------------------

### IntelliJ Project Setup

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/agg_notifications/README.md

Command to create a new project or module in IntelliJ for the specified Scala path.

```Shell
fastpass create --name rg_labels --intellij src/scala/com/twitter/interaction_graph/scio/agg_notifications
```

--------------------------------

### Setup IntelliJ for InteractionGraphAggDirectInteractions

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/agg_direct_interactions/README.md

This command is used to set up the IntelliJ IDE for the InteractionGraphAggDirectInteractions Scio job. It leverages Bazel to generate the necessary project files for IntelliJ.

```bash
./bazel idea src/scala/com/twitter/interaction_graph/scio/agg_direct_interactions:interaction_graph_agg_direct_interactions_scio
```

--------------------------------

### Generate Developer Certificates for Deployment

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Generates service-to-service certificates required for local development deployments to Aurora. This is a one-time setup process.

```bash
developer-cert-util --env devel --job simclusters-ann
```

--------------------------------

### CMake: Project Setup and Source File Discovery

Source: https://github.com/twitter/the-algorithm/blob/main/twml/libtwml/src/ops/CMakeLists.txt

Initializes CMake, sets module paths, and recursively finds all C++ source files within the project directory. It also configures C++ compiler flags for warnings and C++11 standard compliance.
```cmake
set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR})
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
cmake_policy(VERSION 2.8)
set(CMAKE_MACOSX_RPATH 1)

file(GLOB_RECURSE sources *.cpp)

set (CMAKE_CXX_FLAGS "-Wall -std=c++11 -fno-stack-protector ${CMAKE_CXX_FLAGS}")
```

--------------------------------

### CMake Project Setup and Library Definition

Source: https://github.com/twitter/the-algorithm/blob/main/twml/libtwml/src/lib/CMakeLists.txt

Configures the CMake module path, minimum required version, and policy. It then defines the twml library as a static library, compiling all found .cpp files and setting C++11 standards with specific flags.

```cmake
set(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR})
cmake_minimum_required(VERSION 2.8 FATAL_ERROR)
cmake_policy(VERSION 2.8)

set(TWML_VERSION "2.0.0")
string(REPLACE "." ";" TWML_VERSION_LIST ${TWML_VERSION})
list(GET TWML_VERSION_LIST 0 TWML_SOVERSION)

execute_process(
  COMMAND $ENV{LIBTWML_HOME}/src/ops/scripts/get_inc.sh
  RESULT_VARIABLE TF_RES
  OUTPUT_VARIABLE TF_INC)

file(GLOB_RECURSE sources *.cpp)

set (CMAKE_CXX_FLAGS "-Wall -std=c++11 ${CMAKE_CXX_FLAGS} -fPIC")

add_library(twml STATIC ${sources})
```

--------------------------------

### Connect to Cache and Query in Scala REPL

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

This Scala snippet provides the necessary imports and setup for connecting to a cache and querying it within a Scala REPL. It includes establishing a SOCKS proxy connection and configuring JVM options for the REPL.

```scala
import com.twitter.conversions.DurationOps._
```

--------------------------------

### Setup InteractionGraphLabels Dataflow Job in IntelliJ

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/ml/scores/README.md

This command is used to create or set up the InteractionGraphLabels Dataflow Job within an IntelliJ environment.
It specifies the project name (`rg_scores`) and the path to the Scala source code.

```Shell
fastpass create --name rg_scores --intellij src/scala/com/twitter/interaction_graph/scio/ml/scores
```

--------------------------------

### Create Labeled Training Dataset (SQL)

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/bqe/training/README.md

This BigQuery script joins sampled candidates with interaction labels, assigns labels (1 for positive, 0 for negative), filters out negative interactions, and creates a training table by downsampling negative examples to balance the dataset.

```SQL
1. Defines two variables date_candidates and date_labels as dates based on the $start_time$ parameter.
2. Creates a new table twttr-recos-ml-prod.realgraph.labeled_candidates$table_suffix$ with default values.
3. Deletes any prior data in the twttr-recos-ml-prod.realgraph.labeled_candidates$table_suffix$ table for the current date_candidates.
4. Joins the twttr-recos-ml-prod.realgraph.candidates_sampled table with the twttr-bq-cassowary-prod.user.interaction_graph_labels_daily table and the twttr-bq-cassowary-prod.user.interaction_graph_agg_negative_edge_snapshot table. It assigns a label of 1 for positive interactions and 0 for negative interactions, and selects only the rows where there is no negative interaction.
5. Inserts the joined data into the twttr-recos-ml-prod.realgraph.labeled_candidates$table_suffix$ table.
6. Calculates the positive rate by counting the number of positive labels and dividing it by the total number of labels.
7. Creates a new table twttr-recos-ml-prod.realgraph.train$table_suffix$ by sampling from the twttr-recos-ml-prod.realgraph.labeled_candidates$table_suffix$ table, with a downsampling of negative examples to balance the number of positive and negative examples, based on the positive rate calculated in step 6.
The resulting twttr-recos-ml-prod.realgraph.train$table_suffix$ table is used as a training dataset for a machine learning model.
```

--------------------------------

### Run Adhoc Dataflow Job

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/simclusters_v2/scio/multi_type_graph/assemble_multi_type_graph/README.md

Executes an adhoc Dataflow job for assembling a multi-type graph. Requires GCP project setup and specifies job parameters like user name, date, and machine type.

```bash
export GCP_PROJECT_NAME='twttr-recos-ml-prod'

./bazel bundle src/scala/com/twitter/simclusters_v2/scio/multi_type_graph/assemble_multi_type_graph:assemble-multi-type-graph-scio-adhoc-app

bin/d6w create \
  ${GCP_PROJECT_NAME}/us-central1/assemble-multi-type-graph-scio-adhoc-app \
  src/scala/com/twitter/simclusters_v2/scio/multi_type_graph/assemble-multi-type-graph-scio-adhoc.d6w \
  --jar dist/assemble-multi-type-graph-scio-adhoc-app.jar \
  --bind=profile.project=${GCP_PROJECT_NAME} \
  --bind=profile.user_name=${USER} \
  --bind=profile.date="2021-11-04" \
  --bind=profile.machine="n2-highmem-16"
```

--------------------------------

### Invoke Initial Job Run via Capesos

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/batch.rst

Command to invoke the initial run of the user aggregates v2 job using Capesos. It sets the environment, builds locally, starts the cron, and points at the Capesos YAML file.

```Shell
CAPESOSPY_ENV=prod capesospy-v2 update --build_locally --start_cron user_aggregates_v2_initial_run science/scalding/mesos/timelines/prod.yml
```

--------------------------------

### Hydrate Real-Time Aggregate Features with Memcache Client

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Demonstrates how to build a Memcache client to access real-time aggregate features stored in memcached.
It highlights the importance of correct key injection and codec configuration, referencing an example Memcache client builder from Timelines.

```Scala
val memcacheClient = RealTimeAggregatesMemcacheBuilder.build(queryKey, codec)
```

--------------------------------

### Run Recos-Injector Server Tests (Bash)

Source: https://github.com/twitter/the-algorithm/blob/main/recos-injector/README.md

This snippet shows how to build and run tests for the Recos-Injector project using Bazel commands.

```bash
bazel build recos-injector/...
bazel test recos-injector/...
```

--------------------------------

### Run Recos-Injector Server Locally (Bash)

Source: https://github.com/twitter/the-algorithm/blob/main/recos-injector/README.md

Instructions for compiling and running the Recos-Injector server in development mode locally using Bazel.

```bash
bazel build recos-injector/server:bin
bazel run recos-injector/server:bin
```

--------------------------------

### Home Mixer for Timeline Construction

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The home-mixer is the main service used to construct and serve the Home Timeline. It is built upon the product-mixer service.

```N/A
home-mixer/README.md
```

--------------------------------

### Generate Recos-Injector Deployment Package (Bash)

Source: https://github.com/twitter/the-algorithm/blob/main/recos-injector/README.md

Command to create a zip archive of the Recos-Injector server for deployment using Bazel.

```bash
bazel bundle recos-injector/server:bin --bundle-jvm-archive=zip
```

--------------------------------

### Navi: High-Performance ML Model Serving (Rust)

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

Navi is a high-performance machine learning model serving framework written in Rust. It's designed for low-latency inference of ML models.
```Rust
use std::collections::HashMap;

struct ModelServer {
    // Maps a model name to a boxed prediction function.
    models: HashMap<String, Box<dyn Fn(Vec<f64>) -> f64>>,
}

impl ModelServer {
    fn new() -> Self {
        ModelServer { models: HashMap::new() }
    }

    fn register_model<F>(&mut self, name: &str, model_fn: F)
    where
        F: Fn(Vec<f64>) -> f64 + 'static,
    {
        self.models.insert(name.to_string(), Box::new(model_fn));
    }

    fn predict(&self, model_name: &str, input: Vec<f64>) -> Option<f64> {
        self.models.get(model_name).map(|f| f(input))
    }
}

// Example usage:
// let mut server = ModelServer::new();
// server.register_model("simple_linear", |input| input.iter().sum());
// let result = server.predict("simple_linear", vec![1.0, 2.0, 3.0]);
```

--------------------------------

### Custom Aggregate Operator Example

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/aggregation.rst

Illustrates the possibility of adding custom aggregate operators, such as LastResetMetric, to the framework with additional implementation effort.

```Java
// Assuming LastResetMetric is implemented and available
// LastResetMetric lastResetMetric = new LastResetMetric();
```

--------------------------------

### Product Mixer: Feed Construction Framework

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The product-mixer is a software framework specifically built for constructing content feeds. It allows for flexible composition of different content sources and ranking strategies.

```Scala
package com.twitter.product_mixer

import com.twitter.util.Future

trait FeedBuilder {
  def buildFeed(userId: Long, surface: String): Future[Seq[ContentItem]]
}

case class ContentItem(id: String, score: Double, source: String)
```

--------------------------------

### Run Onnx with Navi

Source: https://github.com/twitter/the-algorithm/blob/main/navi/README.md

This script demonstrates how to run the Navi ML serving server with Onnx Runtime. It requires a 'models' directory with versioned model subdirectories.
```Shell
scripts/run_onnx.sh
```

--------------------------------

### Update Warm Start Checkpoint Support

Source: https://github.com/twitter/the-algorithm/blob/main/pushservice/src/main/python/models/heavy_ranking/README.md

Contains support functions to modify checkpoints for a given saved heavy ranker model, facilitating warm-start training.

```Python
update_warm_start_checkpoint.py
```

--------------------------------

### Deploy SimClusters ANN to Devel

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Build the SimClusters ANN service locally, upload it to packer, and deploy it to the devel Aurora environment using the `deploy.sh` script. The script handles both the build and the deployment for the specified service and environment.

```bash
#!/bin/bash
./simclusters-ann/bin/deploy.sh atla $USER devel simclusters-ann
```

--------------------------------

### Configure Real-Time Aggregation with AggregateGroup (Scala Example)

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Illustrates how to modify or add a new AggregateGroup in TimelinesOnlineAggregationConfigBase.scala to define aggregation keys, features, labels, and metrics for new aggregate features.

```Scala
object TimelinesOnlineAggregationConfigBase extends OnlineAggregationConfigBase {
  // ... define new AggregateGroup here ...
}
```

--------------------------------

### Deploy SimClusters ANN to Staging

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Deploy the SimClusters ANN service to the staging environment using the `deploy.sh` script, specifying the instance number to deploy. This allows testing in staging before a production deployment.
```bash
#!/bin/bash
./simclusters-ann/bin/deploy.sh atla simclusters-ann staging simclusters-ann <instance-number>
```

--------------------------------

### Using Labels for Subset Aggregation

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/aggregation.rst

Demonstrates how labels (e.g., IS_FAVORITED) can be used to restrict aggregation to a subset of records for a given key (e.g., USER_ID). This example counts favorites on tweets with photos.

```Java
CountMetric countMetric = new CountMetric();
// Example: Group by USER_ID, feature HAS_PHOTO, label IS_FAVORITED
```

--------------------------------

### FRS Product Configurations

Source: https://github.com/twitter/the-algorithm/blob/main/follow-recommendations-service/README.md

Specifies the path to view all products supported by the Follow Recommendations Service (FRS). Each product corresponds to a display location and can utilize one or multiple flows for candidate generation.

```Scala
follow-recommendations-service/server/src/main/scala/com/twitter/follow_recommendations/products/home_timeline_tweet_recs
```

--------------------------------

### Deploy Real Graph Training Job

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/bqe/training/README.md

Instructions for deploying the real graph training job. This involves creating a zip file of the training code, uploading it to packer, and scheduling it using Aurora.
```Shell
zip -jr real_graph_training src/scala/com/twitter/interaction_graph/bqe/training && \
packer add_version --cluster=atla cassowary real_graph_training real_graph_training.zip
aurora cron schedule atla/cassowary/prod/real_graph_training src/scala/com/twitter/interaction_graph/bqe/training/training.aurora && \
aurora cron start atla/cassowary/prod/real_graph_training
```

--------------------------------

### Define AggregationKey for Real-Time Features

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/real-time.rst

Explains how to define an AggregationKey for querying real-time aggregate features from a backing store. It shows an example of instantiating AggregationKey with a featureId and a specific value, like USER_ID.

```Scala
val aggregationKey = AggregationKey(featureId = USER_ID, userId = someUserId)
```

--------------------------------

### Graph Feature Service: Serving Graph Features

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The graph-feature-service provides essential graph-based features for pairs of users. For example, it can return how many users followed by User A also liked posts from User B.

```Scala
package com.twitter.graph_feature_service

import com.twitter.util.Future

trait GraphFeatureProvider {
  def getFeatures(userAId: Long, userBId: Long): Future[Map[String, Any]]
}

object GraphFeatureService extends GraphFeatureProvider {
  override def getFeatures(userAId: Long, userBId: Long): Future[Map[String, Any]] = {
    // Example: Fetching features like 'mutual_follows', 'shared_likes'
    Future.value(Map("mutual_follows" -> 5, "shared_likes" -> 10))
  }
}
```

--------------------------------

### Deploy SimClusters ANN to Production

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Deploy the SimClusters ANN service to the production environment using the `deploy.sh` script.
This is typically reserved for emergencies or testing specific canary shards and requires approval from oncall. Specify a single instance number or a range of instance numbers for the deployment.

```bash
#!/bin/bash
./simclusters-ann/bin/deploy.sh atla simclusters-ann prod simclusters-ann <instance-number>
./simclusters-ann/bin/deploy.sh atla simclusters-ann prod simclusters-ann <instance-number-start>-<instance-number-end>
```

--------------------------------

### TWML: Legacy TensorFlow v1 ML Framework

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

TWML is a legacy machine learning framework built upon TensorFlow v1. While older, it may still be used for certain functionalities or historical models.

```Python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Example of a simple TensorFlow v1 graph operation
def create_tf_graph():
    a = tf.constant(5.0)
    b = tf.constant(6.0)
    c = a * b
    return c

# To run this graph:
# graph_result = create_tf_graph()
# with tf.Session() as sess:
#     print(sess.run(graph_result))
```

--------------------------------

### Unified User Actions Stream

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

This component provides a real-time stream of all user actions performed on the X platform. It's crucial for understanding user behavior and preferences.

```Scala
package com.twitter.unified_user_actions

import com.twitter.util.Future

trait UserActionStream {
  def publishAction(action: UserAction): Future[Unit]
}

case class UserAction(userId: Long, actionType: String, timestamp: Long)
```

--------------------------------

### Build SimClusters ANN Service

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Compiles the SimClusters ANN service using Bazel. This command is essential for building the executable binary for the service.
```bash
./bazel build simclusters-ann/server:bin
```

--------------------------------

### Run TensorFlow 2 with Navi

Source: https://github.com/twitter/the-algorithm/blob/main/navi/README.md

This script demonstrates how to run the Navi ML serving server with TensorFlow 2. It requires a 'models' directory with versioned model subdirectories.

```Shell
scripts/run_tf2.sh
```

--------------------------------

### Pushservice Light Ranker for Pre-selection

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The pushservice-light-ranker is a model used by the pushservice to rank posts. It bridges candidate generation and heavy ranking by pre-selecting highly relevant candidates from a large pool.

```Python
pushservice/src/main/python/models/light_ranking/README.md
```

--------------------------------

### Recos-Injector: Streaming Event Processor

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

This service acts as a streaming event processor, preparing input streams for GraphJet-based services. It ensures data is formatted correctly for downstream recommendation engines.

```Scala
package com.twitter.recos_injector

import com.twitter.util.Future

trait RecosEventProcessor {
  def processEvent(event: RecommendationEvent): Future[Unit]
}

case class RecommendationEvent(userId: Long, itemId: String, score: Double)
```

--------------------------------

### Following Timeline Pipeline Configuration

Source: https://github.com/twitter/the-algorithm/blob/main/home-mixer/README.md

Outlines the pipeline configuration for the 'Following' timeline. It specifies the candidate fetching pipelines for tweets, conversations, ads, and user recommendations.
```Scala
- FollowingProductPipelineConfig
  - FollowingMixerPipelineConfig
    - FollowingEarlybirdCandidatePipelineConfig (fetch tweets from Search Index)
    - ConversationServiceCandidatePipelineConfig (fetch ancestors for conversation modules)
    - FollowingAdsCandidatePipelineConfig (fetch ads)
    - FollowingWhoToFollowCandidatePipelineConfig (fetch users to recommend)
```

--------------------------------

### Light Ranker for Earlybird

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The light-ranker is a model used by the search index (Earlybird) to rank posts. It's a crucial component in the initial ranking phase for the 'For You' Timeline.

```Python
src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md
```

--------------------------------

### Build from Master Branch using Aurora

Source: https://github.com/twitter/the-algorithm/blob/main/unified_user_actions/enricher/README.md

This command initiates a build for the UUA partitioner staging workflow directly from the master branch using the Aurora workflow tool.

```bash
aurora workflow build unified_user_actions/service/deploy/uua-partitioner-staging.workflow
```

--------------------------------

### Test SimClusters ANN Service

Source: https://github.com/twitter/the-algorithm/blob/main/simclusters-ann/README.md

Runs unit tests for the SimClusters ANN service using Bazel. This command verifies the correctness of the service's components.

```bash
./bazel test simclusters-ann/server:bin
```

--------------------------------

### Create ML Model with BigQuery

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/bqe/training/README.md

This BigQuery SQL command creates or replaces a boosted tree classifier model. It configures training parameters, data splitting based on a custom column, and selects features for prediction.
```SQL
CREATE OR REPLACE MODEL twttr-recos-ml-prod.realgraph.prod$table_suffix$
OPTIONS(
  MODEL_TYPE='BOOSTED_TREE_CLASSIFIER',
  num_parallel_tree=10,
  max_iterations=50,
  DATA_SPLIT_METHOD='CUSTOM',
  DATA_SPLIT_COL='if_eval'
) AS
SELECT
  label,
  num_days,
  num_tweets,
  num_follows,
  if_eval  -- split column named in DATA_SPLIT_COL; it must be present in the training data
FROM twttr-recos-ml-prod.realgraph.train$table_suffix$;
```

--------------------------------

### Prepare Candidate Table for Training (SQL)

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/bqe/training/README.md

This BigQuery script prepares a table of candidates for training a machine learning model. It joins interaction data, filters negative edges, calculates statistics, ranks users, and selects the top candidates, adding the end date as a partition column.

```SQL
1. Declares two variables date_start and date_end that are 30 days apart, and date_end is set to the value of $start_time$ parameter (which is a Unix timestamp).
2. Creates a table candidates_for_training that is partitioned by ds (date) and populated with data from several other tables in the database. It joins information from tables of user interactions, tweeting, and interaction graph aggregates, filters out negative edge snapshots, calculates some statistics and aggregates them by source_id and destination_id. Then, it ranks each source_id by the number of days and tweets, selects top 2000, and adds date_end as a new column ds.
3. Finally, it selects the ds column from candidates_for_training where ds equals date_end.

Overall, this script prepares a table of 2000 candidate pairs of user interactions with statistics and labels, which can be used to train a machine learning model for recommendation purposes.
```

--------------------------------

### Topic Social Proof: Identifying Post Topics

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

This component identifies topics related to individual posts, enriching content understanding and enabling topic-based recommendations.

```Scala
package com.twitter.topic_social_proof

import com.twitter.util.Future

trait TopicIdentifier {
  def identifyTopics(postId: String): Future[Seq[String]]
}

object TopicSocialProof extends TopicIdentifier {
  override def identifyTopics(postId: String): Future[Seq[String]] = {
    // Placeholder for topic identification logic
    Future.value(Seq("technology", "machine learning"))
  }
}
```

--------------------------------

### Open Project in IntelliJ

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/agg_flock/README.md

Opens the InteractionGraphClientEventLogs Dataflow Job project in IntelliJ using Bazel.

```bash
./bazel idea src/scala/com/twitter/interaction_graph/scio/agg_flock:interaction_graph_agg_flock_scio
```

--------------------------------

### FRS Candidate Sources

Source: https://github.com/twitter/the-algorithm/blob/main/follow-recommendations-service/README.md

Indicates the directory for candidate generation logic within the Follow Recommendations Service (FRS). This folder contains various user signals and algorithms to identify potential follow candidates, with README files in each subfolder.

```Scala
follow-recommendations-service/common/src/main/scala/com/twitter/follow_recommendations/common/candidate_sources/
```

--------------------------------

### Initial Run Configuration for User Aggregates v2

Source: https://github.com/twitter/the-algorithm/blob/main/timelines/data_processing/ml_util/aggregation_framework/docs/batch.rst

Provides a temporary Capesos configuration for the initial run of the user aggregates v2 job, including a specific start-time argument.
```yaml
user_aggregates_v2_initial_run:
  <<: *__aggregates_v2_common__
  cron_schedule: "25 * * * *"
  arguments: --batches 1 --start-time "2017-03-03 00:00:00" --output_stores user_aggregates --job_name timelines_user_aggregates_v2
```

--------------------------------

### Main Python Entry File for Model Evaluation

Source: https://github.com/twitter/the-algorithm/blob/main/pushservice/src/main/python/models/light_ranking/README.md

The primary Python script for setting up and executing the overall model evaluation pipeline. This file orchestrates the process of evaluating the trained notification light ranker model.

```Python
import sys
import os

# Assuming deep_norm.py and model_pools_mlp.py are in the same directory or accessible via PYTHONPATH
# from deep_norm import build_tensorflow_graph, train_model, evaluate_model
# from model_pools_mlp import build_model

def main():
    print("Starting model evaluation...")
    # Placeholder for evaluation pipeline setup
    # 1. Load data
    # 2. Build model
    # 3. Load trained weights (if applicable)
    # 4. Evaluate model
    # 5. Print results
    print("Model evaluation finished.")

if __name__ == "__main__":
    main()
```

--------------------------------

### Earlybird Root Server Implementation

Source: https://github.com/twitter/the-algorithm/blob/main/src/java/com/twitter/search/earlybird/README.md

This snippet points to the implementation of the Earlybird root servers, which handle the fan-out of queries. The code is located in the specified Java directory.

```Java
src/java/com/twitter/search/earlybird_root/
```

--------------------------------

### Pushservice for Recommended Notifications

Source: https://github.com/twitter/the-algorithm/blob/main/README.md

The pushservice is the main recommendation service at X, responsible for surfacing recommendations to users via notifications. It handles candidate generation and ranking.
```N/A
pushservice/README.md
```

--------------------------------

### Build Jar for InteractionGraphClientEventLogs Scio Project

Source: https://github.com/twitter/the-algorithm/blob/main/src/scala/com/twitter/interaction_graph/scio/agg_client_event_logs/README.md

Bundles the InteractionGraphClientEventLogs Scio project into a JAR file using Bazel.

```bash
./bazel bundle src/scala/com/twitter/interaction_graph/scio/agg_client_event_logs:interaction_graph_client_event_logs_scio
```

--------------------------------

### Lists Timeline Pipeline Configuration

Source: https://github.com/twitter/the-algorithm/blob/main/home-mixer/README.md

Presents the pipeline configuration for the 'Lists' timeline. It includes pipelines for fetching tweets from the timeline service, handling conversation modules, and integrating ads.

```Scala
- ListTweetsProductPipelineConfig
  - ListTweetsMixerPipelineConfig
    - ListTweetsTimelineServiceCandidatePipelineConfig (fetch tweets from timeline service)
    - ConversationServiceCandidatePipelineConfig (fetch ancestors for conversation modules)
    - ListTweetsAdsCandidatePipelineConfig (fetch ads)
```
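
--------------------------------

### Sketch: Walking a Timeline Pipeline Hierarchy

The 'Lists' timeline listing above reads naturally as a tree: a product pipeline wraps a mixer pipeline, which fans out to candidate pipelines. The following is a minimal sketch of that reading, not home-mixer code. `PipelineNode` and `candidate_pipelines` are hypothetical helpers introduced only for illustration (the real configs are Scala classes), and the parent/child nesting is an assumption inferred from the config names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineNode:
    """Hypothetical stand-in for a pipeline config; not part of home-mixer."""
    name: str
    purpose: str = ""
    children: List["PipelineNode"] = field(default_factory=list)


def candidate_pipelines(node: PipelineNode) -> List[str]:
    """Return the names of all leaf pipelines under `node`, in listed order."""
    if not node.children:
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(candidate_pipelines(child))
    return names


# Names copied from the Lists timeline listing; the nesting is an assumption.
lists_timeline = PipelineNode(
    "ListTweetsProductPipelineConfig",
    children=[
        PipelineNode(
            "ListTweetsMixerPipelineConfig",
            children=[
                PipelineNode(
                    "ListTweetsTimelineServiceCandidatePipelineConfig",
                    "fetch tweets from timeline service",
                ),
                PipelineNode(
                    "ConversationServiceCandidatePipelineConfig",
                    "fetch ancestors for conversation modules",
                ),
                PipelineNode("ListTweetsAdsCandidatePipelineConfig", "fetch ads"),
            ],
        )
    ],
)

# Print one candidate pipeline name per line.
for pipeline_name in candidate_pipelines(lists_timeline):
    print(pipeline_name)
```

Keeping the hierarchy as plain data makes it easy to compare timelines, for example by checking which candidate pipelines two products share.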