# Arctic Shift

Arctic Shift is a comprehensive Reddit data archive that makes historical Reddit content accessible to researchers, moderators, and the general public. The project provides access to Reddit posts and comments dating back to 2005, with data retrieved through the official Reddit API and stored in compressed JSON format.

Arctic Shift offers multiple ways to interact with this data: downloadable monthly data dumps via Academic Torrents, a RESTful API for querying specific content, and a web-based search interface. The core functionality includes searching posts and comments by various criteria (author, subreddit, date range, keywords), retrieving comment trees for specific posts, aggregating data for analytics, and accessing subreddit metadata including rules and wiki pages. The Python helper scripts enable efficient processing of compressed data dumps locally, supporting `.zst`, `.zst_blocks`, `.jsonl`, and `.json` file formats. Data is updated with a 36-hour delay to capture accurate scores and comment counts.

## API Reference

Base URL: `https://arctic-shift.photon-reddit.com`

### Retrieve Posts/Comments by ID

Fetch multiple posts or comments using their Reddit IDs. Supports up to 500 IDs per request.

```bash
# Retrieve two posts by ID
curl "https://arctic-shift.photon-reddit.com/api/posts/ids?ids=ei30r4,eitwb3"

# Retrieve comments by ID with HTML rendering
curl "https://arctic-shift.photon-reddit.com/api/comments/ids?ids=dppum98,abc123&md2html=true"

# Select specific fields only
curl "https://arctic-shift.photon-reddit.com/api/posts/ids?ids=ei30r4&fields=author,title,score,created_utc"

# Response format:
# {
#   "data": [
#     {
#       "id": "ei30r4",
#       "author": "username",
#       "title": "Post title here",
#       "subreddit": "worldnews",
#       "score": 1234,
#       "created_utc": 1577836800,
#       ...
#     }
#   ]
# }
```

### Search Posts

Search for posts with filtering by subreddit, author, date range, and keywords in title or body.
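Search requests are plain GET URLs, so they can be assembled programmatically before being handed to curl or an HTTP library. A minimal sketch in Python; the `build_search_url` helper is illustrative, not part of the project:

```python
# Illustrative helper for assembling Arctic Shift search URLs (not part of the project).
import urllib.parse

BASE_URL = "https://arctic-shift.photon-reddit.com"

def build_search_url(endpoint: str, **params) -> str:
    """Build a query URL from keyword parameters, e.g. subreddit="worldnews"."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}{endpoint}?{query}"

url = build_search_url(
    "/api/posts/search",
    subreddit="worldnews", title="wuhan",
    after="2019-12-30", sort="asc", limit=10,
)
print(url)
```

The resulting URL matches the curl examples; a caller could pass it to any HTTP client (e.g. `requests.get(url).json()["data"]`).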
```bash
# Search r/worldnews for posts with "wuhan" in title, after Dec 30 2019, sorted ascending
curl "https://arctic-shift.photon-reddit.com/api/posts/search?subreddit=worldnews&title=wuhan&after=2019-12-30&sort=asc&limit=10"

# Search by author with date range
curl "https://arctic-shift.photon-reddit.com/api/posts/search?author=spez&after=2020-01-01&before=2021-01-01&limit=25"

# Search with URL prefix matching
curl "https://arctic-shift.photon-reddit.com/api/posts/search?url=https://www.youtube.com/watch&subreddit=videos&limit=50"

# Full-text search in title and selftext combined
curl "https://arctic-shift.photon-reddit.com/api/posts/search?query=machine%20learning&subreddit=datascience&limit=100"

# Filter NSFW content
curl "https://arctic-shift.photon-reddit.com/api/posts/search?subreddit=funny&over_18=false&limit=25"

# Response format:
# {
#   "data": [
#     {
#       "id": "abc123",
#       "author": "username",
#       "title": "Post title",
#       "selftext": "Post body content",
#       "subreddit": "worldnews",
#       "score": 5432,
#       "num_comments": 234,
#       "created_utc": 1577836800,
#       "url": "https://example.com/article",
#       ...
#     }
#   ]
# }
```

### Search Comments

Search for comments with filtering options including post ID and parent comment ID.
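Result pages are capped by `limit`, so pulling a long history usually means re-issuing the query with a moving date cursor. One common pattern (an assumption about usage, not a documented Arctic Shift pagination API) is to advance `after` past the newest `created_utc` seen so far when sorting ascending:

```python
# Sketch of cursor-style paging over search results. The cursor pattern is an
# assumption about typical usage, not a documented pagination mechanism.
from typing import Optional

def next_after_cursor(page: list) -> Optional[int]:
    """Return the created_utc to use as the next `after` value, or None when done."""
    if not page:
        return None  # empty page: no more results to fetch
    return max(row["created_utc"] for row in page)

page = [
    {"id": "a", "created_utc": 1577836800},
    {"id": "b", "created_utc": 1577840400},
]
print(next_after_cursor(page))  # newest timestamp on the page
print(next_after_cursor([]))    # None when the page is empty
```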
```bash
# Search for comments by a specific user under a specific post
curl "https://arctic-shift.photon-reddit.com/api/comments/search?author=PresidentObama&link_id=z1c9z&limit=100"

# Search top-level comments only (no parent)
curl "https://arctic-shift.photon-reddit.com/api/comments/search?subreddit=askreddit&parent_id=&limit=50"

# Search comments containing specific text
curl "https://arctic-shift.photon-reddit.com/api/comments/search?author=spez&body=community&limit=25"

# Get comments with HTML-rendered markdown
curl "https://arctic-shift.photon-reddit.com/api/comments/search?subreddit=programming&after=2024-01-01&md2html=true&limit=10"

# Response format:
# {
#   "data": [
#     {
#       "id": "xyz789",
#       "author": "username",
#       "body": "Comment text here",
#       "link_id": "t3_abc123",
#       "parent_id": "t1_def456",
#       "subreddit": "askreddit",
#       "score": 42,
#       "created_utc": 1577836800,
#       ...
#     }
#   ]
# }
```

### Get Comment Tree

Retrieve comments in a hierarchical tree structure, similar to Reddit's display format.

```bash
# Get comment tree for a post with a specific parent comment as root
curl "https://arctic-shift.photon-reddit.com/api/comments/tree?link_id=t3_7cff0b&parent_id=t1_dppum98&md2html=true"

# Get ALL comments under a post (use high limit)
curl "https://arctic-shift.photon-reddit.com/api/comments/tree?link_id=t3_x8i09x&limit=9999"

# Control comment collapsing depth and breadth
curl "https://arctic-shift.photon-reddit.com/api/comments/tree?link_id=t3_abc123&start_depth=6&start_breadth=5&limit=500"

# Response format (tree structure):
# {
#   "data": [
#     {
#       "kind": "t1",
#       "data": {
#         "id": "comment_id",
#         "body": "Comment text",
#         "author": "username",
#         "replies": {
#           "data": {
#             "children": [...]  // nested comments
#           }
#         }
#       }
#     },
#     {
#       "kind": "more",
#       "data": {
#         "children": ["id1", "id2", ...]  // collapsed comment IDs
#       }
#     }
#   ]
# }
```

### Aggregate Posts/Comments

Generate aggregate statistics by date, author, or subreddit.
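Aggregate responses are flat `key`/`doc_count` pairs, which makes post-processing straightforward. A sketch that ranks such buckets by frequency (the response data here is invented for illustration):

```python
# Sketch: rank aggregate buckets by doc_count. The response data is invented
# for illustration; real responses have the same {"key", "doc_count"} shape.
def top_buckets(response: dict, n: int = 3) -> list:
    """Return the n most frequent (key, doc_count) pairs from an aggregate response."""
    buckets = [(b["key"], b["doc_count"]) for b in response["data"]]
    return sorted(buckets, key=lambda kv: kv[1], reverse=True)[:n]

response = {"data": [
    {"key": "2022", "doc_count": 2103},
    {"key": "2023", "doc_count": 1542},
    {"key": "2021", "doc_count": 2980},
]}
print(top_buckets(response, 2))  # [('2021', 2980), ('2022', 2103)]
```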
```bash
# Comment frequency of u/spez by year since 2006
curl "https://arctic-shift.photon-reddit.com/api/comments/search/aggregate?aggregate=created_utc&frequency=year&author=spez&after=2006-01-01"

# Most active posters in r/announcements
curl "https://arctic-shift.photon-reddit.com/api/posts/search/aggregate?aggregate=author&subreddit=announcements"

# Top subreddits by user activity
curl "https://arctic-shift.photon-reddit.com/api/posts/search/aggregate?aggregate=subreddit&author=gallowboob&limit=20"

# Monthly post distribution within a date range
curl "https://arctic-shift.photon-reddit.com/api/posts/search/aggregate?aggregate=created_utc&frequency=month&subreddit=technology&after=2020-01-01&before=2024-01-01"

# Response format:
# {
#   "data": [
#     {"key": "2023", "doc_count": 1542},
#     {"key": "2022", "doc_count": 2103},
#     ...
#   ]
# }
```

### Search Subreddits

Search for subreddits by name prefix, subscriber count, and creation date.

```bash
# Search for subreddits starting with "ask" sorted by subscribers
curl "https://arctic-shift.photon-reddit.com/api/subreddits/search?subreddit_prefix=ask"

# Find oldest subreddits with more than 1000 subscribers
curl "https://arctic-shift.photon-reddit.com/api/subreddits/search?min_subscribers=1000&sort_type=created_utc&sort=asc&limit=50"

# Filter NSFW subreddits
curl "https://arctic-shift.photon-reddit.com/api/subreddits/search?over18=true&min_subscribers=10000&limit=25"

# Response format:
# {
#   "data": [
#     {
#       "display_name": "AskReddit",
#       "subscribers": 45000000,
#       "created_utc": 1201233135,
#       "description": "Subreddit description...",
#       "over18": false,
#       ...
#     }
#   ]
# }
```

### Get Subreddit Rules

Retrieve the rules defined for one or more subreddits.
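Because the rules response is keyed by subreddit name, a small helper can flatten it for display. A sketch against the documented response shape (the sample data below is invented):

```python
# Sketch: flatten a subreddit rules response into "subreddit: rule" strings.
# Sample data is invented; field names follow the documented response shape.
def list_rules(response: dict) -> list:
    lines = []
    for subreddit, rules in response["data"].items():
        for rule in rules:
            lines.append(f"{subreddit}: {rule['short_name']}")
    return lines

response = {"data": {"askreddit": [
    {"short_name": "Rule 1", "description": "...", "kind": "all"},
    {"short_name": "Rule 2", "description": "...", "kind": "comment"},
]}}
print(list_rules(response))  # ['askreddit: Rule 1', 'askreddit: Rule 2']
```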
```bash
# Get rules for multiple subreddits
curl "https://arctic-shift.photon-reddit.com/api/subreddits/rules?subreddits=askreddit,politics,science"

# Response format:
# {
#   "data": {
#     "askreddit": [
#       {
#         "short_name": "Rule 1",
#         "description": "Full rule description...",
#         "kind": "all"
#       },
#       ...
#     ]
#   }
# }
```

### Get Subreddit Wikis

Retrieve wiki pages from subreddits.

```bash
# Get all wiki pages from a subreddit
curl "https://arctic-shift.photon-reddit.com/api/subreddits/wikis?subreddit=askreddit&limit=50"

# Get specific wiki pages by path
curl "https://arctic-shift.photon-reddit.com/api/subreddits/wikis?paths=/r/reddit.com/wiki/faq,/r/travel/wiki/faq"

# List all wiki page paths in a subreddit
curl "https://arctic-shift.photon-reddit.com/api/subreddits/wikis/list?subreddit=askreddit"

# Response format:
# {
#   "data": [
#     {
#       "path": "/r/askreddit/wiki/index",
#       "content": "Wiki page content in markdown...",
#       "revision_date": 1577836800
#     }
#   ]
# }
```

### Search Users

Search for users by name prefix, activity metrics, and karma.

```bash
# Search for users with the most karma
curl "https://arctic-shift.photon-reddit.com/api/users/search?sort_type=total_karma&limit=25"

# Search for users starting with "mod" who have at least 1000 comments
curl "https://arctic-shift.photon-reddit.com/api/users/search?author_prefix=mod&min_num_comments=1000&sort_type=author&sort=asc"

# Response format:
# {
#   "data": [
#     {
#       "author": "username",
#       "total_karma": 5000000,
#       "num_posts": 1234,
#       "num_comments": 56789,
#       "first_post_utc": 1234567890,
#       "last_comment_utc": 1677654321
#     }
#   ]
# }
```

### User Interactions

Analyze interactions between users or user activity across subreddits.
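Interaction counts pair naturally with graph tooling: each `(author, count)` entry can be read as a weighted edge from the queried user. A sketch converting a response into edge tuples (sample data invented; the tuples suit e.g. networkx's `add_weighted_edges_from`):

```python
# Sketch: turn a user-interaction response into weighted graph edges.
# Sample data is invented; real responses share the {"author", "count"} shape.
def interaction_edges(author: str, response: dict) -> list:
    """Return (source, target, weight) tuples for graph libraries."""
    return [(author, row["author"], row["count"]) for row in response["data"]]

response = {"data": [
    {"author": "other_user", "count": 45},
    {"author": "another_user", "count": 32},
]}
print(interaction_edges("spez", response))
```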
```bash
# Get user-to-user interactions for u/spez before 2017 with min 10 interactions
curl "https://arctic-shift.photon-reddit.com/api/users/interactions/users?author=spez&before=2017-01-01&min_count=10"

# List individual interactions
curl "https://arctic-shift.photon-reddit.com/api/users/interactions/users/list?author=spez&subreddit=announcements&limit=50"

# Get user activity across subreddits with custom weighting
curl "https://arctic-shift.photon-reddit.com/api/users/interactions/subreddits?author=gallowboob&weight_posts=2.0&weight_comments=1.0&limit=20"

# Response format for interactions:
# {
#   "data": [
#     {"author": "other_user", "count": 45},
#     {"author": "another_user", "count": 32},
#     ...
#   ]
# }
```

### Aggregate User Flairs

Get all author flairs used by a user, grouped by subreddit.

```bash
curl "https://arctic-shift.photon-reddit.com/api/users/aggregate_flairs?author=spez"

# Response format:
# {
#   "data": {
#     "announcements": ["Admin", "CEO"],
#     "reddit": ["A"],
#     ...
#   }
# }
```

### Resolve Short Links

Convert Reddit short links (`r/subreddit/s/xxx` format) to full URLs.

```bash
curl "https://arctic-shift.photon-reddit.com/api/short_links?paths=/r/running/s/3TzXiyxaMD,/u/CEO_Gola/s/WO7Ro11h1a"

# Response format:
# {
#   "data": {
#     "/r/running/s/3TzXiyxaMD": "https://www.reddit.com/r/running/comments/...",
#     "/u/CEO_Gola/s/WO7Ro11h1a": "https://www.reddit.com/user/..."
#   }
# }
```

### Time Series Data

Retrieve aggregated metrics over time for global Reddit activity or specific subreddits.
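Time-series values lend themselves to derived metrics such as period-over-period change. A sketch computing percent change between consecutive data points (the sample values are invented; only the `{"key", "value"}` shape comes from the response format):

```python
# Sketch: percent change between consecutive time-series points (values invented).
def percent_changes(points: list) -> list:
    """Return (key, percent change vs. previous point) for each point after the first."""
    out = []
    for prev, cur in zip(points, points[1:]):
        change = (cur["value"] - prev["value"]) / prev["value"] * 100
        out.append((cur["key"], round(change, 2)))
    return out

points = [
    {"key": "2023-01", "value": 1000},
    {"key": "2023-02", "value": 1100},
    {"key": "2023-03", "value": 990},
]
print(percent_changes(points))  # [('2023-02', 10.0), ('2023-03', -10.0)]
```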
```bash
# Get global post count per year
curl "https://arctic-shift.photon-reddit.com/api/time_series?key=global/posts/count&precision=year"

# Get r/askreddit subscriber growth per month
curl "https://arctic-shift.photon-reddit.com/api/time_series?key=r/askreddit/subscribers&precision=month&after=2020-01-01"

# Get comment activity in a subreddit by week
curl "https://arctic-shift.photon-reddit.com/api/time_series?key=r/programming/comments/count&precision=week&after=2023-01-01&before=2024-01-01"

# Available keys (<subreddit> is a placeholder for a subreddit name):
# - global/posts/count, global/comments/count
# - global/posts/sum_score, global/comments/sum_score
# - r/<subreddit>/posts/count, r/<subreddit>/comments/count
# - r/<subreddit>/posts/sum_score, r/<subreddit>/comments/sum_score
# - r/<subreddit>/subscribers

# Response format:
# {
#   "data": [
#     {"key": "2023-01", "value": 15234567},
#     {"key": "2023-02", "value": 14567890},
#     ...
#   ]
# }
```

## Python Scripts for Processing Data Dumps

### Process Compressed Reddit Data Files

The Python scripts enable processing of downloaded data dumps in various compressed formats.
```python
# scripts/processFiles.py - Main processing script
import os

from fileStreams import getFileJsonStream
from utils import FileProgressLog

# Set the path to your downloaded data file or folder
fileOrFolderPath = r"/path/to/reddit_data/RC_2024-01.zst"
recursive = False  # Set True to process subfolders

def processFile(path: str):
    """Process a single Reddit data file"""
    print(f"Processing file {path}")
    with open(path, "rb") as f:
        jsonStream = getFileJsonStream(path, f)
        if jsonStream is None:
            print(f"Skipping unknown file {path}")
            return
        progressLog = FileProgressLog(path, f)

        # Track statistics
        comment_count = 0
        unique_authors = set()

        for row in jsonStream:
            progressLog.onRow()

            # Access common fields
            author = row["author"]
            subreddit = row["subreddit"]
            post_id = row["id"]
            created = row["created_utc"]
            score = row["score"]

            # For comments files (RC_*.zst)
            if "body" in row:
                body = row["body"]
                parent_id = row["parent_id"]  # t3_xxx (post) or t1_xxx (comment)
                link_id = row["link_id"]      # t3_xxx (post ID)

            # For posts/submissions files (RS_*.zst)
            if "title" in row:
                title = row["title"]
                selftext = row.get("selftext", "")
                url = row.get("url", "")
                num_comments = row.get("num_comments", 0)

            # Example statistics: total records and unique authors
            comment_count += 1
            unique_authors.add(author)

        progressLog.logProgress("\n")
        print(f"Total records: {comment_count:,}")
        print(f"Unique authors: {len(unique_authors):,}")

# Run processing
if os.path.isdir(fileOrFolderPath):
    if recursive:
        # Walk subfolders when recursive processing is enabled
        for root, _dirs, files in os.walk(fileOrFolderPath):
            for file in files:
                processFile(os.path.join(root, file))
    else:
        for file in os.listdir(fileOrFolderPath):
            processFile(os.path.join(fileOrFolderPath, file))
else:
    processFile(fileOrFolderPath)
```

### File Stream Utilities

Utilities for reading different compressed file formats.
```python
# scripts/fileStreams.py - Streaming JSON from compressed files
from typing import BinaryIO, Iterator

import zstandard

try:
    import orjson as json  # Faster JSON parsing (recommended)
except ImportError:
    import json

def getFileJsonStream(path: str, f: BinaryIO) -> Iterator[dict] | None:
    """
    Get appropriate JSON stream based on file extension.
    Supports: .zst, .zst_blocks, .jsonl, .ndjson, .json
    """
    if path.endswith(".jsonl") or path.endswith(".ndjson"):
        return getJsonLinesFileJsonStream(f)
    elif path.endswith(".zst"):
        return getZstFileJsonStream(f)
    elif path.endswith(".zst_blocks"):
        return getZstBlocksFileJsonStream(f)
    elif path.endswith(".json"):
        return getJsonFileStream(f)
    return None

def getZstFileJsonStream(f: BinaryIO, chunk_size=1024*1024*10) -> Iterator[dict]:
    """Stream JSON objects from a zstandard compressed file"""
    decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
    # Buffer raw bytes and split on newlines before parsing, so a multi-byte
    # UTF-8 character can never be cut in half at a chunk boundary
    buffer = b""
    zstReader = decompressor.stream_reader(f)
    while True:
        chunk = zstReader.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        lines = buffer.split(b"\n")
        buffer = lines[-1]
        for line in lines[:-1]:
            if line:
                yield json.loads(line)
    if buffer:
        yield json.loads(buffer)

# Example: Extract all comments from a specific subreddit
def extract_subreddit_comments(zst_path: str, target_subreddit: str, output_path: str):
    """Extract all comments from a specific subreddit to a new file"""
    import json as std_json
    with open(zst_path, "rb") as f, open(output_path, "w") as out:
        for record in getZstFileJsonStream(f):
            if record.get("subreddit", "").lower() == target_subreddit.lower():
                out.write(std_json.dumps(record) + "\n")

# Usage:
# extract_subreddit_comments("RC_2024-01.zst", "programming", "programming_comments.jsonl")
```

### Progress Tracking Utility

Track processing progress for large files with time estimates.
```python
# scripts/utils.py - Progress logging utility
import os
import time
from typing import BinaryIO

class FileProgressLog:
    """Track and display progress when processing large files"""

    def __init__(self, path: str, file: BinaryIO):
        self.file = file
        self.fileSize = os.path.getsize(path)
        self.i = 0
        self.startTime = time.time()
        self.printEvery = 10_000

    def onRow(self):
        """Call this for each processed row"""
        self.i += 1
        if self.i % self.printEvery == 0:
            self.logProgress()

    def logProgress(self, end=""):
        """Print current progress with time estimates"""
        progress = self.file.tell() / self.fileSize if not self.file.closed else 1
        elapsed = time.time() - self.startTime
        remaining = (elapsed / progress - elapsed) if progress > 0 else 0
        print(f"\r{self.i:,} rows - {progress:.2%} - "
              f"elapsed: {elapsed:.0f}s - remaining: {remaining:.0f}s", end=end)

# Example output during processing:
# 1,230,000 rows - 45.32% - elapsed: 120s - remaining: 145s
```

## Summary

Arctic Shift serves as the primary successor to Pushshift for Reddit data archival, catering to researchers studying social media trends, moderators needing historical data for community management, and developers building Reddit analytics tools. The API provides comprehensive search, aggregation, and tree-building capabilities that mirror Reddit's own data structures, while the downloadable dumps enable large-scale offline analysis. Common use cases include tracking discourse evolution over time, analyzing user behavior patterns, building recommendation systems, and conducting academic research on online communities.

For integration, the API follows RESTful conventions with consistent parameter naming across endpoints, making it straightforward to build client libraries. Rate limiting is permissive for typical usage, but heavy analysis should use the monthly data dumps instead.
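Even with permissive rate limits, a polite client backs off when requests fail. A sketch of an exponential backoff schedule (this retry policy is a generic pattern, not an Arctic Shift requirement):

```python
# Sketch: exponential backoff schedule for API retries. This is a generic
# client-side pattern, not a policy specified by Arctic Shift.
def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Delays of base * 2^i seconds per failed attempt, capped at `cap`."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
# A caller would time.sleep(delay) between failed requests.
```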
The Python processing scripts provide a foundation for building custom analysis pipelines, supporting streaming decompression to handle files that exceed available RAM. Data is available from June 2005 through the present, with new monthly archives typically released within a few days of each month's end via Academic Torrents.
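Because the streams yield plain dicts, aggregation over a dump reduces to iterating once and tallying. A sketch that counts records per subreddit from any iterable of rows (fed synthetic rows here; in practice the iterable would come from `getFileJsonStream` over a `.zst` dump):

```python
# Sketch: per-subreddit tally over a stream of records. The rows below are
# synthetic; in practice the iterable would come from getFileJsonStream.
from collections import Counter
from typing import Iterable

def count_by_subreddit(rows: Iterable) -> Counter:
    counts = Counter()
    for row in rows:
        counts[row.get("subreddit", "[unknown]")] += 1
    return counts

rows = [{"subreddit": "programming"}, {"subreddit": "python"}, {"subreddit": "programming"}]
print(count_by_subreddit(rows))  # Counter({'programming': 2, 'python': 1})
```

Because the tally only ever holds one row in memory at a time, this scales to dumps far larger than RAM.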