# bojstat - Bank of Japan Time Series Statistics Python Client

bojstat is a Python client library for accessing the [Bank of Japan Time Series Statistics Search Site](https://www.stat-search.boj.or.jp/) API. It provides a comprehensive interface for retrieving financial and economic time series data published by the Bank of Japan, supporting all three API endpoints: the Code API, the Layer API, and the Metadata API.

The library offers both synchronous and asynchronous clients, automatic pagination for large datasets, local file caching with a configurable TTL, exponential backoff retry logic, and seamless conversion to pandas/polars DataFrames. Built on top of httpx for HTTP communication, bojstat handles the complexities of the BOJ API, including rate limiting, error classification, and data consistency checks, making it easy to integrate Japanese financial data into Python applications.

## Installation

```bash
# Basic installation
pip install bojstat-py

# With pandas support
pip install 'bojstat-py[pandas]'

# With polars support
pip install 'bojstat-py[polars]'

# With CLI tools
pip install 'bojstat-py[cli]'

# Full installation (pandas + polars + CLI)
pip install 'bojstat-py[dataframe,cli]'
```

---

## BojClient - Synchronous Client

The main synchronous client for accessing the Bank of Japan API. Supports the context manager pattern for proper resource management.

```python
from bojstat import BojClient, DB, Lang, Format, CacheMode

# Basic usage with context manager
with BojClient() as client:
    # Fetch Tankan survey data
    frame = client.data.get_by_code(
        db=DB.CO,
        code="TK99F1000601GCQ01000",
        start="202401",
        end="202412",
    )
    print(f"Status: {frame.meta.status}")  # 200
    print(f"Records: {len(frame.records)}")
    for record in frame.records[:5]:
        print(f"{record.survey_date}: {record.value} {record.unit}")

# Advanced configuration
client = BojClient(
    timeout=60.0,                   # HTTP timeout in seconds
    lang=Lang.EN,                   # Default language (JP or EN)
    format=Format.JSON,             # Output format (JSON or CSV)
    rate_limit_per_sec=0.5,         # Rate limiting (requests per second)
    cache_dir="./cache",            # Enable file caching
    cache_mode=CacheMode.IF_STALE,  # Cache mode: if_stale, force_refresh, off
    cache_ttl=12 * 60 * 60,         # Cache TTL in seconds (12 hours)
    retry_max_attempts=5,           # Max retry attempts
    retry_base_delay=0.5,           # Base delay for exponential backoff
    http2=True,                     # Enable HTTP/2
    proxy="http://proxy:8080",      # Proxy URL
)
try:
    frame = client.data.get_by_code(db="FM08", code="FXERD01")
finally:
    client.close()
```

---

## AsyncBojClient - Asynchronous Client

Async client with an API identical to the synchronous version. Use `await` for all data-fetching methods.
```python
import asyncio

from bojstat import AsyncBojClient, DB


async def fetch_exchange_rates():
    async with AsyncBojClient() as client:
        # Fetch foreign exchange market data
        frame = await client.data.get_by_code(
            db=DB.FM08,
            code=["FXERD01", "FXERD02"],  # Multiple codes
            start="202401",
            end="202412",
        )
        print(f"Total records: {len(frame.records)}")
        for record in frame.records[:3]:
            print(f"{record.series_code} - {record.survey_date}: {record.value}")

        # Get metadata
        meta = await client.metadata.get(db=DB.FM08)
        print(f"Available series: {len(meta.series_codes)}")
        return frame


# Run the async function
frame = asyncio.run(fetch_exchange_rates())
```

---

## DataService.get_by_code - Code API

Retrieve time series data by specifying series codes directly. Supports a single code or a list of codes.

```python
from bojstat import BojClient, DB

with BojClient() as client:
    # Single series code
    frame = client.data.get_by_code(
        db=DB.FM08,      # Foreign exchange market
        code="FXERD01",  # USD/JPY spot rate
        start="202401",  # Start date (YYYYMM format)
        end="202412",    # End date
    )

    # Multiple series codes
    frame = client.data.get_by_code(
        db=DB.FM08,
        code=["FXERD01", "FXERD02", "FXERD03"],  # List of codes
        start="202401",
    )

    # Access response metadata
    print(f"API Status: {frame.meta.status}")
    print(f"Message: {frame.meta.message}")
    print(f"Request URL: {frame.meta.request_url}")

    # Iterate over records
    for record in frame.records:
        print(f"""
        Series: {record.series_code}
        Name: {record.series_name}
        Date: {record.survey_date}
        Value: {record.value}
        Unit: {record.unit}
        Frequency: {record.frequency}
        Last Update: {record.last_update}
        """)
```

---

## DataService.get_by_layer - Layer API

Retrieve time series data using a hierarchical layer specification. Useful for exploring data by category structure.

```python
from bojstat import BojClient, DB

with BojClient() as client:
    # Specific layer path
    frame = client.data.get_by_layer(
        db=DB.BP01,       # Balance of payments
        frequency="M",    # Monthly (M, Q, CY, FY, etc.)
        layer=[1, 1, 1],  # Hierarchy: layer1=1, layer2=1, layer3=1
        start="202504",
        end="202509",
    )
    for record in frame.records:
        print(f"{record.series_code}: {record.survey_date} = {record.value}")

    # Wildcard to fetch all layers
    frame = client.data.get_by_layer(
        db=DB.FF,            # Flow of Funds
        frequency="Q",       # Quarterly
        layer="*",           # All layers
        auto_paginate=True,  # Auto-fetch all pages (default)
    )
    print(f"Total records from all layers: {len(frame.records)}")

    # Manual pagination control
    frame = client.data.get_by_layer(
        db=DB.FF,
        frequency="Q",
        layer="*",
        auto_paginate=False,  # Get only the first page
    )
    print(f"Next position: {frame.meta.next_position}")  # None if complete
```

---

## MetadataService.get - Metadata API

Retrieve metadata for all series in a database, including series codes, names, frequencies, and coverage periods.

```python
from bojstat import BojClient, DB, Frequency

with BojClient() as client:
    # Get all metadata for a database
    meta = client.metadata.get(db=DB.FM08)

    # List all series codes
    print(f"All series codes: {meta.series_codes[:10]}...")

    # Get the first N records
    for record in meta.head(5).records:
        print(f"""
        Code: {record.series_code}
        Name: {record.series_name}
        Frequency: {record.frequency}
        Start: {record.start_date}
        End: {record.end_date}
        """)

    # Search by name (case-insensitive)
    usd_series = meta.find(name_contains="ドル")
    print(f"USD-related series: {usd_series.series_codes}")

    # Filter by frequency
    daily = meta.find(frequency="DAILY")
    print(f"Daily series count: {len(daily.records)}")

    # Combined search
    result = meta.find(name_contains="ドル", frequency=Frequency.D)
    for rec in result.records[:5]:
        print(f"{rec.series_code}: {rec.series_name}")
```

---

## DataFrame Conversion - pandas and polars

Convert API responses to pandas or polars DataFrames for analysis.
```python
from bojstat import BojClient, DB

with BojClient() as client:
    frame = client.data.get_by_code(
        db=DB.FM08,
        code=["FXERD01", "FXERD02"],
        start="202401",
    )

    # Convert to pandas DataFrame
    df_pandas = frame.to_pandas()
    print(df_pandas[["series_code", "survey_date", "value"]].head())
    #   series_code survey_date   value
    # 0     FXERD01    20240104  144.62
    # 1     FXERD01    20240105  145.09
    # ...

    # Convert to polars DataFrame
    df_polars = frame.to_polars()
    print(df_polars.select(["series_code", "survey_date", "value"]).head())

    # Long format (default) - list of dicts
    rows = frame.to_long(numeric_mode="float64")  # or "decimal", "string"

    # Wide format (pivot by series_code)
    pivot = frame.to_wide()
    # [{"survey_date": "20240104", "FXERD01": 144.62, "FXERD02": 156.78}, ...]

    # Metadata to DataFrame
    meta = client.metadata.get(db=DB.FM08)
    df_meta = meta.to_pandas()
    print(df_meta.columns.tolist())
```

---

## DB Catalog - Database Discovery

Explore available databases without calling the API. All 50+ database codes are built into the library.

```python
from bojstat import list_dbs, get_db_info, DB

# List all available databases
for info in list_dbs():
    print(f"{info.code}: {info.name_ja}")
# IR01: 基準割引率および基準貸付利率...
# IR02: 預金種類別店頭表示金利の平均年利率等
# ...

# Filter by category
for info in list_dbs(category="マーケット"):
    print(f"{info.code}: {info.name_ja} ({info.category_ja})")

# Get specific database info
info = get_db_info("FM08")
print(f"Name: {info.name_ja}")          # 外国為替市況
print(f"Category: {info.category_ja}")  # マーケット関連

# DB enum works as string
print(DB.FM08 == "FM08")  # True
print(DB.CO)              # "CO"

# Available DB codes include:
# - IR01-IR04: Interest rates
# - FM01-FM09: Market data (call rates, forex, etc.)
# - MD01-MD14: Money and deposits
# - LA01-LA05: Loans
# - BS01-BS02: Financial institution balance sheets
# - FF: Flow of Funds
# - CO: Tankan (Short-term Economic Survey)
# - PR01-PR04: Price indices
# - BP01, BIS, DER: Balance of payments
```

---

## Error Handling

All exceptions inherit from `BojError`. Handle specific error types for robust applications.

```python
import bojstat
from bojstat import BojClient

with BojClient() as client:
    try:
        frame = client.data.get_by_code(
            db="INVALID_DB",
            code="XXX",
        )
    except bojstat.BojBadRequestError as e:
        # STATUS=400 - Invalid parameters
        print(f"Bad request: {e.status} - {e.message_id}")
        print(f"Message: {e.message}")
        print(f"URL: {e.context.request_url}")
    except bojstat.BojServerError as e:
        # STATUS=500 - Server error
        print(f"Server error: {e.status}")
    except bojstat.BojUnavailableError as e:
        # STATUS=503 - Database unavailable
        print(f"Service unavailable: {e.status}")
    except bojstat.BojGatewayError as e:
        # Non-JSON response from upstream gateway
        print(f"Gateway error: {e.message}")
    except bojstat.BojTransportError as e:
        # Network/timeout errors
        print(f"Network error: {e}")
    except bojstat.BojValidationError as e:
        # Client-side validation errors
        print(f"Validation error ({e.validation_code}): {e}")
    except bojstat.BojConsistencyError as e:
        # Data consistency errors during pagination
        print(f"Consistency error: {e.signal}")
        print(f"Details: {e.details}")
    except bojstat.BojError as e:
        # Catch-all for any library exception
        print(f"BOJ error ({e.origin}): {e}")
```

---

## Error Classification

Classify API error codes to understand their semantic meaning.
```python
from bojstat import BojClient

with BojClient() as client:
    # Classify a known error code
    result = client.errors.classify(status=400, message_id="M181005E")
    print(f"Category: {result.category}")      # "invalid_db"
    print(f"Confidence: {result.confidence}")  # 1.0
    print(f"Key: {result.observation_key}")    # "400:M181005E"

# Common error categories:
# - "ok": M181000I - Success
# - "no_data": M181030I - Success but no matching data
# - "invalid_db": M181005E - Invalid database code
# - "invalid_parameter": M181001E - Invalid parameter
# - "code_count_overflow": M181007E - Too many codes (>250)
# - "code_not_found": M181013E - Series code not found
# - "internal_error": M181090S - Server internal error
# - "db_unavailable": M181091S - Database unavailable
```

---

## Caching Configuration

Enable local file caching to reduce API calls and improve performance.

```python
from bojstat import BojClient, CacheMode

# Enable caching with a custom TTL
client = BojClient(
    cache_dir="./boj_cache",        # Cache directory
    cache_ttl=60 * 60 * 24,         # 24-hour TTL
    cache_mode=CacheMode.IF_STALE,  # Use the cache if not stale
)

# Force refresh - always fetch from the API
client = BojClient(
    cache_dir="./boj_cache",
    cache_mode=CacheMode.FORCE_REFRESH,
)

# Disable caching entirely
client = BojClient(
    cache_mode=CacheMode.OFF,
)

# With caching enabled, repeated requests return cached data:
with BojClient(cache_dir="./cache") as client:
    frame1 = client.data.get_by_code(db="FM08", code="FXERD01")  # API call
    frame2 = client.data.get_by_code(db="FM08", code="FXERD01")  # From cache
```

---

## Retry and Rate Limiting

Configure automatic retry behavior and rate limiting to handle transient errors.

```python
from bojstat import BojClient, ConsistencyMode

# Custom retry configuration
client = BojClient(
    retry_max_attempts=3,    # Max retry attempts (default: 5)
    retry_base_delay=1.0,    # Base delay in seconds (default: 0.5)
    retry_cap_delay=16.0,    # Max delay cap (default: 8.0)
    rate_limit_per_sec=0.5,  # 1 request per 2 seconds
)

# Disable retries
client = BojClient(retry_max_attempts=1)

# Consistency mode for large paginated requests
client = BojClient(
    consistency_mode=ConsistencyMode.STRICT,  # Raise an error on conflicts (default)
)
client = BojClient(
    consistency_mode=ConsistencyMode.BEST_EFFORT,  # Warn but continue
)
```

---

## CLI Usage

Command-line interface for quick data retrieval. Requires the `bojstat-py[cli]` installation.

```bash
# Fetch metadata and save it as JSON
bojstat metadata --db FM08 --out metadata.json

# Fetch a time series by code (CSV output)
bojstat code --db CO --code TK99F1000601GCQ01000 --start 202401 --out data.csv

# Fetch time series by layer (Parquet output)
bojstat layer --db BP01 --frequency M --layer "1,1,1" --start 202504 --out data.parquet

# With English-language output
bojstat metadata --db FM08 --lang en --out metadata_en.json

# Available commands:
# - metadata: Fetch database metadata
# - code: Fetch data using the Code API
# - layer: Fetch data using the Layer API

# Supported output formats:
# - .json: JSON with records and metadata
# - .csv: CSV format (requires pandas)
# - .parquet: Parquet format (requires pyarrow)
```

---

## HTTP Client Customization

Inject a custom httpx client or configure proxy and HTTP/2 settings.
```python
import httpx

from bojstat import BojClient, AsyncBojClient

# Enable HTTP/2
client = BojClient(http2=True)

# Configure a proxy
client = BojClient(proxy="http://proxy.example.com:8080")

# Inject an external httpx.Client
http_client = httpx.Client(
    base_url="https://www.stat-search.boj.or.jp/api/v1",
    timeout=60.0,
    headers={"X-Custom-Header": "value"},
)
client = BojClient(http_client=http_client)

# For async use, inject an httpx.AsyncClient
async_http_client = httpx.AsyncClient(
    base_url="https://www.stat-search.boj.or.jp/api/v1",
    timeout=60.0,
)
async_client = AsyncBojClient(http_client=async_http_client)

# Connection limits
client = BojClient(
    limits=httpx.Limits(max_connections=10, max_keepalive_connections=5)
)
```

---

## Summary

bojstat is designed for financial data analysts, economists, and developers who need programmatic access to Bank of Japan statistics. The library automatically handles the BOJ API's pagination limits (250 series / 60,000 records per request), making it possible to fetch large datasets without manual intervention.

Common use cases include building economic dashboards, time series analysis of Japanese financial markets, automated reporting systems, and research applications that require historical BOJ data. The library integrates seamlessly with the Python data science ecosystem through its pandas and polars support.

For production deployments, enable caching to reduce API load and improve response times, configure appropriate rate limits to avoid connection issues, and implement proper error handling for the various exception types. The async client enables efficient concurrent data fetching, while the CLI provides quick access for ad-hoc queries and scripting.
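The 250-series-per-request limit mentioned above means a long code list must be split into batches before it reaches the Code API. bojstat does this for you, but the batching idea is easy to see in plain Python. The `chunk_codes` helper below is an illustrative sketch, not part of the library's public API:

```python
from typing import Iterator, Sequence

MAX_CODES_PER_REQUEST = 250  # BOJ Code API limit noted above


def chunk_codes(
    codes: Sequence[str], size: int = MAX_CODES_PER_REQUEST
) -> Iterator[list[str]]:
    """Yield successive batches of at most `size` series codes."""
    for start in range(0, len(codes), size):
        yield list(codes[start:start + size])


# Example: 600 hypothetical codes split into 250 + 250 + 100
codes = [f"SERIES{i:04d}" for i in range(600)]
batches = list(chunk_codes(codes))
print([len(b) for b in batches])  # [250, 250, 100]
```

Each batch could then be passed to `client.data.get_by_code(db=..., code=batch)` and the resulting records concatenated, which is essentially what the client's automatic pagination does internally.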