### Installation - Package Setup Methods

Source: https://github.com/yichen0831/opencc-python/blob/master/opencc/README.md

Shows two installation methods for the opencc-python library: installing from a local copy of the source with `setup.py`, and installing with pip. The pip command installs the `opencc-python-reimplemented` package from PyPI.

```shell
python setup.py install
```

```shell
pip install opencc-python-reimplemented
```

--------------------------------

### Command-Line Interface for OpenCC

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Provides examples of using the OpenCC command-line interface to convert text files. It covers basic file conversion, Taiwan variant conversion with phrases, stdin/stdout piping, custom encoding specification, and displaying help information.

```bash
# Basic file conversion
python -m opencc -c s2t -i input_simplified.txt -o output_traditional.txt

# Taiwan variant conversion with phrase support
python -m opencc -c s2twp -i mainland_chinese.txt -o taiwan_chinese.txt

# Convert from stdin to stdout (pipe support)
echo "开放中文转换" | python -m opencc -c s2t
# Output: 開放中文轉換

# Specify custom encodings
python -m opencc -c t2s \
  --in-enc UTF-8 \
  --out-enc UTF-8 \
  -i traditional.txt \
  -o simplified.txt

# Hong Kong to Simplified conversion
python -m opencc -c hk2s -i hongkong.txt -o simplified.txt

# Display help information
python -m opencc -h

# Command-line parameters:
```

--------------------------------

### Command Line: Convert Chinese Characters with OpenCC

Source: https://github.com/yichen0831/opencc-python/blob/master/README.md

This command-line interface (CLI) example shows how to convert Chinese characters with the OpenCC Python module. The CLI lets you specify the input and output files, the conversion type, and the character encodings. The example converts a UTF-8 encoded file from Simplified to Traditional Chinese.
```bash
python -m opencc -c s2t -i my_simplified_input_file.txt -o my_traditional_output_file.txt
```

--------------------------------

### Convert Chinese Text with OpenCC (Python)

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Shows how to use the `convert()` method of the OpenCC class to perform bidirectional Chinese text conversions. Examples include Simplified to Traditional, Taiwan variant with phrase support, and Traditional to Simplified.

```python
from opencc import OpenCC

# Example 1: Simplified to Traditional conversion
cc = OpenCC('s2t')
simplified_text = '鼠标是一种很常见及常用的电脑输入设备'
traditional_text = cc.convert(simplified_text)
print(traditional_text)
# Output: 鼠標是一種很常見及常用的電腦輸入設備

# Example 2: Taiwan variant with phrase conversion
cc_tw = OpenCC('s2twp')
text = '內存是一种很常见及常用的电脑输入设备'
result = cc_tw.convert(text)
print(result)
# Output: 記憶體是一種很常見及常用的電腦輸入裝置
# Note: 內存 (mainland term) → 記憶體 (Taiwan term for "memory")

# Example 3: Traditional to Simplified
cc_reverse = OpenCC('t2s')
trad_text = '香菸(英語:Cigarette),爲菸草製品的一種。'
simp_text = cc_reverse.convert(trad_text)
print(simp_text)
# Output: 香烟(英语:Cigarette),为烟草制品的一种。

# Example 4: Error handling for unset conversion
try:
    cc_empty = OpenCC()
    cc_empty.convert('some text')  # Raises ValueError
except ValueError as e:
    print("Error: Conversion mode not set")
```

--------------------------------

### Web API Integration with Flask and OpenCC Python

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Provides an example of integrating OpenCC into a Flask web service for real-time Chinese text conversion. It sets up an API endpoint that accepts text and a conversion mode, returning the converted text. The service caches common converters for efficiency.
```python
from opencc import OpenCC
from flask import Flask, request, jsonify

app = Flask(__name__)

# Initialize converters for common modes (cached)
converters = {
    's2t': OpenCC('s2t'),
    's2tw': OpenCC('s2tw'),
    's2twp': OpenCC('s2twp'),
    't2s': OpenCC('t2s'),
    'tw2s': OpenCC('tw2s'),
    'hk2s': OpenCC('hk2s'),
    's2hk': OpenCC('s2hk'),
}

@app.route('/convert', methods=['POST'])
def convert_text():
    """
    API endpoint for Chinese text conversion

    Request JSON:
    {
        "text": "开放中文转换",
        "conversion": "s2t"
    }

    Response JSON:
    {
        "original": "开放中文转换",
        "converted": "開放中文轉換",
        "conversion": "s2t"
    }
    """
    try:
        data = request.get_json()
        text = data.get('text', '')
        conversion = data.get('conversion', 's2t')

        if conversion not in converters:
            return jsonify({
                'error': f'Invalid conversion mode: {conversion}',
                'valid_modes': list(converters.keys())
            }), 400

        converted = converters[conversion].convert(text)

        return jsonify({
            'original': text,
            'converted': converted,
            'conversion': conversion
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

# Example usage with curl:
# curl -X POST http://localhost:5000/convert \
#   -H "Content-Type: application/json" \
#   -d '{"text": "鼠标", "conversion": "s2t"}'
# Response: {"original": "鼠标", "converted": "鼠標", "conversion": "s2t"}

if __name__ == '__main__':
    app.run(debug=True)
```

--------------------------------

### Python: Convert Chinese Characters with OpenCC

Source: https://github.com/yichen0831/opencc-python/blob/master/README.md

This Python code snippet demonstrates how to use the OpenCC library to convert Chinese characters from Simplified to Traditional Chinese. It initializes the OpenCC converter with a specific configuration ('s2t') and then uses the convert method to perform the conversion. Ensure the 'opencc' library is installed.
```python
from opencc import OpenCC
cc = OpenCC('s2t')  # convert from Simplified Chinese to Traditional Chinese
# can also set conversion by calling set_conversion
# cc.set_conversion('s2tw')
to_convert = '开放中文转换'
converted = cc.convert(to_convert)
```

--------------------------------

### Inspecting opencc-python Conversion Configuration and Benchmarking

Source: https://context7.com/yichen0831/opencc-python/llms.txt

This Python script demonstrates how to inspect the internal dictionary structure and conversion chain for a given mode in `opencc-python`. It also includes a function that benchmarks text conversion over a specified number of iterations and reports the average time per conversion and the throughput.

```python
from opencc import OpenCC
import json
import os
import time

# Example: Inspect conversion configuration
def inspect_conversion_config(conversion_mode):
    """Display the dictionary chain for a conversion mode"""
    # Assumes this script sits next to the opencc/ package directory
    config_path = os.path.join(
        os.path.dirname(__file__),
        'opencc/config',
        f'{conversion_mode}.json'
    )

    with open(config_path, 'r', encoding='utf-8') as f:
        config = json.load(f)

    print(f"Conversion: {config['name']}")
    print(f"Mode: {conversion_mode}")
    print("\nDictionary Chain:")

    for i, chain in enumerate(config['conversion_chain']):
        print(f"  Step {i+1}:")
        dict_info = chain['dict']
        if dict_info['type'] == 'group':
            print("    Type: Dictionary Group")
            for j, d in enumerate(dict_info['dicts']):
                print(f"      {j+1}. {d['file']}")
        else:
            print("    Type: Single Dictionary")
            print(f"    File: {dict_info['file']}")

# Usage
inspect_conversion_config('s2t')
# Output:
# Conversion: Simplified Chinese to Traditional Chinese
# Mode: s2t
#
# Dictionary Chain:
#   Step 1:
#     Type: Dictionary Group
#       1. STPhrases.txt
#       2. STCharacters.txt

# Example: Performance measurement
def benchmark_conversion(text, conversion, iterations=1000):
    """Measure conversion performance"""
    cc = OpenCC(conversion)

    # Warm-up (dictionary loading)
    cc.convert(text)

    # Benchmark with a monotonic, high-resolution clock
    start = time.perf_counter()
    for _ in range(iterations):
        cc.convert(text)
    elapsed = time.perf_counter() - start

    avg_time = elapsed / iterations * 1000  # milliseconds
    print(f"Conversion: {conversion}")
    print(f"Text length: {len(text)} characters")
    print(f"Average time: {avg_time:.3f} ms per conversion")
    print(f"Throughput: {len(text) * iterations / elapsed:.0f} chars/sec")

sample_text = '鼠标是一种很常见及常用的电脑输入设备' * 10
benchmark_conversion(sample_text, 's2t', iterations=100)
```

--------------------------------

### Batch Convert Files and Lists with OpenCC Python

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Demonstrates how to batch-convert the text files in a directory and how to convert a list of strings with the OpenCC library. The file example creates the output directory if needed, reads and writes UTF-8, and converts every `.txt` file it finds.
```python
from opencc import OpenCC
import os

# Example: Convert multiple files in a directory
def batch_convert_files(input_dir, output_dir, conversion='s2t'):
    """Convert all .txt files in input_dir to specified Chinese variant"""
    cc = OpenCC(conversion)

    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith('.txt'):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir, filename)

            with open(input_path, 'r', encoding='utf-8') as f:
                content = f.read()

            converted = cc.convert(content)

            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(converted)

            print(f"Converted: {filename}")

# Usage
batch_convert_files('./simplified_docs', './traditional_docs', 's2tw')

# Example: Process list of strings
def convert_list(items, conversion='s2t'):
    """Convert a list of Chinese text strings"""
    cc = OpenCC(conversion)
    return [cc.convert(item) for item in items]

chinese_items = [
    '计算机科学',
    '人工智能',
    '机器学习',
    '数据结构'
]

converted_items = convert_list(chinese_items, 's2tw')
for original, converted in zip(chinese_items, converted_items):
    print(f"{original} → {converted}")
# 's2tw' converts characters and Taiwan-standard variants, for example:
# 计算机科学 → 計算機科學
# 机器学习 → 機器學習
# Replacing mainland terms with Taiwan vocabulary (as in 內存 → 記憶體)
# is done by the phrase-aware 's2twp' mode, not by 's2tw'.
```

--------------------------------

### Initialize OpenCC Converter (Python)

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Demonstrates how to initialize the OpenCC converter class in Python. The converter can be created with a specific conversion mode, or without one so that the mode is set later using `set_conversion()`.
```python
from opencc import OpenCC

# Initialize with conversion mode
cc = OpenCC('s2t')  # Simplified to Traditional Chinese

# Initialize without mode (set later)
cc = OpenCC()
cc.set_conversion('s2tw')  # Simplified to Traditional Chinese (Taiwan)

# Available conversion modes:
# 'hk2s'  - Traditional Chinese (Hong Kong) to Simplified
# 's2hk'  - Simplified to Traditional Chinese (Hong Kong)
# 's2t'   - Simplified to Traditional Chinese
# 's2tw'  - Simplified to Traditional Chinese (Taiwan)
# 's2twp' - Simplified to Traditional Chinese (Taiwan with phrases)
# 't2hk'  - Traditional to Traditional Chinese (Hong Kong)
# 't2s'   - Traditional to Simplified Chinese
# 't2tw'  - Traditional to Traditional Chinese (Taiwan)
# 'tw2s'  - Traditional Chinese (Taiwan) to Simplified
# 'tw2sp' - Traditional Chinese (Taiwan) to Simplified with phrases
```

--------------------------------

### Unit Testing Chinese Text Conversion with opencc-python

Source: https://context7.com/yichen0831/opencc-python/llms.txt

This Python script uses the `unittest` module to test the Chinese text conversions provided by the `opencc` library. It covers Simplified to Traditional conversion, Taiwan phrase conversion, switching between conversion modes, empty strings, extended Unicode characters, and uninitialized converters. It also checks that the first mapping is selected when a character has several possible conversions.
```python
import unittest
from opencc import OpenCC

class ChineseConversionTest(unittest.TestCase):
    def setUp(self):
        """Initialize converter before each test"""
        self.cc = OpenCC()

    def test_simplified_to_traditional(self):
        """Test basic Simplified to Traditional conversion"""
        self.cc.set_conversion('s2t')
        input_text = '香烟为烟草制品。鼠标是电脑输入设备。'
        expected = '香菸爲菸草製品。鼠標是電腦輸入設備。'
        result = self.cc.convert(input_text)
        self.assertEqual(result, expected)

    def test_taiwan_phrase_conversion(self):
        """Test Taiwan variant with phrase-level conversion"""
        self.cc.set_conversion('s2twp')
        input_text = '內存是一种常用的电脑输入设备。'
        expected = '記憶體是一種常用的電腦輸入裝置。'
        result = self.cc.convert(input_text)
        self.assertEqual(result, expected)

    def test_multiple_conversions(self):
        """Test switching between different conversion modes"""
        # First conversion
        self.cc.set_conversion('hk2s')
        hk_text = '香煙為煙草製品。滑鼠是電腦輸入設備。'
        result1 = self.cc.convert(hk_text)
        expected1 = '香烟为烟草制品。滑鼠是电脑输入设备。'
        self.assertEqual(result1, expected1)

        # Switch mode
        self.cc.set_conversion('s2t')
        simp_text = '香烟为烟草制品。鼠标是电脑输入设备。'
        result2 = self.cc.convert(simp_text)
        expected2 = '香菸爲菸草製品。鼠標是電腦輸入設備。'
        self.assertEqual(result2, expected2)

    def test_empty_string(self):
        """Test handling of empty input"""
        self.cc.set_conversion('t2s')
        result = self.cc.convert('')
        self.assertEqual(result, '')

    def test_extended_unicode(self):
        """Test handling of extended Unicode characters"""
        self.cc.set_conversion('t2s')
        # The first character lies outside the Basic Multilingual Plane
        input_text = '𠁞种'
        expected = '𠀾种'
        result = self.cc.convert(input_text)
        self.assertEqual(result, expected)

    def test_uninitialized_converter(self):
        """Test that uninitialized converter raises ValueError"""
        with self.assertRaises(ValueError):
            uninitialized = OpenCC()  # No conversion set
            uninitialized.convert('test text')

    def test_first_mapping_selection(self):
        """Test that first mapping is chosen when multiple exist"""
        self.cc.set_conversion('t2s')
        # Character 儘 has two possible conversions: 尽 侭
        # First one (尽) should be selected
        input_text = '儘'
        expected = '尽'
        result = self.cc.convert(input_text)
        self.assertEqual(result, expected)

if __name__ == '__main__':
    unittest.main()
```

--------------------------------

### Python API - Chinese Character Conversion

Source: https://github.com/yichen0831/opencc-python/blob/master/opencc/README.md

Demonstrates basic usage of the OpenCC Python library for converting between Simplified and Traditional Chinese. Shows how to initialize the converter with a conversion mode, run a conversion, and set the mode afterwards with `set_conversion`. Supports Python 2.7 and 3.x.

```python
from opencc import OpenCC
cc = OpenCC('s2t')  # convert from Simplified Chinese to Traditional Chinese
# can also set conversion by calling set_conversion
# cc.set_conversion('s2tw')
to_convert = '开放中文转换'
converted = cc.convert(to_convert)
```

--------------------------------

### POST /convert (Web API Endpoint)

Source: https://context7.com/yichen0831/opencc-python/llms.txt

This endpoint provides real-time Chinese text conversion through a web service. It accepts a POST request with a JSON payload containing the text to be converted and the desired conversion mode.

```APIDOC
## POST /convert

### Description
This API endpoint converts Chinese text to a specified variant.

### Method
POST

### Endpoint
/convert

### Parameters
#### Request Body
- **text** (string) - Required - The Chinese text to convert.
- **conversion** (string) - Optional - The conversion mode (e.g., s2t, s2tw). Defaults to 's2t'.

#### Query Parameters
None

### Request Example
{
  "text": "开放中文转换",
  "conversion": "s2t"
}

### Response
#### Success Response (200)
- **original** (string) - The original Chinese text.
- **converted** (string) - The converted Chinese text.
- **conversion** (string) - The conversion mode used.

#### Response Example
{
  "original": "开放中文转换",
  "converted": "開放中文轉換",
  "conversion": "s2t"
}

#### Error Response (400)
- **error** (string) - An error message indicating an invalid conversion mode.
- **valid_modes** (array) - A list of valid conversion modes.

#### Error Response Example
{
  "error": "Invalid conversion mode: invalid_mode",
  "valid_modes": ["s2t", "s2tw", "s2twp", "t2s", "tw2s", "hk2s", "s2hk"]
}

### Error Handling
- Returns a 400 error for invalid conversion modes.
- Returns a 500 error for other exceptions.
```

--------------------------------

### Command Line - File Conversion Interface

Source: https://github.com/yichen0831/opencc-python/blob/master/opencc/README.md

Shows the command-line usage for converting text files between Chinese variants, including the input and output file options, the encoding options, and the conversion mode option. Input defaults to STDIN, output defaults to STDOUT, and both encodings default to UTF-8.

```shell
python -m opencc [-h] [-i <file>] [-o <file>] [-c <conversion>]
                 [--in-enc <encoding>] [--out-enc <encoding>]

optional arguments:
  -h, --help            show this help message and exit
  -i <file>, --input <file>
                        Read original text from <file>. (default: None = STDIN)
  -o <file>, --output <file>
                        Write converted text to <file>. (default: None = STDOUT)
  -c <conversion>, --config <conversion>
                        Conversion (default: None)
  --in-enc <encoding>   Encoding for input (default: UTF-8)
  --out-enc <encoding>  Encoding for output (default: UTF-8)

example with UTF-8 encoded file:

python -m opencc -c s2t -i my_simplified_input_file.txt -o my_traditional_output_file.txt
```

--------------------------------

### Dynamically Change Conversion Mode (Python)

Source: https://context7.com/yichen0831/opencc-python/llms.txt

Illustrates how to change the conversion mode of an existing OpenCC converter instance using the `set_conversion()` method. This allows a single converter object to be reused for several different conversion tasks.
```python
from opencc import OpenCC

# Initialize once, use multiple times with different modes
cc = OpenCC()

# Convert Hong Kong Traditional to Simplified
cc.set_conversion('hk2s')
hk_text = '香煙(英語:Cigarette),為煙草製品的一種。滑鼠是電腦輸入設備。'
result1 = cc.convert(hk_text)
print(result1)
# Output: 香烟(英语:Cigarette),为烟草制品的一种。滑鼠是电脑输入设备。

# Switch to Simplified to Traditional
cc.set_conversion('s2t')
simp_text = '香烟为烟草制品。鼠标是电脑输入设备。'
result2 = cc.convert(simp_text)
print(result2)
# Output: 香菸爲菸草製品。鼠標是電腦輸入設備。

# Switch to Taiwan Traditional conversion
cc.set_conversion('s2tw')
text = '鼠标是一种很常见的电脑输入设备'
result3 = cc.convert(text)
print(result3)
# Output: 鼠標是一種很常見的電腦輸入設備
```
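The mode-switching pattern above can be wrapped in a small helper that runs one text through several modes on a single converter. This is a minimal sketch, not part of opencc-python: `convert_in_modes` and the `ConverterLike` protocol are hypothetical names introduced here, and the helper assumes only that the object passed in exposes the `set_conversion()` and `convert()` methods used in the snippets above.

```python
from typing import Dict, Iterable, Protocol  # Protocol requires Python 3.8+


class ConverterLike(Protocol):
    """Hypothetical protocol: any object with the two methods used below."""

    def set_conversion(self, conversion: str) -> None: ...

    def convert(self, text: str) -> str: ...


def convert_in_modes(cc: ConverterLike, text: str, modes: Iterable[str]) -> Dict[str, str]:
    """Convert `text` once per mode, reusing the same converter object.

    Returns a dict that maps each mode name to the converted text.
    """
    results: Dict[str, str] = {}
    for mode in modes:
        cc.set_conversion(mode)          # switch the existing instance to the next mode
        results[mode] = cc.convert(text)
    return results


# Possible usage with the library installed (not executed here):
#   from opencc import OpenCC
#   convert_in_modes(OpenCC(), '鼠标', ['s2t', 's2tw'])
```

The converter is left set to the last mode in the list, so code that keeps using the same instance afterwards may want to call `set_conversion()` again before converting.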