=============== LIBRARY RULES ===============
From library maintainers:
- SafeText supports 13 languages: Arabic (ar), Azerbaijani (az), German (de), English (en), Spanish (es), Persian/Farsi (fa), French (fr), Hindi (hi), Japanese (ja), Portuguese (pt), Russian (ru), Turkish (tr), Chinese (zh)
- Always initialize SafeText with a language code (ISO 639-1) or None for auto-detection
- Use SafeText(language=None) and set_language_from_text() for automatic language detection
- Use check_profanity() to get detailed profanity detection results with word positions
- Example: st.check_profanity('bad text') returns [{'word': 'bad', 'index': 1, 'start': 0, 'end': 3}]
- Use censor_profanity() to replace profanity with asterisks while preserving text length
- Use the custom_words_dir parameter to extend profanity lists with custom words
- Custom words directory structure: custom_dir/{language_code}.txt (e.g., en.txt, tr.txt)
- Custom word files contain one word/phrase per line and are combined with built-in lists
- Example: SafeText('en', custom_words_dir='my_words') loads en.txt from the my_words directory
- Use the whitelist parameter, given as a list of words or a path to a whitelist file, to exclude specific words
- Whitelist example with list: SafeText('en', whitelist=['damn', 'hell']) to allow these words
- Whitelist example with file: SafeText('en', whitelist='allowed_words.txt') where the file has one word per line
- Whitelist example for phrases: SafeText('en', whitelist=['hot pocket', 'bloody mary']) to allow multi-word phrases
- Whitelist is case-insensitive and supports both single words and phrases
- Custom words and whitelist can be used together: SafeText('en', custom_words_dir='dir', whitelist=['word'])
- Use get_bad_words() to get a unique list of detected profanity without duplicates
- For SRT subtitle files, use set_language_from_srt() for language detection
- All profanity detection is case-insensitive by default
- Use validate_profanity=True with MODERATE_CONTENT_API_KEY for API validation
- Import as 'from safetext import SafeText' for main functionality
- Profanity lists are stored in safetext/languages/{lang_code}/words.txt
- Empty text returns empty results from check_profanity() and is returned unchanged by censor_profanity()
- Use pytest with -n auto for parallel test execution during development
- Use uv for fast dependency management and development setup

### Install SafeText using pip

Source: https://github.com/viddexa/safetext/blob/main/README.md

Installs the SafeText library using pip, the Python package installer. This is the standard method for adding the library to your Python environment.

```bash
pip install safetext
```

--------------------------------

### Extend Profanity Lists with Custom Words in Python

Source: https://github.com/viddexa/safetext/blob/main/README.md

Shows how to extend SafeText's profanity detection with custom words supplied through a directory. The library reads the per-language text files in that directory (e.g., en.txt, tr.txt) for user-defined profanity terms.

```python
from safetext import SafeText

# Directory structure:
# custom_profanity_words/
#        ├── en.txt  # English custom words
#        ├── tr.txt  # Turkish custom words
#        └── es.txt  # Spanish custom words

st = SafeText(language='en', custom_words_dir='custom_profanity_words')
results = st.check_profanity('This mycustomword is inappropriate')
print(results)
```

```text
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
```

--------------------------------

### Check and Censor Profanity in Python

Source: https://github.com/viddexa/safetext/blob/main/README.md

Demonstrates basic usage of SafeText for checking and censoring profanity in English text. It initializes the SafeText object, checks for profanity, and then censors it. Each result includes the detected word, its index, and its character start/end positions.
```python
from safetext import SafeText

st = SafeText(language='en')

# Detailed matches: word, index, start, end
results = st.check_profanity(text='Some text with .')
print(results)

# Same text with detected words masked by asterisks
text = st.censor_profanity(text='Some text with .')
print(text)
```

--------------------------------

### Use Whitelist for Profanity Exclusion in Python

Source: https://github.com/viddexa/safetext/blob/main/README.md

Illustrates how to configure SafeText to ignore specific words during profanity detection. A whitelist can be passed directly as a list of words or as the path to a file containing one word per line. The last call combines custom words with a whitelist.

```python
from safetext import SafeText

# Using a list of words
st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
st = SafeText(
    language='en',
    custom_words_dir='custom_profanity_words',
    whitelist=['allowedcustomword']
)
```

--------------------------------

### Automated Language Detection from Text in Python

Source: https://github.com/viddexa/safetext/blob/main/README.md

Demonstrates SafeText's ability to automatically detect the language of a given text. After initializing SafeText with `language=None`, call `set_language_from_text()` to detect and set the language, which is then accessible via the `st.language` attribute.

```python
from safetext import SafeText

eng_text = "This story is about to take a dark turn."

st = SafeText(language=None)
st.set_language_from_text(eng_text)

print(st.language)
```

--------------------------------

### Automated Language Detection from SRT File in Python

Source: https://github.com/viddexa/safetext/blob/main/README.md

Explains how SafeText can automatically detect the language of a subtitle (.srt) file. As with text-based detection, initialize with `language=None` and call `set_language_from_srt()` with the file path. The detected language is stored in `st.language`.
```python
from safetext import SafeText

turkish_srt_file_path = "turkish.srt"

st = SafeText(language=None)
st.set_language_from_srt(turkish_srt_file_path)

print(st.language)
```
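
--------------------------------

### Sketch: Working with Profanity Match Positions in Python

The library rules describe `check_profanity()` results as dictionaries with `word`, `index`, `start`, and `end` keys, and state that `censor_profanity()` preserves text length. The following is a minimal standalone sketch of how records of that shape can be applied. It is not SafeText's implementation and does not import the library. The helper names `censor_with_positions` and `unique_bad_words` are hypothetical, and treating `end` as an exclusive offset is an assumption inferred from the `'bad text'` example in the rules.

```python
from typing import Dict, List, Union

# Assumed record shape, taken from the rules example:
# {'word': 'bad', 'index': 1, 'start': 0, 'end': 3}
Match = Dict[str, Union[str, int]]


def censor_with_positions(text: str, matches: List[Match], mask: str = "*") -> str:
    """Mask each [start, end) span; `mask` must be one character to keep the length."""
    chars = list(text)
    for m in matches:
        start, end = int(m["start"]), int(m["end"])
        # Clamp to the text length so a stale record cannot raise IndexError.
        for i in range(start, min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


def unique_bad_words(matches: List[Match]) -> List[str]:
    """Return detected words once each, lowercased, in first-seen order."""
    seen: Dict[str, None] = {}
    for m in matches:
        seen.setdefault(str(m["word"]).lower(), None)
    return list(seen)


# Record copied from the library rules example for 'bad text'.
sample_matches: List[Match] = [{"word": "bad", "index": 1, "start": 0, "end": 3}]

censored = censor_with_positions("bad text", sample_matches)
print(censored)
print(unique_bad_words(sample_matches + sample_matches))
```

With real data, the `matches` list would come from `st.check_profanity(...)`. For production use, the library's own `censor_profanity()` and `get_bad_words()` are the documented entry points.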