### Build Chinese Full-Text Search Application

Source: https://context7.com/amutu/zhparser/llms.txt

This SQL example sets up a full-text search application with zhparser. It creates the extension, a text search configuration, a documents table with a generated tsvector column, and a GIN index, then runs sample ranked, phrase, and boolean queries.

```sql
-- Setup: Create extension and configuration
CREATE EXTENSION IF NOT EXISTS zhparser;
CREATE TEXT SEARCH CONFIGURATION chinese_fts (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION chinese_fts ADD MAPPING FOR n,v,a,i,e,l,j,m,q,r WITH simple;

-- Create documents table with full-text search support
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    tsv tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('chinese_fts', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('chinese_fts', coalesce(content, '')), 'B')
    ) STORED
);

-- Create GIN index for fast searches
CREATE INDEX idx_documents_tsv ON documents USING GIN(tsv);

-- Insert sample documents
INSERT INTO documents (title, content) VALUES
    ('保障房政策解读', '2024年全国保障房建设将继续加速推进,各地政府加大资金投入力度。'),
    ('人工智能发展报告', '人工智能技术在医疗、教育、金融等领域取得重大突破,推动产业升级。'),
    ('房地产市场分析', '当前房价保持稳定,二手房市场交易活跃,购房者信心逐步恢复。');

-- Search with ranking
-- The terms are combined with | (OR): plainto_tsquery would AND them,
-- and no sample row contains both 房价 and 保障.
SELECT id, title,
       ts_rank(tsv, query) AS rank,
       ts_headline('chinese_fts', content, query, 'StartSel=, StopSel=, MaxWords=50') AS snippet
FROM documents,
     to_tsquery('chinese_fts', '房价 | 保障') AS query
WHERE tsv @@ query
ORDER BY rank DESC;
-- Output:
--  id |      title      |    rank    | snippet
-- ----+-----------------+------------+-----------------------------------------------
--   3 | 房地产市场分析  | 0.09910322 | 当前房价保持稳定,二手房市场...
--   1 | 保障房政策解读  | 0.06079271 | 2024年全国保障房建设将继续加速...
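
-- Optional variation (a sketch, not taken from the zhparser docs):
-- ts_rank also accepts a weights array ordered {D, C, B, A}. Because tsv
-- labels title lexemes 'A' and content lexemes 'B', lowering the B weight
-- makes title matches count for more. PostgreSQL's defaults are
-- {0.1, 0.2, 0.4, 1.0}; the values below are illustrative and halve
-- every weight below A.
SELECT id, title,
       ts_rank('{0.05, 0.1, 0.2, 1.0}', tsv, query) AS weighted_rank
FROM documents,
     to_tsquery('chinese_fts', '房价 | 保障') AS query
WHERE tsv @@ query
ORDER BY weighted_rank DESC;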

-- Phrase search (words must be adjacent)
SELECT title, content
FROM documents
WHERE tsv @@ phraseto_tsquery('chinese_fts', '人工智能技术');

-- Boolean search
SELECT title
FROM documents
WHERE tsv @@ to_tsquery('chinese_fts', '房价 & !保障');
```

--------------------------------

### Run Zhparser with Docker

Source: https://context7.com/amutu/zhparser/llms.txt

This bash script demonstrates how to quickly set up and run PostgreSQL with zhparser pre-installed using Docker. It includes commands to start the container, connect to the database, and create a basic text search configuration.

```bash
# Run PostgreSQL with zhparser
docker run --name pgzhparser -d \
  -e POSTGRES_PASSWORD=mypassword \
  zhparser/zhparser:bookworm-16

# Connect to the database
docker exec -it pgzhparser psql -U postgres

# Inside psql, create and test the extension
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION testcfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION testcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
SELECT * FROM ts_parse('zhparser', '中文全文搜索测试');
-- Output shows segmented Chinese words

# Available Docker image tags:
# zhparser/zhparser:bookworm-16 (Debian Bookworm, PostgreSQL 16)
# zhparser/zhparser:bookworm-15 (Debian Bookworm, PostgreSQL 15)
# zhparser/zhparser:bullseye-16 (Debian Bullseye, PostgreSQL 16)
# zhparser/zhparser:alpine-16 (Alpine Linux, PostgreSQL 16)
```

--------------------------------

### to_tsquery - Convert Search Query to tsquery

Source: https://context7.com/amutu/zhparser/llms.txt

Converts a Chinese search query into a `tsquery` data type for matching against tsvector columns. Includes examples for `to_tsquery`, `plainto_tsquery`, `phraseto_tsquery`, and ranking.

```APIDOC
## to_tsquery - Convert Search Query to tsquery

Converts a Chinese search query into a `tsquery` data type for matching against tsvector columns.

### Request Example
```sql
-- Convert Chinese query string to tsquery
SELECT to_tsquery('chinese_config', '保障房资金压力');
-- Output:
--               to_tsquery
-- ---------------------------------------
--  '保障' <-> '房' <-> '资金' <-> '压力'
-- (1 row)

-- Full-text search with tsquery
SELECT title, content
FROM articles
WHERE content_tsv @@ to_tsquery('chinese_config', '保障房建设');

-- Using plainto_tsquery for natural language queries (no operators)
SELECT *
FROM articles
WHERE content_tsv @@ plainto_tsquery('chinese_config', '人工智能 技术');

-- Using phraseto_tsquery for exact phrase matching
SELECT *
FROM articles
WHERE content_tsv @@ phraseto_tsquery('chinese_config', '保障房建设');

-- Combine with ranking for relevance sorting
SELECT title, ts_rank(content_tsv, query) AS rank
FROM articles, to_tsquery('chinese_config', '房价|保障') AS query
WHERE content_tsv @@ query
ORDER BY rank DESC;
```
```

--------------------------------

### Create and Verify Zhparser Extension

Source: https://context7.com/amutu/zhparser/llms.txt

Initializes the extension in the database and confirms its availability.

```sql
-- Create the zhparser extension (requires superuser privileges)
CREATE EXTENSION zhparser;

-- Verify installation by checking available parsers
SELECT prsname FROM pg_ts_parser WHERE prsname = 'zhparser';
-- Output:
--  prsname
-- ----------
--  zhparser
-- (1 row)
```

--------------------------------

### Create Extension

Source: https://context7.com/amutu/zhparser/llms.txt

This section details how to create and enable the zhparser extension in a PostgreSQL database. This is a prerequisite for using any zhparser functionalities.

```APIDOC
## Create Extension

Creates and enables the zhparser extension in a PostgreSQL database. This is the first step required before using any zhparser functionality.

### Method
SQL

### Endpoint
N/A

### Parameters
None

### Request Example
```sql
-- Create the zhparser extension (requires superuser privileges)
CREATE EXTENSION zhparser;

-- Verify installation by checking available parsers
SELECT prsname FROM pg_ts_parser WHERE prsname = 'zhparser';
```

### Response
#### Success Response (200)
Output of the verification query:

```
 prsname
----------
 zhparser
(1 row)
```
```

--------------------------------

### Convert Text to tsvector and Indexing

Source: https://context7.com/amutu/zhparser/llms.txt

Demonstrates converting text to tsvector for search and setting up a GIN index on a table.

```sql
-- Create tsvector from Chinese text
SELECT to_tsvector('chinese_config', '今年保障房新开工数量虽然有所下调,但实际的年度在建规模会超以往年份。');
-- Output (illustrative and abridged; it lists lexemes such as '严峻' and '压力'
-- that do not occur in the sentence above, so expect different results):
--                                        to_tsvector
-- -----------------------------------------------------------------------------------------
--  '上':35 '下调':7 '严峻':37 '会':14 '保障':1,30 '历史':21 '压力':36 ...
-- (1 row)

-- Create a searchable table with Chinese content
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    content_tsv tsvector
);

-- Insert sample data and generate tsvector
INSERT INTO articles (title, content) VALUES
    ('房价新闻', '2024年全国房价保持稳定,保障房建设加速推进。'),
    ('科技报道', '人工智能技术在医疗领域取得重大突破。');

-- Update tsvector column
UPDATE articles SET content_tsv = to_tsvector('chinese_config', content);

-- Create GIN index for fast full-text search
CREATE INDEX idx_articles_content ON articles USING GIN(content_tsv);
```

--------------------------------

### Configure Extra Dictionaries

Source: https://context7.com/amutu/zhparser/llms.txt

Extend Zhparser's vocabulary by configuring extra dictionaries. These can be specified in `postgresql.conf` or per-database, supporting `.txt` and `.xdb` formats.
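
Before changing the setting, it can help to check what the current session actually sees. The following is a minimal sketch that uses only built-in PostgreSQL facilities, not anything zhparser-specific: `current_setting` with `missing_ok = true` returns NULL instead of raising an error when the parameter is not defined in the session, and the `pg_config` view (readable only by superusers by default) reports the share directory. Per the dictionary notes in this document, custom dictionary files belong in its `tsearch_data` subdirectory.

```sql
-- Value of zhparser.extra_dicts as seen by the current session
-- (NULL if the parameter is not defined here).
SELECT current_setting('zhparser.extra_dicts', true) AS extra_dicts;

-- Locate the PostgreSQL share directory; dictionary files are expected
-- under <SHAREDIR>/tsearch_data.
SELECT setting || '/tsearch_data' AS dict_dir
FROM pg_config
WHERE name = 'SHAREDIR';
```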

```sql
-- Configure extra dictionaries in postgresql.conf (requires reload)
-- zhparser.extra_dicts = 'dict_extra.txt,medical_terms.xdb'
```

```sql
-- Or set for specific databases
ALTER DATABASE mydb SET zhparser.extra_dicts = 'dict_extra.txt';
```

```sql
-- Dictionary priority: lower to higher (rightmost has highest priority)
-- Supported formats:
--   .txt - Text format (dynamically loaded, auto-converted to xdb)
--   .xdb - Binary format (more efficient for large dictionaries)

-- Text dictionary format (dict_extra.txt):
-- Each line: word [TF] [IDF] [attr]
-- Example content:
--   我是新增词 2.0
--   再试一个 1.0 1.0 @
--   删除 1.0 1.0 !    ; '!' marks word as invalid/deleted
```

--------------------------------

### Configure Text Search for Chinese

Source: https://context7.com/amutu/zhparser/llms.txt

Sets up a text search configuration using the zhparser and defines token type mappings for indexing.

```sql
-- Create a new text search configuration using zhparser
CREATE TEXT SEARCH CONFIGURATION chinese_config (PARSER = zhparser);

-- Add token type mappings (n=noun, v=verb, a=adjective, i=idiom, e=exclamation, l=tmp/idiom)
-- 'simple' dictionary converts tokens to lowercase
ALTER TEXT SEARCH CONFIGURATION chinese_config ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- Verify the configuration
SELECT cfgname FROM pg_ts_config WHERE cfgname = 'chinese_config';
-- Output:
--     cfgname
-- ---------------
--  chinese_config
-- (1 row)

-- Alternative: Add all common token types for comprehensive indexing
-- (run this instead of the mapping above; ADD MAPPING fails for token
-- types that are already mapped)
ALTER TEXT SEARCH CONFIGURATION chinese_config ADD MAPPING FOR n,v,a,i,e,l,j,m,q,r,s,t,x WITH simple;
```

--------------------------------

### Create Text Search Configuration

Source: https://context7.com/amutu/zhparser/llms.txt

This section explains how to create a text search configuration that utilizes zhparser. It also covers adding token type mappings for indexing and normalization.
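
To see how a mapping plays out on real text, PostgreSQL's built-in `ts_debug` lists each token the parser emits together with the dictionary, if any, that handled it. The sketch below assumes the `chinese_config` configuration from this document already exists. Tokens whose type has no mapping show NULL in the `dictionary` and `lexemes` columns and are not indexed. The second query reads the standard `pg_ts_config_map` catalog to list which token types are currently mapped.

```sql
-- Show each token, its zhparser type alias, and the resulting lexemes.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('chinese_config', '保障房资金压力');

-- List the token types currently mapped for the configuration.
SELECT t.alias, m.mapdict::regdictionary AS dictionary
FROM pg_ts_config_map AS m
JOIN pg_ts_config AS c ON c.oid = m.mapcfg
JOIN ts_token_type('zhparser') AS t ON t.tokid = m.maptokentype
WHERE c.cfgname = 'chinese_config'
ORDER BY t.alias;
```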

```APIDOC
## Create Text Search Configuration

Creates a text search configuration that uses zhparser as its parser. Token type mappings must be added to specify which token types to index and which dictionaries to use for normalization.

### Method
SQL

### Endpoint
N/A

### Parameters
None

### Request Example
```sql
-- Create a new text search configuration using zhparser
CREATE TEXT SEARCH CONFIGURATION chinese_config (PARSER = zhparser);

-- Add token type mappings (n=noun, v=verb, a=adjective, i=idiom, e=exclamation, l=tmp/idiom)
-- 'simple' dictionary converts tokens to lowercase
ALTER TEXT SEARCH CONFIGURATION chinese_config ADD MAPPING FOR n,v,a,i,e,l WITH simple;

-- Verify the configuration
SELECT cfgname FROM pg_ts_config WHERE cfgname = 'chinese_config';

-- Alternative: Add all common token types for comprehensive indexing
-- (run this instead of the mapping above; ADD MAPPING fails for token
-- types that are already mapped)
ALTER TEXT SEARCH CONFIGURATION chinese_config ADD MAPPING FOR n,v,a,i,e,l,j,m,q,r,s,t,x WITH simple;
```

### Response
#### Success Response (200)
Output of the verification query:

```
    cfgname
---------------
 chinese_config
(1 row)
```
```

--------------------------------

### Configuration Options

Source: https://context7.com/amutu/zhparser/llms.txt

Zhparser provides several GUC (Grand Unified Configuration) parameters to control segmentation behavior. These can be set at session or database level.

```APIDOC
## Configuration Options

Zhparser provides several GUC (Grand Unified Configuration) parameters to control segmentation behavior. These can be set at session or database level.

### Request Example
```sql
-- Ignore punctuation marks (except \r and \n)
SET zhparser.punctuation_ignore = true;

-- Enable duality segmentation for loose characters
SET zhparser.seg_with_duality = true;

-- Multi-word segmentation options (return multiple possible segmentations)
SET zhparser.multi_short = true;    -- Prefer short words
SET zhparser.multi_duality = true;  -- Prefer duality
SET zhparser.multi_zmain = true;    -- Prefer most important element
SET zhparser.multi_zall = true;     -- Return all elements

-- Load dictionaries into memory for better performance (requires backend restart)
-- Set in postgresql.conf:
-- zhparser.dict_in_memory = true

-- Example: Compare segmentation with different settings
SET zhparser.punctuation_ignore = false;
SELECT * FROM ts_parse('zhparser', '你好,世界!');
-- Shows punctuation tokens

SET zhparser.punctuation_ignore = true;
SELECT * FROM ts_parse('zhparser', '你好,世界!');
-- Punctuation tokens are filtered out

-- Reset to defaults
RESET zhparser.punctuation_ignore;
RESET zhparser.seg_with_duality;
```
```

--------------------------------

### Extra Dictionaries Configuration

Source: https://context7.com/amutu/zhparser/llms.txt

Load additional custom dictionaries for specialized vocabulary. Dictionary files must be placed in the `share/tsearch_data` directory.

```APIDOC
## Extra Dictionaries Configuration

Load additional custom dictionaries for specialized vocabulary. Dictionary files must be placed in the `share/tsearch_data` directory.

### Request Example
```sql
-- Configure extra dictionaries in postgresql.conf (requires reload)
-- zhparser.extra_dicts = 'dict_extra.txt,medical_terms.xdb'

-- Or set for specific databases
ALTER DATABASE mydb SET zhparser.extra_dicts = 'dict_extra.txt';

-- Dictionary priority: lower to higher (rightmost has highest priority)
-- Supported formats:
--   .txt - Text format (dynamically loaded, auto-converted to xdb)
--   .xdb - Binary format (more efficient for large dictionaries)

-- Text dictionary format (dict_extra.txt):
-- Each line: word [TF] [IDF] [attr]
-- Example content:
--   我是新增词 2.0
--   再试一个 1.0 1.0 @
--   删除 1.0 1.0 !    ; '!' marks word as invalid/deleted
```
```

--------------------------------

### Manage Custom Dictionary Table (Zhparser 2.0+)

Source: https://context7.com/amutu/zhparser/llms.txt

Utilize the `zhprs_custom_word` table to dynamically manage custom words, including compound words and TF-IDF weights. Changes require syncing and reconnecting to the database.

```sql
-- View custom word table structure
\d zhparser.zhprs_custom_word
--        Table "zhparser.zhprs_custom_word"
--  Column |       Type       | Collation | Nullable | Default
-- --------+------------------+-----------+----------+---------
--  word   | text             |           | not null |
--  tf     | double precision |           |          | 1.0
--  idf    | double precision |           |          | 1.0
--  attr   | character(1)     |           |          | '@'
```

```sql
-- Test before adding custom word
SELECT * FROM ts_parse('zhparser', '保障房资金压力');
-- Output shows '资金' and '压力' as separate tokens
```

```sql
-- Add a custom compound word
INSERT INTO zhparser.zhprs_custom_word (word) VALUES ('资金压力');
```

```sql
-- Add word with custom TF-IDF weights
INSERT INTO zhparser.zhprs_custom_word (word, tf, idf, attr)
VALUES ('互联网金融', 2.5, 3.0, '@');
```

```sql
-- Mark a word as invalid (will be ignored in segmentation)
INSERT INTO zhparser.zhprs_custom_word (word, attr) VALUES ('删除词', '!');
```

```sql
-- Sync custom words to disk (required for changes to take effect)
SELECT sync_zhprs_custom_word();
```

```sql
-- IMPORTANT: Reconnect to the database after sync for changes to apply
-- (\q is a psql meta-command; psql -d mydb is then run from the shell)
\q
psql -d mydb
```

```sql
-- Verify custom word is now recognized
SELECT * FROM ts_parse('zhparser', '保障房资金压力');
-- Output now shows '资金压力' as a single token
```

```sql
-- View all custom words
SELECT * FROM zhparser.zhprs_custom_word ORDER BY word;
```

```sql
-- Remove a custom word
DELETE FROM zhparser.zhprs_custom_word WHERE word = '资金压力';
SELECT sync_zhprs_custom_word();
```

--------------------------------

### Configure Zhparser Segmentation

Source: https://context7.com/amutu/zhparser/llms.txt

Adjust Zhparser's behavior using GUC parameters like `punctuation_ignore` and `seg_with_duality`. These settings control how punctuation and character segmentation are handled.

```sql
-- Ignore punctuation marks (except \r and \n)
SET zhparser.punctuation_ignore = true;
```

```sql
-- Enable duality segmentation for loose characters
SET zhparser.seg_with_duality = true;
```

```sql
-- Multi-word segmentation options (return multiple possible segmentations)
SET zhparser.multi_short = true;    -- Prefer short words
SET zhparser.multi_duality = true;  -- Prefer duality
SET zhparser.multi_zmain = true;    -- Prefer most important element
SET zhparser.multi_zall = true;     -- Return all elements
```

```sql
-- Load dictionaries into memory for better performance (requires backend restart)
-- Set in postgresql.conf:
-- zhparser.dict_in_memory = true
```

```sql
-- Example: Compare segmentation with different settings
SET zhparser.punctuation_ignore = false;
SELECT * FROM ts_parse('zhparser', '你好,世界!');
-- Shows punctuation tokens
```

```sql
SET zhparser.punctuation_ignore = true;
SELECT * FROM ts_parse('zhparser', '你好,世界!');
-- Punctuation tokens are filtered out
```

```sql
-- Reset to defaults
RESET zhparser.punctuation_ignore;
RESET zhparser.seg_with_duality;
```

--------------------------------

### Synchronize Custom Words with sync_zhprs_custom_word

Source: https://github.com/amutu/zhparser/blob/master/README.md

Execute the `sync_zhprs_custom_word()` function to synchronize custom words. A new connection is required after synchronization.

```sql
select sync_zhprs_custom_word();
```

--------------------------------

### Custom Dictionary Table (Version 2.0+)

Source: https://context7.com/amutu/zhparser/llms.txt

Zhparser 2.0+ provides a database-level custom dictionary managed through a SQL table, allowing dynamic word additions without file management.

```APIDOC
## Custom Dictionary Table (Version 2.0+)

Zhparser 2.0+ provides a database-level custom dictionary managed through a SQL table, allowing dynamic word additions without file management.

### Request Example
```sql
-- View custom word table structure
\d zhparser.zhprs_custom_word
--        Table "zhparser.zhprs_custom_word"
--  Column |       Type       | Collation | Nullable | Default
-- --------+------------------+-----------+----------+---------
--  word   | text             |           | not null |
--  tf     | double precision |           |          | 1.0
--  idf    | double precision |           |          | 1.0
--  attr   | character(1)     |           |          | '@'

-- Test before adding custom word
SELECT * FROM ts_parse('zhparser', '保障房资金压力');
-- Output shows '资金' and '压力' as separate tokens

-- Add a custom compound word
INSERT INTO zhparser.zhprs_custom_word (word) VALUES ('资金压力');

-- Add word with custom TF-IDF weights
INSERT INTO zhparser.zhprs_custom_word (word, tf, idf, attr)
VALUES ('互联网金融', 2.5, 3.0, '@');

-- Mark a word as invalid (will be ignored in segmentation)
INSERT INTO zhparser.zhprs_custom_word (word, attr) VALUES ('删除词', '!');

-- Sync custom words to disk (required for changes to take effect)
SELECT sync_zhprs_custom_word();

-- IMPORTANT: Reconnect to the database after sync for changes to apply
-- (\q is a psql meta-command; psql -d mydb is then run from the shell)
\q
psql -d mydb

-- Verify custom word is now recognized
SELECT * FROM ts_parse('zhparser', '保障房资金压力');
-- Output now shows '资金压力' as a single token

-- View all custom words
SELECT * FROM zhparser.zhprs_custom_word ORDER BY word;

-- Remove a custom word
DELETE FROM zhparser.zhprs_custom_word WHERE word = '资金压力';
SELECT
sync_zhprs_custom_word();
```
```

--------------------------------

### View Zhparser Token Types

Source: https://context7.com/amutu/zhparser/llms.txt

Use this SQL query to view all available token types recognized by the zhparser. These codes are essential for configuring text search mappings.

```sql
-- View all available token types
SELECT * FROM ts_token_type('zhparser');
-- Output:
--  tokid | alias | description
-- -------+-------+------------------------------------
--     97 | a     | adjective,形容词
--     98 | b     | differentiation,区别词
--     99 | c     | conjunction,连词
--    100 | d     | adverb,副词
--    101 | e     | exclamation,感叹词
--    102 | f     | position,方位词
--    103 | g     | root,词根
--    104 | h     | head,前连接成分
--    105 | i     | idiom,成语
--    106 | j     | abbreviation,简称
--    107 | k     | tail,后连接成分
--    108 | l     | tmp,习用语
--    109 | m     | numeral,数词
--    110 | n     | noun,名词
--    111 | o     | onomatopoeia,拟声词
--    112 | p     | prepositional,介词
--    113 | q     | quantity,量词
--    114 | r     | pronoun,代词
--    115 | s     | space,处所词
--    116 | t     | time,时语素
--    117 | u     | auxiliary,助词
--    118 | v     | verb,动词
--    119 | w     | punctuation,标点符号
--    120 | x     | unknown,未知词
--    121 | y     | modal,语气词
--    122 | z     | status,状态词

-- Create configuration with specific token types for indexing
CREATE TEXT SEARCH CONFIGURATION full_chinese (PARSER = zhparser);

-- Index content words only
ALTER TEXT SEARCH CONFIGURATION full_chinese ADD MAPPING FOR n,v,a,i,e,l,j,m,x WITH simple;

-- Ignore function words
-- (c=conjunction, p=preposition, u=auxiliary, y=modal not mapped)
```

--------------------------------

### Convert Chinese Query to tsquery

Source: https://context7.com/amutu/zhparser/llms.txt

Use `to_tsquery` to convert Chinese search terms into a tsquery data type for full-text search. This function is essential for matching text against tsvector columns.
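
One practical caveat follows from the token type mappings: a search term whose token type is not mapped produces no lexeme, so the resulting `tsquery` can be empty and match nothing. The sketch below uses PostgreSQL's built-in `numnode`, which counts the nodes of a `tsquery` (0 means empty). It assumes that `chinese_config` maps only `n,v,a,i,e,l` and that zhparser tags `的` as an auxiliary (`u`); both may differ in your setup.

```sql
-- 0 nodes means the query reduced to nothing and will never match.
SELECT numnode(to_tsquery('chinese_config', '的')) AS nodes;

-- Skip the match entirely when the parsed query is empty.
SELECT title
FROM articles, to_tsquery('chinese_config', '保障') AS query
WHERE numnode(query) > 0
  AND content_tsv @@ query;
```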

```sql
-- Convert Chinese query string to tsquery
SELECT to_tsquery('chinese_config', '保障房资金压力');
-- Output:
--               to_tsquery
-- ---------------------------------------
--  '保障' <-> '房' <-> '资金' <-> '压力'
-- (1 row)
```

```sql
-- Full-text search with tsquery
SELECT title, content
FROM articles
WHERE content_tsv @@ to_tsquery('chinese_config', '保障房建设');
```

```sql
-- Using plainto_tsquery for natural language queries (no operators)
SELECT *
FROM articles
WHERE content_tsv @@ plainto_tsquery('chinese_config', '人工智能 技术');
```

```sql
-- Using phraseto_tsquery for exact phrase matching
SELECT *
FROM articles
WHERE content_tsv @@ phraseto_tsquery('chinese_config', '保障房建设');
```

```sql
-- Combine with ranking for relevance sorting
SELECT title, ts_rank(content_tsv, query) AS rank
FROM articles, to_tsquery('chinese_config', '房价|保障') AS query
WHERE content_tsv @@ query
ORDER BY rank DESC;
```

--------------------------------

### Insert Custom Word into zhprs_custom_word

Source: https://github.com/amutu/zhparser/blob/master/README.md

Use this SQL command to insert a new custom word and its attribute into the `zhprs_custom_word` table. The table also has `tf` and `idf` columns for TF and IDF weights.

```sql
insert into zhprs_custom_word(word, attr) values('word', '!');
```

--------------------------------

### to_tsvector - Convert Text to tsvector

Source: https://context7.com/amutu/zhparser/llms.txt

Converts Chinese text into a `tsvector` data type, which is optimized for full-text search indexing. The resulting tsvector includes normalized lexemes and their positions within the document.

```APIDOC
## to_tsvector - Convert Text to tsvector

Converts Chinese text to a `tsvector` data type suitable for full-text search indexing. The tsvector contains normalized lexemes with their positions in the document.

### Method
SQL

### Endpoint
N/A

### Parameters
- **config_name** (text search configuration) - Required - The name of the text search configuration to use (e.g., 'chinese_config').
- **document** (string) - Required - The Chinese text to convert.

### Request Example
```sql
-- Create tsvector from Chinese text
SELECT to_tsvector('chinese_config', '今年保障房新开工数量虽然有所下调,但实际的年度在建规模会超以往年份。');

-- Create a searchable table with Chinese content
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT,
    content TEXT,
    content_tsv tsvector
);

-- Insert sample data and generate tsvector
INSERT INTO articles (title, content) VALUES
    ('房价新闻', '2024年全国房价保持稳定,保障房建设加速推进。'),
    ('科技报道', '人工智能技术在医疗领域取得重大突破。');

-- Update tsvector column
UPDATE articles SET content_tsv = to_tsvector('chinese_config', content);

-- Create GIN index for fast full-text search
CREATE INDEX idx_articles_content ON articles USING GIN(content_tsv);
```

### Response
#### Success Response (200)
Returns the `tsvector` representation of the input text.

#### Response Example
```
                                       to_tsvector
-----------------------------------------------------------------------------------------
 '上':35 '下调':7 '严峻':37 '会':14 '保障':1,30 '历史':21 '压力':36 ...
(1 row)
```
```

--------------------------------

### Parse Chinese Text with ts_parse

Source: https://github.com/amutu/zhparser/blob/master/README.md

Use the `ts_parse` function with the `zhparser` parser to tokenize Chinese text. It breaks a given string into its constituent tokens.

```sql
SELECT * FROM ts_parse('zhparser', '保障房资金压力');
```

--------------------------------

### ts_parse - Parse Chinese Text into Tokens

Source: https://context7.com/amutu/zhparser/llms.txt

The `ts_parse` function breaks Chinese text into individual tokens, along with their corresponding token type IDs. It is useful for debugging and for understanding how zhparser segments text.

```APIDOC
## ts_parse - Parse Chinese Text into Tokens

The `ts_parse` function breaks Chinese text into individual tokens with their corresponding token type IDs. This is useful for debugging and understanding how zhparser segments text.

### Method
SQL

### Endpoint
N/A

### Parameters
- **parser_name** (string) - Required - The name of the parser, typically 'zhparser'.
- **text** (string) - Required - The Chinese text to parse.

### Request Example
```sql
SELECT * FROM ts_parse('zhparser', 'hello world! 2010年保障房建设在全国范围内获全面启动');
```

### Response
#### Success Response (200)
Returns a set of `tokid` and `token` pairs.

**Token type reference (tokid = ASCII value of type letter):**
- 97 (a) = adjective
- 118 (v) = verb
- 113 (q) = quantity
- 100 (d) = adverb
- 110 (n) = noun
- 117 (u) = auxiliary
- 105 (i) = idiom
- 101 (e) = exclamation

#### Response Example
```
 tokid | token
-------+-------
   101 | hello
   101 | world
   117 | !
   101 | 2010
   113 | 年
   118 | 保障
   110 | 房建
   118 | 设在
   110 | 全国
   110 | 范围
   102 | 内
   118 | 获
    97 | 全面
   118 | 启动
(14 rows)
```
```
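
Because `tokid` is the ASCII code of the type letter, the numeric ids can be decoded without a lookup table. The following is a small sketch using PostgreSQL's built-in `chr` and `ascii`, plus a join against `ts_token_type` to attach the descriptions to parsed tokens. The sample sentence is the one used elsewhere in this document; the query itself is an illustration, not part of the zhparser documentation.

```sql
-- Decode a tokid by hand: 110 is the ASCII code of 'n' (noun).
SELECT chr(110) AS alias, ascii('n') AS tokid;

-- Label every parsed token with its type alias and description.
SELECT p.tokid, t.alias, t.description, p.token
FROM ts_parse('zhparser', '保障房资金压力') AS p
JOIN ts_token_type('zhparser') AS t USING (tokid);
```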