### Getting Input File Block Start Offset Source: https://spark.apache.org/docs/latest/api/sql The `input_file_block_start` function returns the starting offset of the current block being read from an input file. It returns -1 if this information is not available. ```sql SELECT input_file_block_start(); ``` -------------------------------- ### Check if String Starts With Substring Source: https://spark.apache.org/docs/latest/api/sql Returns a boolean indicating if a string starts with a specified substring. Supports both string and binary types. Returns NULL if either input is NULL. ```sql SELECT startswith('Spark SQL', 'Spark'); ``` ```sql SELECT startswith('Spark SQL', 'SQL'); ``` ```sql SELECT startswith('Spark SQL', null); ``` ```sql SELECT startswith(x'537061726b2053514c', x'537061726b'); ``` ```sql SELECT startswith(x'537061726b2053514c', x'53514c'); ``` -------------------------------- ### Get Current Database Source: https://spark.apache.org/docs/latest/api/sql Returns the name of the current database in use. ```sql SELECT current_database(); ``` -------------------------------- ### ntile Window Function Example Source: https://spark.apache.org/docs/latest/api/sql Divides rows into a specified number of buckets within a partition. Useful for distributing data into quantiles. ```sql SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); ``` -------------------------------- ### overlay Function Examples Source: https://spark.apache.org/docs/latest/api/sql Replaces a portion of a string with another string. Supports both string and byte-based operations. ```sql SELECT overlay('Spark SQL' PLACING '_' FROM 6); ``` ```sql SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7); ``` ```sql SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0); ``` ```sql SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4); ``` ```sql SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6); ``` ```sql SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7); ``` ```sql SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0); ``` ```sql SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4); ``` -------------------------------- ### input_file_block_start Source: https://spark.apache.org/docs/latest/api/sql Returns the start offset of the current block being read from the input file. ```APIDOC ## input_file_block_start ### Description Returns the start offset of the block being read, or -1 if not available. ### Method N/A (SQL Function) ### Endpoint N/A (SQL Function) ### Parameters #### Path Parameters N/A #### Query Parameters N/A #### Request Body N/A ### Request Example ```sql SELECT input_file_block_start(); ``` ### Response #### Success Response (200) - **long**: The start offset of the input file block. #### Response Example ```json -1 ``` ``` -------------------------------- ### nvl Function Example Source: https://spark.apache.org/docs/latest/api/sql Replaces NULL values with a specified alternative. Use this to provide default values for nulls. ```sql SELECT nvl(NULL, array('2')); ``` -------------------------------- ### Find Start Position of Regex Match Source: https://spark.apache.org/docs/latest/api/sql Searches a string for a regular expression and returns the 1-based starting position of the first match. Returns 0 if no match is found. Raw string literals are recommended for regex patterns. ```sql SELECT regexp_instr(r"\abc", r"^\\abc$"); ``` ```sql SELECT regexp_instr('user@spark.apache.org', '@[^.]*'); ``` -------------------------------- ### Get Array Element by Index Source: https://spark.apache.org/docs/latest/api/sql Use `get` to retrieve an element from an array at a specific 0-based index. Returns NULL if the index is out of bounds. ```sql SELECT get(array(1, 2, 3), 0) ``` ```sql SELECT get(array(1, 2, 3), 3) ``` ```sql SELECT get(array(1, 2, 3), -1) ``` -------------------------------- ### Get Current Schema Source: https://spark.apache.org/docs/latest/api/sql Returns the current database schema name. ```sql > SELECT current_schema(); default ``` -------------------------------- ### Get Map Keys Source: https://spark.apache.org/docs/latest/api/sql Retrieves all keys from a map as an unordered array. ```sql SELECT map_keys(map(1, 'a', 2, 'b')); [1,2] ``` -------------------------------- ### nullifzero Function Example Source: https://spark.apache.org/docs/latest/api/sql Returns NULL if the expression is zero, otherwise returns the expression itself. Useful for handling zero values as null. ```sql SELECT nullifzero(0); ``` ```sql SELECT nullifzero(2); ``` -------------------------------- ### Get Current Catalog Source: https://spark.apache.org/docs/latest/api/sql Returns the name of the current catalog being used. ```sql SELECT current_catalog(); ``` -------------------------------- ### instr Source: https://spark.apache.org/docs/latest/api/sql Finds the starting index of a substring within a string. ```APIDOC ## instr ### Description Returns the (1-based) index of the first occurrence of `substr` in `str`. ### Method N/A (SQL Function) ### Endpoint N/A (SQL Function) ### Parameters #### Path Parameters N/A #### Query Parameters N/A #### Request Body N/A ### Request Example ```sql SELECT instr('SparkSQL', 'SQL'); ``` ### Response #### Success Response (200) - **integer**: The 1-based index of the first occurrence of the substring. #### Response Example ```json 6 ``` ``` -------------------------------- ### parse_url Function Examples Source: https://spark.apache.org/docs/latest/api/sql Extracts specific components from a URL. Useful for analyzing or manipulating URL strings. ```sql SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST'); ``` ```sql SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY'); ``` ```sql SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query'); ``` -------------------------------- ### Get Current User Source: https://spark.apache.org/docs/latest/api/sql Returns the username of the current execution context. ```sql > SELECT current_user(); mockingjay ``` -------------------------------- ### date_add Source: https://spark.apache.org/docs/latest/api/sql Returns the date that is a specified number of days after a start date. ```APIDOC ## date_add ### Description Returns the date that is `num_days` after `start_date`. ### Syntax `date_add(start_date, num_days)` ### Arguments * `start_date` - The starting date. * `num_days` - The number of days to add. ### Examples ```sql > SELECT date_add('2016-07-30', 1); 2016-07-31 ``` ### Since 1.5.0 ``` -------------------------------- ### nvl2 Function Example Source: https://spark.apache.org/docs/latest/api/sql Returns one of two expressions based on whether the first expression is NULL or not. Use for conditional value selection. ```sql SELECT nvl2(NULL, 2, 1); ``` -------------------------------- ### overlay Source: https://spark.apache.org/docs/latest/api/sql Replaces a portion of a string with another string, starting at a specified position and length. ```APIDOC ## overlay ### Description Replaces `input` with `replace` that starts at `pos` and is of length `len`. ### Arguments * **input** - The original string. * **replace** - The string to insert. * **pos** - The starting position for the replacement. * **len** - Optional. The length of the substring to replace. ### Example ```sql SELECT overlay('Spark SQL' PLACING '_' FROM 6) SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7) SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0) SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4) SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6) SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7) SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0) SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4) ``` ``` -------------------------------- ### nullif Function Example Source: https://spark.apache.org/docs/latest/api/sql Returns NULL if two expressions are equal, otherwise returns the first expression. Use for conditional null assignment. ```sql SELECT nullif(2, 2); ``` -------------------------------- ### Get Current Timestamp Source: https://spark.apache.org/docs/latest/api/sql Retrieves the current timestamp at the start of query evaluation. All calls within the same query return the same value. ```sql > SELECT current_timestamp(); 2020-04-25 15:49:11.914 ``` ```sql > SELECT current_timestamp; 2020-04-25 15:49:11.914 ``` -------------------------------- ### startswith Source: https://spark.apache.org/docs/latest/api/sql Checks if a string or binary value begins with a specified prefix. Returns true if it does, false otherwise, and NULL if either input is NULL. ```APIDOC ## startswith ### Description Checks if a string or binary value begins with a specified prefix. Returns true if it does, false otherwise, and NULL if either input is NULL. ### Method SQL Function ### Signature startswith(left, right) ### Arguments * **left** (STRING or BINARY) - The value to check. * **right** (STRING or BINARY) - The prefix to check for. ### Examples ```sql > SELECT startswith('Spark SQL', 'Spark'); true > SELECT startswith('Spark SQL', 'SQL'); false > SELECT startswith('Spark SQL', null); NULL > SELECT startswith(x'537061726b2053514c', x'537061726b'); true > SELECT startswith(x'537061726b2053514c', x'53514c'); false ``` ### Since 3.3.0 ``` -------------------------------- ### Convert to Protobuf with to_protobuf Source: https://spark.apache.org/docs/latest/api/sql Converts Catalyst binary input to Protobuf format. Requires message name, descriptor file path, and optional options. ```sql SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s); ``` -------------------------------- ### Get Current Date Source: https://spark.apache.org/docs/latest/api/sql Retrieves the current date at the start of query evaluation. All calls within the same query return the same value. ```sql > SELECT current_date(); 2020-04-25 ``` ```sql > SELECT current_date; 2020-04-25 ``` -------------------------------- ### time_diff(unit, start, end) Source: https://spark.apache.org/docs/latest/api/sql Gets the difference between two times in the specified units (HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND). ```APIDOC ## time_diff(unit, start, end) ### Description Gets the difference between two times in the specified units. ### Parameters #### Path Parameters None #### Query Parameters None #### Request Body None ### Request Example None ### Response #### Success Response (200) - **result** (bigint) - The difference between the start and end times in the specified unit. ### Response Example ```sql > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:28'); 0 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'21:30:29'); 1 > SELECT time_diff('HOUR', TIME'20:30:29', TIME'12:00:00'); -8 ``` ``` -------------------------------- ### Format String with Placeholders Source: https://spark.apache.org/docs/latest/api/sql Use `format_string` to create formatted strings using printf-style format specifiers. ```sql SELECT format_string("Hello World %d %s", 100, "days"); ``` -------------------------------- ### Get Current Local Timestamp Source: https://spark.apache.org/docs/latest/api/sql Returns the current timestamp without time zone at the start of query evaluation. All calls within the same query return the same value. ```sql > SELECT localtimestamp(); 2020-04-25 15:49:11.914 ``` -------------------------------- ### Convert Protobuf to Catalyst Value Source: https://spark.apache.org/docs/latest/api/sql Use `from_protobuf` to convert binary Protobuf data into a Catalyst value. Ensure the Protobuf schema matches the data; otherwise, behavior is undefined. Schema evolution can be handled via options. ```sql SELECT from_protobuf(s, 'Person', '/path/to/descriptor.desc', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('name', name, 'id', id) AS s FROM VALUES ('John Doe', 1), (NULL, 2) tab(name, id)) ``` -------------------------------- ### Get Current Time Source: https://spark.apache.org/docs/latest/api/sql Retrieves the current time at the start of query evaluation, with optional precision for fractional seconds. All calls within the same query return the same value. ```sql > SELECT current_time(); 15:49:11.914120 ``` ```sql > SELECT current_time; 15:49:11.914120 ``` ```sql > SELECT current_time(0); 15:49:11 ``` ```sql > SELECT current_time(3); 15:49:11.914 ``` ```sql > SELECT current_time(1+1); 15:49:11.91 ``` -------------------------------- ### Get Week of Year Source: https://spark.apache.org/docs/latest/api/sql Calculates the week of the year for a given date. A week starts on Monday, and week 1 is the first week with more than three days. Useful for temporal aggregations. ```sql SELECT weekofyear('2008-02-20'); ``` -------------------------------- ### percent_rank Window Function Example Source: https://spark.apache.org/docs/latest/api/sql Computes the percentage rank of a value within its partition. Use this to understand a value's relative position in a sorted dataset. ```sql SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b); ``` -------------------------------- ### Find Next Day of Week Source: https://spark.apache.org/docs/latest/api/sql Returns the first date that is later than the start date and matches the specified day of the week. ```sql > SELECT next_day('2015-01-14', 'TU'); 2015-01-20 ``` -------------------------------- ### locate Source: https://spark.apache.org/docs/latest/api/sql Returns the position of the first occurrence of a substring within a string, optionally starting from a specified position. Both the starting position and the return value are 1-based. ```APIDOC ## locate ### Description Returns the position of the first occurrence of `substr` in `str` after position `pos`. The given `pos` and return value are 1-based. ### Syntax `locate(substr, str[, pos])` ### Parameters * `substr` (string) - The substring to search for. * `str` (string) - The string to search within. * `pos` (integer, optional) - The 1-based position to start the search from. ### Examples ```sql SELECT locate('bar', 'foobarbar'); -- Result: 4 SELECT locate('bar', 'foobarbar', 5); -- Result: 7 SELECT POSITION('bar' IN 'foobarbar'); -- Result: 4 ``` **Since:** 1.5.0 ``` -------------------------------- ### Create time from hour, minute, and second Source: https://spark.apache.org/docs/latest/api/sql Constructs a time value from hour, minute, and second components. Throws an error for invalid inputs. ```SQL SELECT make_time(6, 30, 45.887); 06:30:45.887 ``` ```SQL SELECT make_time(NULL, 30, 0); NULL ``` -------------------------------- ### dateadd Source: https://spark.apache.org/docs/latest/api/sql Calculates a future date by adding a specified number of days to a start date. The function takes the start date and the number of days to add as arguments. ```APIDOC ## dateadd(start_date, num_days) ### Description Returns the date that is `num_days` after `start_date`. ### Arguments * `start_date` - The starting date. * `num_days` - The number of days to add. ### Examples: ``` > SELECT dateadd('2016-07-30', 1); 2016-07-31 ``` **Since:** 3.4.0 ``` -------------------------------- ### octet_length Function Examples Source: https://spark.apache.org/docs/latest/api/sql Calculates the byte length of a string or binary data. Useful for determining storage size or data transfer size. ```sql SELECT octet_length('Spark SQL'); ``` ```sql SELECT octet_length(x'537061726b2053514c'); ``` -------------------------------- ### Mask String with Default Characters (Example 2) Source: https://spark.apache.org/docs/latest/api/sql Demonstrates default string masking behavior where uppercase letters become 'X', lowercase 'x', and digits 'n'. ```sql SELECT mask('AbCD123-@$#'); ``` -------------------------------- ### Array Slicing Source: https://spark.apache.org/docs/latest/api/sql Extracts a sub-array from a given array. Supports positive and negative start indices, with indices starting at 1. Negative indices count from the end of the array. ```sql SELECT slice(array(1, 2, 3, 4), 2, 2); [2,3] ``` ```sql SELECT slice(array(1, 2, 3, 4), -2, 2); [3,4] ``` -------------------------------- ### Create Map from Entries Source: https://spark.apache.org/docs/latest/api/sql Builds a map from an array of key-value entry structs. ```sql SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b'))); {1:"a",2:"b"} ``` -------------------------------- ### Logical OR Operator Example Source: https://spark.apache.org/docs/latest/api/sql Performs a logical OR operation. Returns true if either operand is true, and handles NULLs according to standard SQL logic. ```sql SELECT true or false; ``` ```sql SELECT false or false; ``` ```sql SELECT true or NULL; ``` ```sql SELECT false or NULL; ``` -------------------------------- ### Create Map from Arrays Source: https://spark.apache.org/docs/latest/api/sql Constructs a map by pairing elements from a key array and a value array. Keys must not be null. ```sql SELECT map_from_arrays(array(1.0, 3.0), array('2', '4')); {1.0:"2",3.0:"4"} ``` -------------------------------- ### inline Source: https://spark.apache.org/docs/latest/api/sql Explodes an array of structs into a table with default column names. ```APIDOC ## inline ### Description Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. ### Method N/A (SQL Function) ### Endpoint N/A (SQL Function) ### Parameters #### Path Parameters N/A #### Query Parameters N/A #### Request Body N/A ### Request Example ```sql SELECT * FROM inline(array(struct(1, 'a'), struct(2, 'b'))); ``` ### Response #### Success Response (200) - **table**: A table with columns col1, col2, etc. representing the exploded structs. #### Response Example ```json [ {"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"} ] ``` ``` -------------------------------- ### Format string with printf Source: https://spark.apache.org/docs/latest/api/sql Returns a formatted string using printf-style format specifiers. Supports integers and strings. ```sql SELECT printf("Hello World %d %s", 100, "days"); Hello World 100 days ``` -------------------------------- ### Get Map Values Source: https://spark.apache.org/docs/latest/api/sql Retrieves all values from a map as an unordered array. ```sql SELECT map_values(map(1, 'a', 2, 'b')); ["a","b"] ``` -------------------------------- ### Get Day of Month Source: https://spark.apache.org/docs/latest/api/sql Returns the day of the month from a date or timestamp. ```sql SELECT day('2009-07-30'); 30 ``` -------------------------------- ### to_protobuf Source: https://spark.apache.org/docs/latest/api/sql Converts a Catalyst binary input value into its corresponding Protobuf format result using a descriptor file. ```APIDOC ## to_protobuf ### Description Converts a Catalyst binary input value into its corresponding Protobuf format result. ### Arguments * **child** (binary) - The Catalyst binary input value. * **messageName** (string) - The name of the Protobuf message. * **descFilePath** (string) - The path to the descriptor file. * **options** (map) - Optional. A map of options. ### Examples ```sql > SELECT to_protobuf(s, 'Person', '/path/to/descriptor.desc', map('emitDefaultValues', 'true')) IS NULL FROM (SELECT NULL AS s); [true] ``` **Since:** 4.0.0 ``` -------------------------------- ### Estimate Top K Items with approx_top_k_estimate Source: https://spark.apache.org/docs/latest/api/sql Returns the top k items with their frequency from a sketch. The k parameter is optional and defaults to 5. Ensure the input sketch is created using approx_top_k_accumulate. ```sql SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); ``` ```sql SELECT approx_top_k_estimate(approx_top_k_accumulate(expr), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' tab(expr); ``` -------------------------------- ### Safe binary conversion with try_to_binary Source: https://spark.apache.org/docs/latest/api/sql Converts a string to binary using a specified encoding. Returns NULL if the conversion fails or the format is invalid. ```sql SELECT try_to_binary('abc', 'utf-8'); ``` ```sql select try_to_binary('a!', 'base64'); ``` ```sql select try_to_binary('abc', 'invalidFormat'); ``` -------------------------------- ### base64 Source: https://spark.apache.org/docs/latest/api/sql Converts the argument from a binary `bin` to a base 64 string. ```APIDOC ## base64 ### Description Converts the argument from a binary `bin` to a base 64 string. ### Syntax `base64(bin)` ### Parameters * `bin` - The binary data to convert. ### Examples ```sql SELECT base64('Spark SQL'); -- Output: U3BhcmsgU1FM SELECT base64(x'537061726b2053514c'); -- Output: U3BhcmsgU1FM ``` ``` -------------------------------- ### date_sub Source: https://spark.apache.org/docs/latest/api/sql Returns the date that is a specified number of days before a start date. ```APIDOC ## date_sub ### Description Returns the date that is `num_days` before `start_date`. ### Syntax `date_sub(start_date, num_days)` ### Arguments * `start_date` - The starting date. * `num_days` - The number of days to subtract. ### Examples ```sql > SELECT date_sub('2016-07-30', 1); 2016-07-29 ``` ### Since 1.5.0 ``` -------------------------------- ### array Source: https://spark.apache.org/docs/latest/api/sql Creates an array with the provided elements. ```APIDOC ## array ### Description Returns an array with the given elements. ### Method N/A (SQL Function) ### Endpoint N/A (SQL Function) ### Parameters #### Path Parameters N/A #### Query Parameters N/A #### Request Body N/A ### Request Example ```sql SELECT array(1, 2, 3); ``` ### Response #### Success Response (200) N/A (SQL Function Result) #### Response Example ```json [1,2,3] ``` **Since:** 1.1.0 ``` -------------------------------- ### printf Source: https://spark.apache.org/docs/latest/api/sql Returns a formatted string based on a printf-style format string and provided arguments. ```APIDOC ## printf ### Description Returns a formatted string from printf-style format strings. ### Syntax printf(strfmt, obj, ...) ### Parameters * **strfmt** (string) - The format string. * **obj, ...** (any) - The arguments to format. ### Examples ```sql > SELECT printf("Hello World %d %s", 100, "days"); Hello World 100 days ``` ``` -------------------------------- ### Get the value of Pi Source: https://spark.apache.org/docs/latest/api/sql The pi() function returns the mathematical constant pi. ```sql SELECT pi(); ``` -------------------------------- ### Get Current Timestamp Source: https://spark.apache.org/docs/latest/api/sql Returns the current timestamp at the beginning of the query evaluation. ```sql > SELECT now(); 2020-04-25 15:49:11.914 ``` -------------------------------- ### Create Date from Unix Date Source: https://spark.apache.org/docs/latest/api/sql Constructs a date from the number of days since January 1, 1970. ```sql > SELECT date_from_unix_date(1); 1970-01-02 ``` -------------------------------- ### Get Map Entries Source: https://spark.apache.org/docs/latest/api/sql Extracts all key-value pairs from a map as an array of structs. ```sql SELECT map_entries(map(1, 'a', 2, 'b')); [{"key":1,"value":"a"},{"key":2,"value":"b"}] ``` -------------------------------- ### Find approximate top K items Source: https://spark.apache.org/docs/latest/api/sql Returns the top K items and their frequencies using an approximate algorithm. Parameters for K and max items tracked can be specified. ```sql SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr); ``` ```sql SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr); ``` ```sql SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr); ``` -------------------------------- ### Get Day of Year Source: https://spark.apache.org/docs/latest/api/sql Returns the day of the year for a given date or timestamp. ```sql SELECT dayofyear('2016-04-09'); 100 ``` -------------------------------- ### Get Current Timezone Source: https://spark.apache.org/docs/latest/api/sql Returns the current session's local timezone. ```sql > SELECT current_timezone(); Asia/Shanghai ``` -------------------------------- ### Combine Sketches with approx_top_k_combine Source: https://spark.apache.org/docs/latest/api/sql Combines multiple sketches into a single sketch. The maxItemsTracked parameter is optional and sets the limit for the combined sketch. If not specified, input sketches must have the same maxItemsTracked value. ```sql SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr) UNION ALL SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr)); ``` -------------------------------- ### weekofyear Source: https://spark.apache.org/docs/latest/api/sql Returns the week of the year for a given date, where weeks start on Monday. ```APIDOC ## weekofyear weekofyear(date) ### Description Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days. ### Examples ```sql > SELECT weekofyear('2008-02-20'); 8 ``` ``` -------------------------------- ### Create Map from String Source: https://spark.apache.org/docs/latest/api/sql Creates a map by splitting a string using specified or default delimiters. Available since version 2.0.1. ```sql SELECT str_to_map('a:1,b:2,c:3', ',', ':'); ``` ```sql SELECT str_to_map('a'); ``` -------------------------------- ### ntile Source: https://spark.apache.org/docs/latest/api/sql Divides the rows within each window partition into a specified number of buckets, ranging from 1 to n. ```APIDOC ## ntile ### Description Divides the rows for each window partition into `n` buckets ranging from 1 to at most `n`. ### Arguments * **buckets** - An integer expression specifying the number of buckets. Defaults to 1. ### Example ```sql SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b) ``` ``` -------------------------------- ### make_time Source: https://spark.apache.org/docs/latest/api/sql Creates a time value from hour, minute, and second components. Throws an error for invalid inputs. ```APIDOC ## make_time ### Description Create time from hour, minute and second fields. For invalid inputs it will throw an error. ### Syntax `make_time(hour, minute, second)` ### Parameters * `hour` (integer) - The hour to represent, from 0 to 23. * `minute` (integer) - The minute to represent, from 0 to 59. * `second` (double) - The second to represent, from 0 to 59.999999. ### Examples ```sql SELECT make_time(6, 30, 45.887); -- Result: 06:30:45.887 SELECT make_time(NULL, 30, 0); -- Result: NULL ``` **Since:** 4.1.0 ``` -------------------------------- ### Convert KLL Sketch to String (Float) Source: https://spark.apache.org/docs/latest/api/sql Returns a human-readable summary of a KLL sketch. Useful for debugging or inspecting sketch contents. ```sql SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); ``` -------------------------------- ### Add Days to a Date Source: https://spark.apache.org/docs/latest/api/sql Calculates the date that is a specified number of days after a start date. ```sql SELECT dateadd('2016-07-30', 1); 2016-07-31 ``` -------------------------------- ### Calculate Date Difference Source: https://spark.apache.org/docs/latest/api/sql Computes the number of days between an end date and a start date. ```sql > SELECT date_diff('2009-07-31', '2009-07-30'); 1 ``` ```sql > SELECT date_diff('2009-07-30', '2009-07-31'); -1 ``` -------------------------------- ### Convert String to Binary Source: https://spark.apache.org/docs/latest/api/sql Converts an input string to a binary value using a specified format (hex, utf-8, base64). Defaults to hex if no format is provided. Returns NULL if any input is NULL. ```sql SELECT to_binary('abc', 'utf-8'); abc ``` -------------------------------- ### Get Month Name from Date Source: https://spark.apache.org/docs/latest/api/sql Returns the three-letter abbreviated month name from a given date. ```sql > SELECT monthname('2008-02-20'); Feb ``` -------------------------------- ### Create Year-Month Interval Source: https://spark.apache.org/docs/latest/api/sql Constructs a year-month interval from specified years and months. Handles positive and negative values. ```sql SELECT make_ym_interval(1, 2); 1-2 > SELECT make_ym_interval(1, 0); 1-0 > SELECT make_ym_interval(-1, 1); -0-11 > SELECT make_ym_interval(2); 2-0 ``` -------------------------------- ### Get Count from Float KLL Sketch Source: https://spark.apache.org/docs/latest/api/sql Returns the number of items that have been added to the KLL sketch. ```sql SELECT kll_sketch_get_n_float(kll_sketch_agg_float(col)) FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col); ``` -------------------------------- ### Get Count from Double KLL Sketch Source: https://spark.apache.org/docs/latest/api/sql Returns the count of elements aggregated into the KLL sketch. ```sql SELECT kll_sketch_get_n_double(kll_sketch_agg_double(col)) FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col); ``` -------------------------------- ### from_protobuf Source: https://spark.apache.org/docs/latest/api/sql Converts a binary Protobuf value into a Catalyst value. Ensure the provided Protobuf schema matches the data schema for predictable results. ```APIDOC ## from_protobuf(data, messageName, descFilePath, options) ### Description Converts a binary Protobuf value into a Catalyst value. ### Parameters - **data**: The binary Protobuf data. - **messageName**: The name of the Protobuf message. - **descFilePath**: The path to the Protobuf descriptor file. - **options**: Optional configuration map. ### Request Example ```sql SELECT from_protobuf(s, 'Person', '/path/to/descriptor.desc', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('name', name, 'id', id) AS s FROM VALUES ('John Doe', 1), (NULL, 2) tab(name, id)) ``` ### Response - **result**: Boolean indicating if the conversion resulted in NULL. ``` -------------------------------- ### get Source: https://spark.apache.org/docs/latest/api/sql Retrieves an element from an array at a specified zero-based index. Returns NULL if the index is out of bounds. ```APIDOC ## get(array, index) ### Description Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL. ### Arguments - **array**: The input array. - **index**: The zero-based index of the element to retrieve. ### Examples ```sql > SELECT get(array(1, 2, 3), 0); 1 > SELECT get(array(1, 2, 3), 3); NULL > SELECT get(array(1, 2, 3), -1); NULL ``` ``` -------------------------------- ### Get Euler's Number Source: https://spark.apache.org/docs/latest/api/sql Returns the mathematical constant e, the base of the natural logarithm. ```sql SELECT e(); 2.718281828459045 ``` -------------------------------- ### initcap Source: https://spark.apache.org/docs/latest/api/sql Converts a string so that the first letter of each word is capitalized and the rest are lowercase. ```APIDOC ## initcap ### Description Returns `str` with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space. ### Method N/A (SQL Function) ### Endpoint N/A (SQL Function) ### Parameters #### Path Parameters N/A #### Query Parameters N/A #### Request Body N/A ### Request Example ```sql SELECT initcap('sPark sql'); ``` ### Response #### Success Response (200) - **string**: The input string with words capitalized. #### Response Example ```json "Spark Sql" ``` ``` -------------------------------- ### Get Day of Week Source: https://spark.apache.org/docs/latest/api/sql Returns the day of the week for a date/timestamp, where 1 is Sunday and 7 is Saturday. ```sql SELECT dayofweek('2009-07-30'); 5 ``` -------------------------------- ### from_avro Source: https://spark.apache.org/docs/latest/api/sql Converts a binary Avro value into a Catalyst value using a provided JSON schema. ```APIDOC ## from_avro ### Description Converts a binary Avro value into a Catalyst value. ### Parameters - **child**: The binary Avro data. - **jsonFormatSchema**: The JSON schema for the Avro data. - **options**: A map of options for deserialization. ### Examples ```sql SELECT from_avro(s, '{"type": "record", "name": "struct", "fields": [{ "name": "u", "type": ["int","string"] }]}', map()) IS NULL AS result FROM (SELECT NAMED_STRUCT('u', NAMED_STRUCT('member0', member0, 'member1', member1)) AS s FROM VALUES (1, NULL), (NULL, 'a') tab(member0, member1)); -- Result: [false] ``` ### Note The specified schema must match the actual schema of the read data. Use options to specify an evolved schema. ``` -------------------------------- ### Get Abbreviated Day Name Source: https://spark.apache.org/docs/latest/api/sql Returns the three-letter abbreviated day name for a given date. ```sql SELECT dayname(DATE('2008-02-20')); Wed ``` -------------------------------- ### Return First Non-Null Argument Source: https://spark.apache.org/docs/latest/api/sql Returns the first non-null argument provided. If all arguments are null, it returns null. ```sql > SELECT coalesce(NULL, 1, NULL); 1 ``` -------------------------------- ### current_time Source: https://spark.apache.org/docs/latest/api/sql Returns the current time at the start of query evaluation, with optional precision for fractional seconds. ```APIDOC ## current_time ### Description Returns the current time at the start of query evaluation. All calls of `current_time` within the same query return the same value. ### Syntax `current_time([precision])` `current_time` ### Arguments * `precision` - An optional integer literal in the range [0..6], indicating how many fractional digits of seconds to include. If omitted, the default is 6. ### Examples ```sql > SELECT current_time(); 15:49:11.914120 > SELECT current_time; 15:49:11.914120 > SELECT current_time(0); 15:49:11 > SELECT current_time(3); 15:49:11.914 > SELECT current_time(1+1); 15:49:11.91 ``` ### Since 4.1.0 ``` -------------------------------- ### Mask NULL Input with Default Settings Source: https://spark.apache.org/docs/latest/api/sql Shows that passing NULL as input to the mask function results in a NULL output, regardless of other parameters. ```sql SELECT mask(NULL); ``` -------------------------------- ### Generate Count-Min Sketch Source: https://spark.apache.org/docs/latest/api/sql Creates a Count-Min Sketch for cardinality estimation. The result is a byte array that needs deserialization for use. ```sql SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col); ``` -------------------------------- ### Subtract Days from Date Source: https://spark.apache.org/docs/latest/api/sql Calculates a date by subtracting a specified number of days from a start date. ```sql > SELECT date_sub('2016-07-30', 1); 2016-07-29 ``` -------------------------------- ### Add Days to Date Source: https://spark.apache.org/docs/latest/api/sql Calculates a date by adding a specified number of days to a start date. ```sql > SELECT date_add('2016-07-30', 1); 2016-07-31 ``` -------------------------------- ### make_date Source: https://spark.apache.org/docs/latest/api/sql Constructs a date value from year, month, and day components. Throws an error on invalid input if ANSI SQL is enabled, otherwise returns NULL. ```APIDOC ## make_date ### Description Create date from year, month and day fields. If the configuration `spark.sql.ansi.enabled` is false, the function returns NULL on invalid inputs. Otherwise, it will throw an error instead. ### Syntax `make_date(year, month, day)` ### Parameters * `year` (integer) - The year to represent, from 1 to 9999. * `month` (integer) - The month-of-year to represent, from 1 (January) to 12 (December). * `day` (integer) - The day-of-month to represent, from 1 to 31. ### Examples ```sql SELECT make_date(2013, 7, 15); -- Result: 2013-07-15 SELECT make_date(2019, 7, NULL); -- Result: NULL ``` **Since:** 3.0.0 ``` -------------------------------- ### Get Day of Month (Alternative) Source: https://spark.apache.org/docs/latest/api/sql Returns the day of the month from a date or timestamp. This is an alias for the 'day' function. ```sql SELECT dayofmonth('2009-07-30'); 30 ``` -------------------------------- ### Format Binary Data to String Source: https://spark.apache.org/docs/latest/api/sql Converts binary data to a string representation using 'base64', 'hex', or 'utf-8' encoding. ```sql SELECT to_varchar(x'537061726b2053514c', 'base64'); ``` ```sql SELECT to_varchar(x'537061726b2053514c', 'hex'); ``` ```sql SELECT to_varchar(encode('abc', 'utf-8'), 'utf-8'); ``` -------------------------------- ### Create Geometry from WKB Source: https://spark.apache.org/docs/latest/api/sql Parses a Well-Known Binary (WKB) string and returns the corresponding GEOMETRY value. Used for deserializing geospatial data. ```sql SELECT hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040'))); ``` -------------------------------- ### Get Size of Array Source: https://spark.apache.org/docs/latest/api/sql Returns the number of elements in an array. Returns null if the input array is null. ```sql SELECT array_size(array('b', 'd', 'c', 'a')); 4 ``` -------------------------------- ### Get any value from a group Source: https://spark.apache.org/docs/latest/api/sql Returns an arbitrary value from a group of rows. Can be configured to ignore NULL values. ```sql SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col); ``` ```sql SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col); ``` ```sql SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col); ``` -------------------------------- ### Generate Session Windows for Streaming Data Source: https://spark.apache.org/docs/latest/api/sql Creates session windows based on a timestamp column and a gap duration. Handles both fixed and dynamic gap durations. ```sql SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1 ``` ```sql SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start; A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2 A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1 A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1 A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1 ``` -------------------------------- ### next_day Source: https://spark.apache.org/docs/latest/api/sql Finds the first date that is later than a given start date and falls on a specified day of the week. ```APIDOC ## next_day ### Description Returns the first date which is later than `start_date` and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and `day_of_week` is an invalid input, the function throws SparkIllegalArgumentException if `spark.sql.ansi.enabled` is set to true, otherwise NULL. ### Method next_day(start_date, day_of_week) ### Parameters - **start_date** (date) - The date from which to find the next day of the week. - **day_of_week** (string) - The target day of the week (e.g., 'TU' for Tuesday). ### Examples ``` > SELECT next_day('2015-01-14', 'TU'); 2015-01-20 ``` ``` -------------------------------- ### Create Map from Key-Value Pairs Source: https://spark.apache.org/docs/latest/api/sql Generates a map data structure from a list of key-value pairs. ```sql SELECT map(1.0, '2', 3.0, '4'); {1.0:"2",3.0:"4"} ``` -------------------------------- ### current_timestamp Source: https://spark.apache.org/docs/latest/api/sql Returns the current timestamp at the start of query evaluation. All calls within the same query return the same value. ```APIDOC ## current_timestamp ### Description Returns the current timestamp at the start of query evaluation. All calls of `current_timestamp` within the same query return the same value. ### Syntax `current_timestamp()` `current_timestamp` ### Examples ```sql > SELECT current_timestamp(); 2020-04-25 15:49:11.914 > SELECT current_timestamp; 2020-04-25 15:49:11.914 ``` ### Notes The syntax without braces has been supported since 2.0.1. ### Since 1.5.0 ``` -------------------------------- ### current_date Source: https://spark.apache.org/docs/latest/api/sql Returns the current date at the start of query evaluation. All calls within the same query return the same value. ```APIDOC ## current_date ### Description Returns the current date at the start of query evaluation. All calls of `current_date` within the same query return the same value. ### Syntax `current_date()` `current_date` ### Examples ```sql > SELECT current_date(); 2020-04-25 > SELECT current_date; 2020-04-25 ``` ### Notes The syntax without braces has been supported since 2.0.1. ### Since 1.5.0 ``` -------------------------------- ### Repeat Element to Create Array Source: https://spark.apache.org/docs/latest/api/sql Creates an array by repeating a given element a specified number of times. ```sql SELECT array_repeat('123', 2); ["123","123"] ``` -------------------------------- ### format_string Source: https://spark.apache.org/docs/latest/api/sql Returns a formatted string using printf-style format specifiers. ```APIDOC ## format_string ### Description Returns a formatted string from printf-style format strings. ### Parameters - **strfmt**: The format string. - **obj, ...**: The objects to format. ### Examples ```sql SELECT format_string("Hello World %d %s", 100, "days"); -- Result: Hello World 100 days ``` ``` -------------------------------- ### Prepend Element to Array Source: https://spark.apache.org/docs/latest/api/sql Adds an element to the beginning of an array. The element type must match the array's element type. Handles null elements and null arrays. ```sql SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd'); ["d","b","d","c","a"] ``` ```sql SELECT array_prepend(array(1, 2, 3, null), null); [null,1,2,3,null] ``` ```sql SELECT array_prepend(CAST(null as Array), 2); NULL ```