Hive is mostly used Structured Data Processing.
Below are some general function used in Day to day work.
LENGTH( string str )
The LENGTH function returns the number of characters in a string
Example: LENGTH('Bharat') returns 5
Query: Select length(name) from table1;
LTRIM( string str )
The LTRIM function removes all the trailing spaces from the string.
LTRIM(' Bharat') returns 'Bharat'
RTRIM( string str )
The RTRIM function removes all the ending spaces from the string.
LTRIM('Bharat ') returns 'Bharat'
TRIM( string str )
The TRIM function removes all the starting/ending spaces from the string.
LTRIM(' Bharat ') returns 'Bharat'
SPLIT( string str, string pat )
The SPLIT function splits the string around the pattern pat and returns an array of strings. You can specify regular expressions as patterns.
SPLIT('Bharat:Manoj:Amol',':') returns ["Bharat","Manoj",""Amol]
SUBSTR( string source_str, int start_position [,int length] )
The SUBSTR function returns a part of the source string from the start position with the specified length of characters. If the length is not given, then it returns from the start position to the end of the string.
SUBSTR('Bharatkumar',7) returns 'kumar'
SUBSTR('Bharatkumar',6,5) returns 'tkuma'
Note : String start with position 1, First Example string start with 7 postion to end. second example substring of postion 6 to 5 character.
regexp_extract(string, pattern)
Returns the first substring matched by the regular expression pattern in string:
SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
regexp_replace(string, pattern)
Removes every instance of the substring matched by the regular expression pattern from string:
SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
ROUND( double value [, int n] )
The ROUND function returns the value rounded to n integer places.
Example: ROUND(123.456,2) Retrun 123.46
UNIX_TIMESTAMP( string date, string pattern )
This function converts the date to the specified date format and returns the number of seconds between the specified date and Unix epoch. If it fails, then it returns 0.
UNIX_TIMESTAMP('2000-01-01 10:20:30','yyyy-MM-dd') returns 946713600
FROM_UNIXTIME( bigint number_of_seconds [, string format] )
The FROM_UNIX function converts the specified number of seconds from Unix epoch and returns the date in the format 'yyyy-MM-dd HH:mm:ss'.
FROM_UNIXTIME( UNIX_TIMESTAMP() ) returns the current date including the time. This is equivalent to the SYSDATE in oracle.
TO_DATE( string timestamp )
The TO_DATE function returns the date part of the timestamp in the format 'yyyy-MM-dd'.
Example: TO_DATE('2000-01-01 10:20:30') returns '2000-01-01'
DATEDIFF( string date1, string date2 )
The DATEDIFF function returns the number of days between the two given dates.
Example: DATEDIFF('2000-03-01', '2000-01-10') returns 51
DATE_ADD( string date, int days )
The DATE_ADD function adds the number of days to the specified date
Example: DATE_ADD('2000-03-01', 5) returns '2000-03-06'
DATE_SUB( string date, int days )
The DATE_SUB function subtracts the number of days to the specified date
Example: DATE_SUB('2000-03-01', 5) returns ‘2000-02-25’