blog

首页

hive 的基础知识

问题

# Hive transaction manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager in order to work with ACID tables.
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

#Additionally, Set these properties to turn on transaction support
# Client Side
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

# Server Side (Metastore)
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

# Note: Add these properties in hive-site.xml to set them permanently.
set hive.support.quoted.identifiers=None;
select `(name|id|pwd)?+.+` from tableName;
set hive.execution.engine=spark;
set spark.executor.memory=8g;

hive 中行转列

原始数据

id goods
80 [{"floorImgUrl":"https://hotwheel-static.jianlc.com/shop/20201230/1609299782739.png","floorExhibitionMode":2,"goods":[{"goodId":"64686117450","channel":1,"jxCashGiftId":"5160ea3579bc6f43","cashGiftAmount":5,"cashGiftStartTime":"2020.12.30 00:00","cashGiftEndTime":"2021.01.05 23:59"},{"goodId":"30808459309","channel":1,"jxCashGiftId":"c9672e705444178a","cashGiftAmount":5,"cashGiftStartTime":"2020.12.30 00:00","cashGiftEndTime":"2021.01.05 23:59"}]}]

SQL 处理过程

select
id,
regexp_extract(json1,'"goodId":"([\\d_-]*)"', 1) as goodId,
regexp_extract(json1,'"channel":([\\d_-]*)', 1) as channel,
regexp_extract(json1,'"jxCashGiftId":"([a-zA-Z0-9]*)"', 1) as jxCashGiftId,
regexp_extract(json1,'"cashGiftAmount":([\\d_-]*)', 1) as cashGiftAmount,
regexp_extract(json1,'"cashGiftStartTime":"([\\d\.\\s:]*)"', 1) as cashGiftStartTime,
regexp_extract(json1,'"cashGiftEndTime":"([\\d\.\\s:]*)"', 1) as cashGiftEndTime
from
(select id,json1 from activity_content
lateral view  explode(split(regexp_replace(get_json_object(goods, '$[*].goods[*]'),'\\[\\{|\\\\}]',''),'\\}\\,\\{')) js as json1
where id = 80
) tmp

处理后

id goodid channel jxCashGiftId cashGiftAmount cashGiftStartTime cashGiftEndTime
80 64686117450 1 5160ea3579bc6f43 5 2020.12.30 00:00 2021.01.05 23:59
80 30808459309 1 c9672e705444178a 5 2020.12.30 00:00 2021.01.05 23:59

hive 中大数据量优化历程

千亿级别的查询优化

漫谈千亿级数据优化实践:一次数据优化实录

数据倾斜问题