HDP 3.1.0: how to control the number of files in Hive

2020-02-14 file hive hortonworks-data-platform orc

I'm working on Hortonworks HDP 3.1.0 with Hive (Tez or LLAP) and the ORC format. A first processing step creates 1008 files per partition, each around 10 KB. The COMPACT and CONCATENATE options keep the same number of files (cf. alter table xxx compact 'major').

Rewriting the data into another table works better, but I cannot fix the number of mappers (with Tez or LLAP). Since I have no real control over the mappers, I can reduce the number of files through the reducers: in fact, adding an ORDER BY clause makes the number of files equal to the number of reducers ;-)
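To make the rewrite approach concrete, here is a minimal sketch of what it could look like. The table, partition, and column names (src_table, compacted_table, part_col, col_a, col_b) are hypothetical, and the value 4 is an arbitrary target. The sketch uses DISTRIBUTE BY instead of ORDER BY to force a reduce stage, and assumes that a session-level mapred.reduce.tasks is honored by Tez on this cluster, which should be verified.

```sql
-- Hypothetical names throughout: src_table, compacted_table, part_col, col_a, col_b.
-- Assumption: 4 is an arbitrary target number of files for the partition.

-- Ask for a fixed number of reducers, for this session only.
SET mapred.reduce.tasks=4;

-- Empty copy of the source table (same columns, partitioning, and storage format).
CREATE TABLE compacted_table LIKE src_table;

-- Rewrite a single partition. DISTRIBUTE BY forces a reduce stage,
-- and each reducer that receives rows writes one file.
INSERT OVERWRITE TABLE compacted_table PARTITION (part_col = 'p1')
SELECT col_a, col_b
FROM src_table
WHERE part_col = 'p1'
DISTRIBUTE BY col_a;
```

If col_a has fewer distinct values than the number of reducers, some reducers receive no rows and fewer files are written, so a column with many distinct values is the safer choice here.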

There are two static parameters, tez.grouping.min-size and tez.grouping.max-size, but you need to restart Hive and some services each time you adjust them.

Is there a way to run a compaction and specify the number of files you want, or a target file size of around a few gigabytes?

Cheers, Jean-Luc
