HDP 3.1.0: how to control the number of files in Hive
2020-02-14
Tags: file, hive, hortonworks-data-platform, orc
I'm working on Hortonworks HDP 3.1.0 with Hive (Tez or LLAP) and the ORC format. A first processing step creates 1008 files per partition, each around 10 KB. The COMPACT and CONCATENATE options leave the number of files unchanged (cf. alter table xxx compact 'major'). Rewriting the data into another table works better, but I cannot fix the number of mappers (with Tez or LLAP). Because I have no real control over the mappers, I reduce the number of files through the reducers instead: in fact, I add an ORDER BY clause so that the number of files equals the number of reducers ;-)
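For reference, the attempts described above look roughly like the following HiveQL. This is only a sketch: the partition column `dt`, its value, the sort column `some_col`, and the target table `xxx_rewritten` are hypothetical names, and the rewrite ignores partitioning for brevity.

```sql
-- Attempt 1: major compaction of one partition
-- (the partition column dt and its value are hypothetical)
ALTER TABLE xxx PARTITION (dt = '2020-02-14') COMPACT 'major';

-- Attempt 2: concatenate the small ORC files of the same partition
ALTER TABLE xxx PARTITION (dt = '2020-02-14') CONCATENATE;

-- Workaround: rewrite into another table and force a reduce stage with ORDER BY,
-- so that the output files are written by the reducers rather than the mappers
-- (xxx_rewritten and some_col are hypothetical names)
CREATE TABLE xxx_rewritten
STORED AS ORC
AS
SELECT *
FROM xxx
ORDER BY some_col;
```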
There are two static parameters, tez.grouping.min-size and tez.grouping.max-size, but you need to restart Hive and some services each time you adjust them.
Is there a way to run a compaction and specify the number of files you want, or a target file size of around a few gigabytes?