Regex in spark.read.json

2020-02-14 regex apache-spark hadoop

I want to read all JSON files in a Hadoop directory whose timestamp is one hour before the current time. File names look like test_2020021418553333.

import java.util.Calendar;
import java.text.SimpleDateFormat;

val form = new SimpleDateFormat("yyyyMMddhh");
val c = Calendar.getInstance();
c.add(Calendar.HOUR, -1);
val path = "/Test_" + form.format(c.getTime()) + "*";
val test_df = spark.read.json(path)

When I run this code, I get a "Path does not exist" error. Can anyone suggest how to read file names like Test_20200214{any possible combination of digits}?

Answers

A quick test shows that your formatted string still includes the hour:

form.format(c.getTime())
res2: String = 2020021401

So remove the last 2 characters to match Test_20200214 followed by any digits.

regards
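For reference, here is a minimal sketch of how the glob path could be built, under a few assumptions that are not confirmed by the question: the files live in a directory called /data (hypothetical), the real prefix is the lowercase test_ shown in the sample file name (HDFS paths are case-sensitive, so Test_ would not match it), and the timestamp uses a 24-hour clock. Note that the question's pattern uses hh, which is the 12-hour clock in SimpleDateFormat; a file stamped 18 would need HH. The object and method names below are made up for illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object HourGlob {
  // "HH" is the 24-hour clock (00-23); "hh" would give the 12-hour clock (01-12).
  private val hourFormat = DateTimeFormatter.ofPattern("yyyyMMddHH")
  private val dayFormat  = DateTimeFormatter.ofPattern("yyyyMMdd")

  // Glob for files stamped in the hour before `now`.
  // `dir` and `prefix` are hypothetical parameters, not taken from the question.
  def previousHourGlob(dir: String, prefix: String, now: LocalDateTime): String =
    s"$dir/$prefix${now.minusHours(1).format(hourFormat)}*"

  // Glob for every file stamped on the same day as `now`
  // (the "remove the last 2 characters" variant from the answer).
  def dayGlob(dir: String, prefix: String, now: LocalDateTime): String =
    s"$dir/$prefix${now.format(dayFormat)}*"
}
```

With a SparkSession named spark, as in the question, this might be used as spark.read.json(HourGlob.previousHourGlob("/data", "test_", LocalDateTime.now())). Spark expands the trailing * as a Hadoop glob, and it still reports "Path does not exist" when no file matches the pattern.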
