Spark supports DateType and TimestampType columns and defines a rich API of functions to make working with dates and times easy. This blog post demonstrates how to make DataFrames with DateType / TimestampType columns and how to leverage Spark's functions for working with these columns.

Start by importing the libraries needed to create a DataFrame with a DateType column. The cast() method can create a DateType column by converting a StringType column into a date.

Let's create a DataFrame with a DateType column and use the built-in Spark functions year(), month(), and dayofmonth() to extract the year, month, and day from the date.

Let's create a DataFrame with a TimestampType column and use built-in Spark functions to extract the minute and second from the timestamp.

The datediff() and current_date() functions can be used to calculate the number of days between today and a date in a DateType column. Let's use these functions to calculate someone's age in days.

The date_add() function can be used to add days to a date. It returns the date that is `days` days after `start` and has been available since Spark 1.5.0.

Take a look at the Spark SQL functions for the full list of methods available for working with dates and times in Spark. The Spark date functions aren't comprehensive, and Java / Scala datetime libraries are notoriously difficult to work with, so we should think about filling in the gaps in the native Spark datetime libraries by adding functions to spark-daria.

Spark supports ArrayType, MapType and StructType columns in addition to the DateType / TimestampType columns covered in this post. Check out Writing Beautiful Spark Code for a detailed overview of the different complex column types and how they should be used when architecting Spark applications.
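To tie the steps above together, here is a minimal sketch of the full workflow. It uses plain toDF() rather than spark-daria's createDF(), and the column names (person_id, birth_date, fun_time) and sample values are illustrative, not taken from the original post's output:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local").getOrCreate()
import spark.implicits._

// Build a DataFrame with a StringType column, then use cast() to
// convert it into a DateType column
val sourceDF = Seq(
  ("1", "1980-09-10"),
  ("2", "1990-04-18")
).toDF("person_id", "birth_date_str")
  .withColumn("birth_date", col("birth_date_str").cast("date"))

// Extract the year, month, and day from the DateType column
val withParts = sourceDF
  .withColumn("birth_year", year(col("birth_date")))
  .withColumn("birth_month", month(col("birth_date")))
  .withColumn("birth_day", dayofmonth(col("birth_date")))

// Use datediff() and current_date() to calculate age in days
val withAge = withParts
  .withColumn("age_in_days", datediff(current_date(), col("birth_date")))

// Use date_add() to shift a date forward, here by 7 days
val shifted = withAge
  .withColumn("next_week", date_add(col("birth_date"), 7))

// TimestampType example: extract the minute and second from a timestamp
val timeDF = Seq(("1", "2017-12-12 09:27:42"))
  .toDF("person_id", "fun_time_str")
  .withColumn("fun_time", col("fun_time_str").cast("timestamp"))
  .withColumn("fun_minute", minute(col("fun_time")))
  .withColumn("fun_second", second(col("fun_time")))
```

Calling `withParts.printSchema()` would show `birth_date: date (nullable = true)` alongside the extracted integer columns, confirming the cast produced a true DateType column rather than a string.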