Athena TimeStamp Reading
Both Athena and Spark SQL interpret UNIX timestamps with seconds precision. The following is the query I executed in Athena to test this:
```sql
WITH dataset AS (
  SELECT ARRAY[-2177434800] AS words
)
SELECT from_unixtime(element_at(words, 1)) AS first_word
FROM dataset
```

or, with the same value in millisecond precision:

```sql
WITH dataset AS (
  SELECT ARRAY[-2177434800000] AS words
)
SELECT from_unixtime(element_at(words, 1) / 1000) AS first_word
FROM dataset
```
Both of the queries mentioned above yielded the same result:

```
first_word
1901-01-01 05:00:00.000
```
I tried using the same UNIX timestamp value in Spark SQL and was able to reproduce the same result:
```python
>>> import pyspark.sql.functions as F
>>> from pyspark.sql.functions import udf
>>> from datetime import datetime
>>> df = spark.createDataFrame([('-2177434800000',)], ['TIME'])
>>> get_timestamp = udf(lambda x: datetime.utcfromtimestamp(float(x) / 1000.0).strftime("%Y-%m-%d %H:%M:%S"))
>>> df_withdate = df.withColumn("datetime", get_timestamp(df.TIME))
>>> df_withdate.show(20, False)
+--------------+-------------------+
|TIME          |datetime           |
+--------------+-------------------+
|-2177434800000|1901-01-01 05:00:00|
+--------------+-------------------+
```
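As a side note, the same conversion can also be done with Spark's built-in functions instead of a Python UDF. Below is a minimal sketch, assuming the `df` created above and a UTC session time zone (casting to timestamp renders in the session time zone, while `utcfromtimestamp` above always formats in UTC):

```python
import pyspark.sql.functions as F

# Make the rendered timestamp match the UTC output of the UDF above.
spark.conf.set("spark.sql.session.timeZone", "UTC")

# Epoch milliseconds -> seconds -> native timestamp, no UDF required.
df_builtin = df.withColumn(
    "datetime",
    (F.col("TIME").cast("double") / 1000).cast("timestamp")
)
df_builtin.show(20, False)
```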
In your case description, you mention: "Objective is to read timestamp type without casting or function in glue and athena." This is not possible to achieve directly unless you transform the data from UNIX timestamps into one of the supported datetime formats (for example ISO 8601 [1]), or use format strings like "%Y-%m-%d %H:%M:%S", similar to the UNIX format used in Athena [2][3].
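If transforming the data at write time is an option, one sketch along those lines (the `event_time_ms` column, `event_ts` name, and S3 path below are hypothetical) is to convert the UNIX values into a native timestamp column once and store the data in Parquet; a Glue/Athena table that declares the column as `timestamp` can then read it directly, with no casting or function call in each query:

```python
import pyspark.sql.functions as F

# Hypothetical input: epoch milliseconds in a column named event_time_ms.
df = spark.createDataFrame([(-2177434800000,)], ['event_time_ms'])

# Convert once, at write time: milliseconds -> seconds -> timestamp type.
df_ts = df.withColumn(
    "event_ts",
    (F.col("event_time_ms") / 1000).cast("timestamp")
)

# Parquet stores event_ts as a real timestamp, so an Athena/Glue table
# defined with `event_ts timestamp` reads it without any per-query casting.
df_ts.write.mode("overwrite").parquet("s3://your-bucket/events/")
```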
Please note that if the INT64 UNIX format with second precision is used, you do not need to divide the value by 1000; if millisecond precision is used, the value needs to be divided by 1000 to achieve the expected results.
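If the precision of incoming values is not known up front, one possible heuristic (an assumption, not a rule) is to treat large magnitudes as milliseconds, since plausible epoch-second values stay well below 1e11 in absolute terms:

```python
def epoch_to_seconds(value: int) -> float:
    """Normalize a UNIX epoch value to seconds.

    Heuristic: |value| >= 1e11 is assumed to be milliseconds;
    1e11 seconds after the epoch is roughly the year 5138.
    """
    if abs(value) >= 1e11:
        return value / 1000.0
    return float(value)

# -2177434800000 ms and -2177434800 s describe the same instant.
assert epoch_to_seconds(-2177434800000) == epoch_to_seconds(-2177434800)
```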