Athena TimeStamp Reading

Both Athena and Spark SQL interpret UNIX timestamps at second precision. The following queries were executed to verify this:

Athena Query:

Second precision:

```sql
WITH dataset AS (
  SELECT ARRAY[-2177434800] AS words
)
SELECT from_unixtime(element_at(words, 1)) AS first_word
FROM dataset
```

OR

Millisecond precision:

```sql
WITH dataset AS (
  SELECT ARRAY[-2177434800000] AS words
)
SELECT from_unixtime(element_at(words, 1) / 1000) AS first_word
FROM dataset
```

Both of the queries mentioned above yielded the same result:

Athena Query result:

```
first_word
1901-01-01 05:00:00.000
```

Using the same UNIX timestamp value in Spark SQL yielded the same result:

Spark SQL sample source code:

```python
>>> import pyspark.sql.functions as F
>>> from pyspark.sql.functions import udf
>>> from datetime import datetime
>>> df = spark.createDataFrame([('-2177434800000',)], ['TIME'])
>>> # Millisecond epoch -> UTC datetime string, via a Python UDF.
>>> get_timestamp = udf(lambda x: datetime.utcfromtimestamp(float(x) / 1000.0).strftime("%Y-%m-%d %H:%M:%S"))
>>> df_withdate = df.withColumn("datetime", get_timestamp(df.TIME))
>>> df_withdate.show(20, False)
+--------------+-------------------+
|          TIME|           datetime|
+--------------+-------------------+
|-2177434800000|1901-01-01 05:00:00|
+--------------+-------------------+
```
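As an aside, the same conversion can also be done with Spark's built-in functions instead of a Python UDF. The following is a minimal sketch, assuming Spark 3.1 or later (where `timestamp_seconds` is available); the session time zone is pinned to UTC so the output matches the `utcfromtimestamp` call above:

```python
import pyspark.sql.functions as F

# Pin the session time zone so the displayed timestamp matches the UTC result above.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = spark.createDataFrame([('-2177434800000',)], ['TIME'])

# Millisecond epoch -> seconds -> TIMESTAMP column, no Python UDF involved.
df_withdate = df.withColumn(
    "datetime",
    F.timestamp_seconds(F.col("TIME").cast("long") / 1000)
)
df_withdate.show(20, False)  # |-2177434800000|1901-01-01 05:00:00|
```

Keeping the conversion inside Spark's built-ins avoids the Python serialization overhead of a UDF.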

In your case description, you mention: "Objective is to read timestamp type without casting or function in glue and athena." This is not possible to achieve directly unless you transform the data from UNIX timestamps into one of the datetime formats (e.g., ISO 8601 [1]) or use format strings such as "%Y-%m-%d %H:%M:%S", similar to the UNIX format used in Athena [2][3]. A sketch of that transformation follows.
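The following is a hedged sketch of that transformation: convert the UNIX epoch values into a native timestamp column once, at write time, so that Glue and Athena can subsequently read the column as a timestamp type without casting. The bucket path and column names are illustrative, not taken from the original case:

```python
import pyspark.sql.functions as F

spark.conf.set("spark.sql.session.timeZone", "UTC")

# Illustrative input: one epoch-millisecond value (column name is hypothetical).
raw = spark.createDataFrame([(-2177434800000,)], ['epoch_ms'])

# Epoch milliseconds -> native TIMESTAMP column (requires Spark 3.1+ for timestamp_seconds).
converted = raw.withColumn("event_ts", F.timestamp_seconds(F.col("epoch_ms") / 1000))

# Parquet stores event_ts as a real TIMESTAMP, which Athena reads directly.
converted.write.mode("overwrite").parquet("s3://example-bucket/example-table/")
```

After this one-time conversion, a Glue crawler or an Athena DDL over the Parquet output can expose event_ts as a timestamp column, with no per-query casting.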

Please note that if the INT64 UNIX format with second precision is used, you do not need to divide the value by 1000. If millisecond precision is used, the value must be divided by 1000 to get the expected result, as illustrated below.
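For instance, in plain Python (a self-contained sketch using the same values as above; negative epochs require a platform whose gmtime supports pre-1970 dates, e.g. Linux or macOS):

```python
from datetime import datetime, timezone

sec_value = -2177434800        # second precision: convert as-is
msec_value = -2177434800000    # millisecond precision: divide by 1000 first

print(datetime.fromtimestamp(sec_value, tz=timezone.utc))          # 1901-01-01 05:00:00+00:00
print(datetime.fromtimestamp(msec_value / 1000, tz=timezone.utc))  # 1901-01-01 05:00:00+00:00
```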
