Dataframe Schema - ignacio-alorre/Spark GitHub Wiki
The schema information, and the optimizations it enables, is one of the core differences between Spark SQL and core Spark. Inspecting the schema is especially useful for DataFrames, since you don't have the templated type you do with RDDs or Datasets.
Schemas are normally handled automatically by Spark SQL: they are either inferred when loading the data or computed from the parent DataFrames and the transformation being applied.
DataFrames expose the schema in both human-readable and programmatic formats. printSchema() prints the schema of a DataFrame to the console. For programmatic use, you can get the schema by calling schema.
Consider a DataFrame loaded from the following JSON record:
{"name":"mission","pandas":[{"id":1,"zip":"94110","pt":"giant", "happy":true,
"attributes":[0.4,0.5]}]}
Its schema would look like:
df.printSchema()
root
|-- name: string (nullable = true)
|-- pandas: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: long (nullable = false)
| | |-- zip: string (nullable = true)
| | |-- pt: string (nullable = true)
| | |-- happy: boolean (nullable = false)
| | |-- attributes: array (nullable = true)
| | | |-- element: double (containsNull = false)
df.schema
org.apache.spark.sql.types.StructType = StructType(
StructField(name,StringType,true),
StructField(pandas,
ArrayType(
StructType(StructField(id,LongType,false),
StructField(zip,StringType,true),
StructField(pt,StringType,true),
StructField(happy,BooleanType,false),
StructField(attributes,ArrayType(DoubleType,false),true)),
true),true))
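Going in the other direction, the same schema can be constructed by hand with StructType and StructField and supplied when loading data, which skips inference entirely. The following is a minimal sketch; the file path "pandas.json" is a hypothetical example, not from the original page:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// A hand-built StructType equivalent to the inferred schema shown above.
val pandasSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("pandas", ArrayType(
    StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("zip", StringType, nullable = true),
      StructField("pt", StringType, nullable = true),
      StructField("happy", BooleanType, nullable = false),
      StructField("attributes",
        ArrayType(DoubleType, containsNull = false), nullable = true)
    )),
    containsNull = true
  ), nullable = true)
))

val spark = SparkSession.builder().getOrCreate()

// Providing the schema up front avoids an extra pass over the data
// that schema inference would otherwise require.
val df = spark.read.schema(pandasSchema).json("pandas.json")
df.printSchema()
```

Note that each StructField carries a nullability flag and ArrayType carries a containsNull flag, matching the (nullable = ...) and (containsNull = ...) annotations in the printSchema() output.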