Handling StructType columns in notebooks

First, let us create some example data again, this time a table with a StructType column.

[1]:

from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

[2]:

from typedspark import Schema, Column, StructType, create_partially_filled_dataset, load_table
from pyspark.sql.types import IntegerType


class Values(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]


class Container(Schema):
    values: Column[StructType[Values]]


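# Build a Container DataFrame whose struct column holds rows of Values.
# Only Values.a is filled in; Values.b is left unspecified and becomes NULL.
create_partially_filled_dataset(
    spark,
    Container,
    {
        Container.values: create_partially_filled_dataset(
            spark,
            Values,
            {Values.a: [1, 2, 3]},
        ).collect(),
    },
).createOrReplaceTempView("structtype_table")

# Load the table back, generating a schema class named "Container" from it.
container, ContainerSchema = load_table(spark, "structtype_table", "Container")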

As before, we can display the schema by running:

[3]:

ContainerSchema

[3]:


from pyspark.sql.types import IntegerType

from typedspark import Column, Schema, StructType


class Container(Schema):
    values: Column[StructType[Values]]

To show the schema nested inside the StructType column, use:

[4]:

ContainerSchema.values.dtype.schema

[4]:


from pyspark.sql.types import IntegerType

from typedspark import Column, Schema


class Values(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]

The columns of the nested schema can also be used in queries, for example to filter on the field a inside values:

[5]:

container.filter(ContainerSchema.values.dtype.schema.a > 1).show()

+---------+
|   values|
+---------+
|{2, NULL}|
|{3, NULL}|
+---------+