Handling StructType columns in notebooks
First, let us make some example data again.
[1]:
from pyspark.sql import SparkSession
spark = SparkSession.Builder().config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
[2]:
from typedspark import Schema, Column, StructType, create_partially_filled_dataset, load_table
from pyspark.sql.types import IntegerType


class Values(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]


class Container(Schema):
    values: Column[StructType[Values]]


create_partially_filled_dataset(
    spark,
    Container,
    {
        Container.values: create_partially_filled_dataset(
            spark,
            Values,
            {Values.a: [1, 2, 3]},
        ).collect(),
    },
).createOrReplaceTempView("structtype_table")

container, ContainerSchema = load_table(spark, "structtype_table", "Container")
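To make the shape of the example data concrete, here is a plain-Python sketch (not Spark code) of the rows the cell above produces. It assumes that create_partially_filled_dataset fills every column not listed in the dictionary with nulls, which is what the NULL values for b in the query output further down suggest.

```python
# Plain-Python model of the rows in "structtype_table" (an illustration, not Spark code).
# Assumption: columns not passed to create_partially_filled_dataset are filled with None.
values_rows = [{"a": a, "b": None} for a in [1, 2, 3]]  # the inner Values rows

# Each inner row becomes the value of the struct column "values" in Container.
container_rows = [{"values": row} for row in values_rows]

for row in container_rows:
    print(row)
```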
Like before, we can show the schema simply by running:
[3]:
ContainerSchema
[3]:
from pyspark.sql.types import IntegerType
from typedspark import Column, Schema, StructType


class Container(Schema):
    values: Column[StructType[Values]]
We can show the schema of the StructType column (here, Values) using:
[4]:
ContainerSchema.values.dtype.schema
[4]:
from pyspark.sql.types import IntegerType
from typedspark import Column, Schema


class Values(Schema):
    a: Column[IntegerType]
    b: Column[IntegerType]
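In plain Spark SQL, a field nested inside a struct column is addressed with a dotted path, such as values.a. The sketch below models the nested schema as plain dictionaries and lists those dotted paths. The helper flatten_paths and the dictionary representation are hypothetical illustrations, not part of typedspark or PySpark.

```python
from typing import Dict, List, Union

# Hypothetical nested-schema model: each field maps either to a type name (a leaf)
# or to another dict (a struct). This mirrors Container -> values: StructType[Values].
SchemaModel = Dict[str, Union[str, "SchemaModel"]]


def flatten_paths(schema: SchemaModel, prefix: str = "") -> List[str]:
    """Return the dotted paths of all leaf fields, e.g. 'values.a'."""
    paths: List[str] = []
    for name, dtype in schema.items():
        path = f"{prefix}{name}"
        if isinstance(dtype, dict):
            # Struct field: recurse, using the current path as the prefix.
            paths.extend(flatten_paths(dtype, prefix=path + "."))
        else:
            paths.append(path)
    return paths


values_schema = {"a": "IntegerType", "b": "IntegerType"}
container_schema = {"values": values_schema}
print(flatten_paths(container_schema))  # ['values.a', 'values.b']
```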
We can also use the nested schema in queries, for example to filter on the field a inside the values column:
[5]:
container.filter(ContainerSchema.values.dtype.schema.a > 1).show()
+---------+
| values|
+---------+
|{2, NULL}|
|{3, NULL}|
+---------+
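As a mental model of what the filter does, the following plain-Python sketch (not Spark code) applies the same predicate to the three example rows. The helper nested_gt is hypothetical. It models the standard Spark SQL rule that comparing NULL with a number yields NULL, which filter treats as false, so such rows are dropped.

```python
from typing import Any, Dict, List, Optional

# The three example rows: field a is filled, field b is null.
rows: List[Dict[str, Dict[str, Optional[int]]]] = [
    {"values": {"a": a, "b": None}} for a in [1, 2, 3]
]


def nested_gt(row: Dict[str, Any], path: str, threshold: int) -> bool:
    """Hypothetical helper: evaluate row[path] > threshold for a dotted path."""
    value: Any = row
    for part in path.split("."):
        # Follow the dotted path through the nested dicts, e.g. "values" then "a".
        value = value[part]
    # NULL > threshold is NULL in Spark SQL, and filter() drops the row;
    # here that is modelled by returning False for None.
    return value is not None and value > threshold


filtered = [r for r in rows if nested_gt(r, "values.a", 1)]
for r in filtered:
    print(r["values"])
```

The printed rows, {'a': 2, 'b': None} and {'a': 3, 'b': None}, correspond to the {2, NULL} and {3, NULL} rows in the Spark output above.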