Other complex datatypes
Spark supports several other complex data types.
MapType, ArrayType, DecimalType and DayTimeIntervalType
These can be used in typedspark as follows:
from typing import Literal

from pyspark.sql.types import StringType

from typedspark import (
    ArrayType,
    DayTimeIntervalType,
    DecimalType,
    IntervalType,
    MapType,
    Schema,
    Column,
)


class Values(Schema):
    array: Column[ArrayType[StringType]]
    map: Column[MapType[StringType, StringType]]
    # precision 38, scale 18
    decimal: Column[DecimalType[Literal[38], Literal[18]]]
    # day-time interval with fields ranging from HOUR to SECOND
    interval: Column[DayTimeIntervalType[IntervalType.HOUR, IntervalType.SECOND]]
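In `DecimalType[Literal[38], Literal[18]]`, the first literal is the precision and the second is the scale. The precision is the total number of digits, and 38 is Spark's maximum. The scale is the number of digits after the decimal point.

The plain-Python sketch below illustrates what this allows. It uses a hypothetical helper, `fits_decimal`, which is not part of typedspark or PySpark. A DECIMAL(38, 18) value can have at most 18 digits after the decimal point and at most 38 - 18 = 20 digits before it.

```python
from decimal import Decimal, localcontext


# Hypothetical helper, not part of typedspark or PySpark: checks whether `value`
# can be stored as DECIMAL(precision, scale) without rounding or overflow.
def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    if not value.is_finite():
        return False  # NaN and infinities have no DECIMAL representation
    with localcontext() as ctx:
        ctx.prec = 100  # leave room for large coefficients while quantizing
        # Pad or round the value to exactly `scale` digits after the point.
        quantized = value.quantize(Decimal(1).scaleb(-scale))
    if quantized != value:
        return False  # more than `scale` fractional digits: would be rounded
    # The quantized coefficient holds every digit; at most `precision` are allowed.
    return len(quantized.as_tuple().digits) <= precision


print(fits_decimal(Decimal(32), 38, 18))      # True
print(fits_decimal(Decimal(10**20), 38, 18))  # False: 21 digits before the point
```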
Generating DataSets
You can generate DataSets containing these complex data types as follows:
from pyspark.sql import SparkSession
spark = SparkSession.Builder().config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
from datetime import date, datetime, timedelta
from decimal import Decimal

from pyspark.sql.types import DateType, TimestampType

from typedspark._utils.create_dataset import create_partially_filled_dataset


# MoreValues inherits the four columns of Values and adds two more.
class MoreValues(Values):
    date: Column[DateType]
    timestamp: Column[TimestampType]


# Each column maps to a list of values; here every list holds a single row.
create_partially_filled_dataset(
    spark,
    MoreValues,
    {
        MoreValues.array: [["a", "b", "c"]],
        MoreValues.map: [{"a": "b"}],
        MoreValues.decimal: [Decimal(32)],
        MoreValues.interval: [timedelta(days=1, hours=2, minutes=3, seconds=4)],
        MoreValues.date: [date(2020, 1, 1)],
        MoreValues.timestamp: [datetime(2020, 1, 1, 10, 15)],
    },
).show()
+---------+--------+--------------------+--------------------+----------+-------------------+
| array| map| decimal| interval| date| timestamp|
+---------+--------+--------------------+--------------------+----------+-------------------+
|[a, b, c]|{a -> b}|32.00000000000000...|INTERVAL '26:03:0...|2020-01-01|2020-01-01 10:15:00|
+---------+--------+--------------------+--------------------+----------+-------------------+
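The `decimal` and `interval` cells above end in `...` because `.show()` truncates, by default, any cell longer than 20 characters. It keeps the first 17 characters and appends `...`. Because the interval column is HOUR TO SECOND, the one day in the `timedelta` is folded into the hours, which gives 26.

The plain-Python sketch below reproduces the two truncated cells. It uses two hypothetical helpers, `format_hour_to_second` and `truncate_cell`, which are not typedspark or PySpark functions. The full interval string is an assumption that approximates Spark's rendering. Only the truncated forms can be checked against the table.

```python
from datetime import timedelta
from decimal import Decimal


# Hypothetical helper approximating how an INTERVAL HOUR TO SECOND value is
# rendered, folding whole days into hours. Assumes a non-negative interval
# and drops fractional seconds.
def format_hour_to_second(delta: timedelta) -> str:
    total_seconds = delta // timedelta(seconds=1)
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"INTERVAL '{hours:02d}:{minutes:02d}:{seconds:02d}' HOUR TO SECOND"


# Hypothetical helper mirroring the default truncation of DataFrame.show():
# cells longer than `width` keep their first width - 3 characters plus "...".
def truncate_cell(text: str, width: int = 20) -> str:
    if len(text) <= width:
        return text
    if width < 4:
        return text[:width]
    return text[: width - 3] + "..."


# Decimal(32) stored with scale 18, i.e. 18 digits after the decimal point.
decimal_cell = str(Decimal(32).quantize(Decimal(1).scaleb(-18)))
interval_cell = format_hour_to_second(timedelta(days=1, hours=2, minutes=3, seconds=4))

print(truncate_cell(decimal_cell))   # 32.00000000000000...
print(truncate_cell(interval_cell))  # INTERVAL '26:03:0...
```

To see the untruncated values, call `.show(truncate=False)`. `DataFrame.show` also accepts an integer `truncate` to set a different cell width.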
Did we miss a data type?
Feel free to open an issue! We can extend the list of supported data types.