===============================================================
Typedspark: column-wise type annotations for pyspark DataFrames
===============================================================

We love Spark! But in production code we're wary when we see:

.. code-block:: python

    from pyspark.sql import DataFrame

    def foo(df: DataFrame) -> DataFrame:
        # do stuff
        return df

Because… How do we know which columns are supposed to be in ``df``?

Using ``typedspark``, we can be more explicit about what these data should look like.

.. code-block:: python

    from typedspark import Column, DataSet, Schema
    from pyspark.sql.types import LongType, StringType

    class Person(Schema):
        id: Column[LongType]
        name: Column[StringType]
        age: Column[LongType]

    def foo(df: DataSet[Person]) -> DataSet[Person]:
        # do stuff
        return df

The advantages include:

* Improved readability of the code
* Typechecking, both during runtime and linting
* Auto-complete of column names
* Easy refactoring of column names
* Easier unit testing through the generation of empty ``DataSets`` based on their schemas
* Improved documentation of tables

Installation
============

You can install ``typedspark`` from PyPI by running:

.. code-block:: bash

    pip install typedspark

By default, ``typedspark`` does not list ``pyspark`` as a dependency, since many platforms (e.g. Databricks) come with ``pyspark`` preinstalled. If you want to install ``typedspark`` with ``pyspark``, you can run:

.. code-block:: bash

    pip install "typedspark[pyspark]"

Compatibility
=============

Typedspark is tested in CI with PySpark 3.5.7 and 4.1.0. Spark Connect is supported when using PySpark 4.x, and the Connect-specific test runs if ``SPARK_CONNECT_URL`` is set.

Demo videos
===========

* IDE demo: video and code.
* Jupyter / Databricks Notebook demo: video and code.

FAQ
===

| **I found a bug! What should I do?**
| Great! Please make an issue and we'll look into it.
|
| **I have a great idea to improve typedspark! How can we make this work?**
| Awesome, please make an issue and let us know!