Autocomplete in Databricks & Jupyter notebooks

When we use Catalogs, Databases, Database, load_table() or create_schema() in a Databricks or Jupyter notebook, we also get autocomplete on the column names. No more looking at df.columns every minute to remember the column names!

The basics

To illustrate this, let us first create a small DataFrame and register it as the temporary view person_table.

[1]:

from pyspark.sql import SparkSession

spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

[2]:

import pandas as pd

(
    spark.createDataFrame(
        pd.DataFrame(
            dict(
                name=["Jack", "John", "Jane"],
                age=[20, 30, 40],
            )
        )
    ).createOrReplaceTempView("person_table")
)

We can now load this data using load_table(), which returns both the DataFrame and its Schema. Note that the Schema is inferred from the table: it doesn’t need to have been serialized using typedspark.

[3]:

from typedspark import load_table

df, Person = load_table(spark, "person_table")

You can now use df and Person just like you would in your IDE, including autocomplete!

[4]:

df.filter(Person.age > 25).show()

+----+---+
|name|age|
+----+---+
|John| 30|
|Jane| 40|
+----+---+

Other notebook types

Autocomplete of dynamically loaded schemas (e.g. through load_table() or create_schema()) has been verified to work on Databricks, JupyterLab and Jupyter Notebook. At the time of writing, it doesn’t work in VS Code and PyCharm notebooks.
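
One likely reason for this difference, stated here as an assumption rather than something the typedspark documentation confirms, is that completion in Jupyter-style notebooks can inspect live objects after a cell has run, whereas editors relying on static analysis only see the source code. The sketch below uses plain Python and a hypothetical make_schema() helper (not part of typedspark) to show that a class built at runtime exposes its column names through dir(), even though those names never appear as attribute definitions in the source.

```python
# Hypothetical stand-in for a schema built at runtime; this is not typedspark code.
def make_schema(name, columns):
    """Create a class whose attributes are the given column names."""
    return type(name, (), {column: column for column in columns})


# Column names that would only be known once the table has been read.
Person = make_schema("Person", ["name", "age"])

# A completer that inspects the live object can list these attributes.
public_attributes = [attr for attr in dir(Person) if not attr.startswith("_")]
print(public_attributes)  # ['age', 'name']
```

A tool that only reads the source text cannot know which attributes Person has, because they are created from data when the cell executes.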