Autocomplete in Databricks & Jupyter notebooks

When we use Catalogs, Databases, Database, load_table() or create_schema() in a Databricks or Jupyter notebook, we also get autocomplete on the column names. There is no more need to check df.columns every minute to remember them!

The basics

To illustrate this, let us first generate some data and register it as the temporary view person_table.

[1]:
from pyspark.sql import SparkSession

spark = SparkSession.Builder().config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
[2]:
import pandas as pd

(
    spark.createDataFrame(
        pd.DataFrame(
            dict(
                name=["Jack", "John", "Jane"],
                age=[20, 30, 40],
            )
        )
    ).createOrReplaceTempView("person_table")
)

We can now load these data using load_table(). Note that the Schema is inferred: it doesn’t need to have been serialized using typedspark.

[3]:
from typedspark import load_table

df, Person = load_table(spark, "person_table")

You can now use df and Person just like you would in your IDE, including autocomplete!

[4]:
df.filter(Person.age > 25).show()
+----+---+
|name|age|
+----+---+
|John| 30|
|Jane| 40|
+----+---+

Other notebook types

Autocomplete of dynamically loaded schemas (e.g. through load_table() or create_schema()) has been verified to work on Databricks, JupyterLab and Jupyter Notebook. At the time of writing, it doesn’t work in VS Code or PyCharm notebooks.
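
One plausible reason for this difference, stated here as an assumption rather than a description of how typedspark or these editors are implemented: a notebook kernel can complete against the live objects in memory, whereas an editor that relies on static analysis cannot see attributes that only exist once the code has run. The sketch below uses only the standard library to show the idea. The helper make_schema() is hypothetical and is not part of typedspark; it simply builds a class at runtime with one attribute per column name.

```python
# Hypothetical stand-in for a dynamically loaded schema.
# This is NOT typedspark's implementation, only an illustration.
def make_schema(name, columns):
    """Build a class at runtime with one attribute per column name."""
    attributes = {column: f"{name}.{column}" for column in columns}
    return type(name, (), attributes)


# The column names are only known at runtime, e.g. after reading a table.
Person = make_schema("Person", ["name", "age"])

# Runtime introspection sees the generated attributes. A completer that
# inspects live objects can therefore offer them after typing "Person.".
completions = [attr for attr in dir(Person) if not attr.startswith("_")]
print(completions)  # ['age', 'name']
```

Because the attributes of Person never appear in the source code, a tool that only reads the source has nothing to suggest, while dir() on the running object lists them.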