Autocomplete in Databricks & Jupyter notebooks
When we use Catalogs, Databases, Database, load_table() or create_schema() in a Databricks or Jupyter notebook, we also get autocomplete on the column names. No more looking at df.columns every minute to remember the column names!
The basics
To illustrate this, let us first create a small DataFrame and register it as the temporary view person_table.
[1]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
[2]:
import pandas as pd
(
    spark.createDataFrame(
        pd.DataFrame(
            dict(
                name=["Jack", "John", "Jane"],
                age=[20, 30, 40],
            )
        )
    ).createOrReplaceTempView("person_table")
)
We can now load this data using load_table(). Note that the schema is inferred from the table itself: the table doesn’t need to have been serialized using typedspark.
[3]:
from typedspark import load_table
df, Person = load_table(spark, "person_table")
You can now use df and Person just like you would in your IDE, including autocomplete!
[4]:
df.filter(Person.age > 25).show()
+----+---+
|name|age|
+----+---+
|John| 30|
|Jane| 40|
+----+---+
Other notebook types
Autocomplete of dynamically loaded schemas (e.g. through load_table() or create_schema()) has been verified to work on Databricks, JupyterLab and Jupyter Notebook. At the time of writing, it doesn’t work in VS Code and PyCharm notebooks.
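One plausible explanation for this difference is given here as an assumption, not as something the typedspark documentation states. A dynamically loaded schema is a class that only exists once the cell has run, so its column names can be found by inspecting the live object but not by reading the source code. The sketch below illustrates the idea in plain Python. Column and make_schema are hypothetical stand-ins and are not part of typedspark.

```python
from typing import List


# Hypothetical stand-in for a column reference; NOT a typedspark class.
class Column:
    """Minimal placeholder that only stores the column name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"Column({self.name!r})"


# Hypothetical helper; NOT a typedspark function.
def make_schema(schema_name: str, column_names: List[str]) -> type:
    """Build a class at runtime with one attribute per column name."""
    attributes = {name: Column(name) for name in column_names}
    return type(schema_name, (), attributes)


# The column names only become attributes once this line has executed.
Person = make_schema("Person", ["name", "age"])

# A tool with access to the running interpreter can list them by
# inspecting the live class object.
public_attributes = [attr for attr in dir(Person) if not attr.startswith("_")]
print(public_attributes)  # ['age', 'name']
```

A tool that only analyses the source text sees the call to make_schema() but cannot know which attributes the resulting class will have. Under the assumption above, this is what would separate kernel-backed completion from completion based on static analysis.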