{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "c3322756", "metadata": {}, "source": [ "# Autocomplete in Databricks & Jupyter notebooks\n", "When we use `Catalogs`, `Databases`, `Database`, `load_table()` or `create_schema()` in a Databricks or Jupyter notebook, we also get autocomplete on the column names. No more looking at `df.columns` every minute to remember the column names!\n", "\n", "## The basics\n", "\n", "To illustrate this, let us first create a small DataFrame and register it as the temporary view `person_table`." ] }, { "cell_type": "code", "execution_count": 1, "id": "87752202", "metadata": {}, "outputs": [], "source": [ "from pyspark.sql import SparkSession\n", "\n", "spark = SparkSession.Builder().config(\"spark.ui.showConsoleProgress\", \"false\").getOrCreate()\n", "spark.sparkContext.setLogLevel(\"ERROR\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "6c1e5acc", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "(\n", "    spark.createDataFrame(\n", "        pd.DataFrame(\n", "            dict(\n", "                name=[\"Jack\", \"John\", \"Jane\"],\n", "                age=[20, 30, 40],\n", "            )\n", "        )\n", "    ).createOrReplaceTempView(\"person_table\")\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4bd96763", "metadata": {}, "source": [ "We can now load this data using `load_table()`. Note that the `Schema` is inferred: the table doesn't need to have been written using `typedspark`." ] }, { "cell_type": "code", "execution_count": 3, "id": "3003dea9", "metadata": {}, "outputs": [], "source": [ "from typedspark import load_table\n", "\n", "df, Person = load_table(spark, \"person_table\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "65c183c1", "metadata": {}, "source": [ "You can now use `df` and `Person` just like you would in your IDE, including autocomplete!" 
] }, { "cell_type": "code", "execution_count": 4, "id": "f38e0e20", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "+----+---+\n", "|name|age|\n", "+----+---+\n", "|John| 30|\n", "|Jane| 40|\n", "+----+---+\n", "\n" ] } ], "source": [ "df.filter(Person.age > 25).show()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "eb3f8c9c", "metadata": {}, "source": [ "## Other notebook types\n", "\n", "Autocomplete for dynamically loaded schemas (e.g. through `load_table()` or `create_schema()`) has been verified to work on Databricks, JupyterLab and Jupyter Notebook. At the time of writing, it doesn't work in VS Code or PyCharm notebooks." ] }, { "attachments": {}, "cell_type": "markdown", "id": "c342aec1", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }