Coming back down to Earth

by marco.p.v.vezzoli · February 9, 2025

Our journey in the galaxy of Python analytics started with a visit on Jupyter where we met the Pandas and started our fearless exploration of data.

It is sometime useful to leave the colorful interactive environment of Jupyter to create more prosaic standalone python script which are easier to automate or included in a larger project.

It’s time to come back to Earth: this won’t be a step back, rather it can be an opportunity to start something bigger, like a larger team project or to create an application.

Create a Script from a Jupyter Notebook

Sometime it is useful to transform your notebook in an actual script e.g.:

if you want it to be executed automatically (unattended)
if you want to create a module out of it in order to use it in a bigger application

extract code from Jupyter

Jupyter has a very wide range of formats to export the content of a notebook; some of them are a graphical export (e.g. html or pdf – this requires a latex installation) or textual export (ascii, rtf), etc.

To start creating a python script from your notebook, open it into Jupyter, then

from the File menu select save and export notebook as
from the submenu select Executable script this will save a file in your Download directory named with the title of your notebook, but with .py extension

You can now open this file with your favorite code editor; here is what you will see:

before every cell there will be a comment with the cell execution number
every code cell will be copied in an individual block of commands
markdown cells and raw cells will be presented as comments
output will be removed

In the following sections I will list some cleanup actions which will help you transform this script into a more manageable piece of code

manage magic code

magic code is translated into the equivalent python call e.g.

?sum

becomes

get_ipython().run_line_magic('pinfo', 'sum')

most of the time you may want to get rid of all of this kind of code as some functionalities (e.g. accessing documentation) are intended only for interactive usage within a jupyter notebook.

Other functionalities (e.g. timing your cell execution) may be better managed with other libraries.

add code to save tables

Jupyter conveniently shows pandas =DataFrame=s as tables; you may want to extract these results into files.

To access these tables from a script you can save them into files

small size

If your table are small you may be willing to save them in some simple format:

csv using
```
  df.to_csv("my_file.csv")
```
Microsoft Excel using openpyxl optional library
```
  df.to_excel("my_file.xlsx")
```

large size

For larger tables or more complex tasks binary formats may help.

Apache Parquet is a columnar binary format ideal for large data collections and high performance computation; it requires some optional library e.g. pyarrow see documentation
```
  df.to_parquet("my_file.pqt")
```
hdf5 is a binary format which can contain multiple tables in a single file (see documentation ) python df.to_hdf("my_file.pqt")

databases

Pandas has a simple mapping from tables to data frames which requires sqlalchemy and a driver of the database. Python basic distribution include sqlite but many more are available as optional packages python df.to_sql(name’my_table’, con=engine)=

add code to save figures

the following example works for matplotlib and any library based on it, like seaborn.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
california = pd.read_csv("california_pb_2023.csv")

The simplest way to save an image is to

store the Axis object created into some variable
reach the Figure object from the .figure attribute
use the .savefig() method

ax = sns.histplot(california,x="Daily Mean Pb Concentration")
ax.figure.savefig("pb_2003.png")

A more complex sequence is required when working with multiple plots either stacked or overlapped.

In this case the pd.subplot() function creates multiple charts (axis) in a single figure

# this code is general: two charts in a single row
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.suptitle('Horizontally stacked subplots')

# this code is specific for this function
from scipy import stats
stats.probplot(california["Daily Mean Pb Concentration"], plot=ax1)
stats.probplot(california["Daily Mean Pb Concentration"], plot=ax2,dist=stats.distributions.lognorm(s=1))

fig.savefig("probplots.png")

clean up the code

The following suggestions holds for any python script and are not strictly required for the execution.

move all import statements at the beginning of the file
organize the code in functions and classes; possibly add type annotations
create a single entry point at the bottom of the code with the usual python if __name__ = “main”: main()=
add command line options management using libraries like optparse (see here)
separate data and configuration from code: libraries like toml (see here) can help reading configuration files
transform absolute paths into relative paths
consider using pyproject.html to collect dependencies and constraints
consider using a linter (e.g. pylint or ruff) to evaluate code inconsistencies
create unit tests to verify your functions individually; pytest helps in this task
add documentation per each function or class as well as a module doc string
use a code formatter to keep your style consistent (e.g. black)

add shell script to launch the code

I find it very convenient to have a shell script taking care of

setting up the working directory of the process properly
activate any virtual environment as needed
fix the environment variables

e.g.

#!/bin/bash

# change the process directory to this one
cd $(dirname $0)

# activate a local virtual environment
source .venv/bin/activate

# set up some environment variables
export PYTHONPATH=$(pwd)

# launch the application
# forwards all command line arguments
python -m myapp $@

Exercise

transform the notebook you created and edited in the previous section (Exploratory Data Analysis) in an executable script

Coming back down to Earth

Create a Script from a Jupyter Notebook

extract code from Jupyter

manage magic code

add code to save tables

small size

large size

databases

add code to save figures

clean up the code

add shell script to launch the code

Exercise

Like this:

Related

You may also like...

Coming back down to Earth

Create a Script from a Jupyter Notebook

extract code from Jupyter

manage magic code

add code to save tables

small size

large size

databases

add code to save figures

clean up the code

add shell script to launch the code

Exercise

Share this:

Like this:

Related

You may also like...

Climbing a (binary) Tree

Python Intro

Data: the final frontier