Jupyter notebooks[1] are mainly used for machine learning tasks. A notebook combines rich text content with Python code that runs block by block (in cells). Because the functions provided by the major machine learning libraries are well developed and rarely need much debugging, reusable code cells make it easy to rerun a few lines of code with different parameters.
This tutorial describes how to install a machine learning environment and how to set up the server side and the client side. The setup relies on SSH tunneling, a technique that is useful for many other purposes as well. For the client side, it also shows how Visual Studio Code offers an alternative, more convenient way to work on machine learning projects.
Installation
Installing Jupyter is easy, and there are multiple ways to do it. Besides the common package managers, one straightforward way is to install Anaconda or Miniconda, a minimal version of Anaconda. Anaconda provides a Python distribution aimed specifically at data science and machine learning tasks.
Besides using the installers[2], you can install Miniconda from the command line. To install Miniconda3 on Linux (x86_64) or on a Mac with Apple Silicon, run the following commands:
mkdir -p ~/miniconda3
# Determine the OS platform
OS=$(uname)
# Set the URL for the Miniconda installer
# You may need to update these URLs to the latest Miniconda installer versions
LINUX_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
MAC_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh"
# Download the appropriate installer
case "$OS" in
    Linux*)
        echo "Detected Linux. Downloading with wget..."
        wget "$LINUX_URL" -O ~/miniconda3/miniconda.sh
        ;;
    Darwin*)
        echo "Detected macOS. Downloading with curl..."
        curl -L "$MAC_URL" -o ~/miniconda3/miniconda.sh
        ;;
    *)
        echo "Unsupported OS: $OS"
        exit 1
        ;;
esac
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
By default, Anaconda and Miniconda are installed in the user's home directory (here ~/miniconda3), as you can see in the code block.
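The OS detection in the shell script can also be expressed in Python, which is handy if you later automate the setup from a Python script. This is a minimal sketch: the function name installer_url is hypothetical, and only the two installer URLs from the script above are mapped; any other OS/architecture combination raises an error instead of guessing a URL.

```python
import platform

# Installer URLs copied from the shell script above; update them if the naming changes.
INSTALLERS = {
    ("Linux", "x86_64"): "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh",
    ("Darwin", "arm64"): "https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh",
}


def installer_url(system=None, machine=None):
    """Return the Miniconda installer URL for the given (or current) OS and CPU architecture."""
    system = system or platform.system()     # e.g. "Linux" or "Darwin" (macOS)
    machine = machine or platform.machine()  # e.g. "x86_64" or "arm64"
    try:
        return INSTALLERS[(system, machine)]
    except KeyError:
        raise ValueError(f"Unsupported platform: {system}/{machine}") from None
```

Calling installer_url() with no arguments picks the entry for the machine the script runs on.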
Install packages
With the environment installed, one step remains before you can use Jupyter Notebook: installing the required packages into the conda environment.
This is easy with the pip package manager that ships with the conda installation. Note that Miniconda does not include Jupyter itself, so install the notebook package as well, and that the scikit-learn library must be installed under the name scikit-learn (the sklearn name on PyPI is a deprecated placeholder), although it is still imported as sklearn:
~/miniconda3/bin/pip3 install notebook scikit-learn matplotlib numpy pandas
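A quick way to confirm that the packages landed in the conda environment is to check, with that environment's interpreter (e.g. ~/miniconda3/bin/python), that they can be found for import. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical. It checks import names, not pip names, which is why scikit-learn appears as sklearn.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip names: scikit-learn is imported as "sklearn".
    missing = missing_modules(["sklearn", "matplotlib", "numpy", "pandas"])
    print("Missing:", ", ".join(missing) if missing else "none")
```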
Configuration
To configure Jupyter, first generate a default configuration file with jupyter notebook --generate-config:
(base) [user@chinkapin ~]$ jupyter notebook --generate-config
Writing default config to: ~/.jupyter/jupyter_notebook_config.py
You can then set several parameters in jupyter_notebook_config.py:
c.NotebookApp.notebook_dir = 'your working directory'  # directory opened by default
c.NotebookApp.port = 60000  # port number (the default is 8888)
c.NotebookApp.open_browser = False  # on a server, do not open a browser when Jupyter starts
c.NotebookApp.token = ''  # disables token authentication; not recommended
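A copy of the configuration file is available here[4]. If you use Notebook 7 or later, which runs on Jupyter Server, the corresponding options live under c.ServerApp (for example c.ServerApp.port and c.ServerApp.root_dir) in jupyter_server_config.py, generated with jupyter server --generate-config.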
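If you set up several machines, the same settings can be written programmatically. This is a minimal sketch, assuming the c.NotebookApp option names shown above (pass section="ServerApp" for Jupyter Server based releases); the function name write_jupyter_config and the example values are hypothetical. It overwrites the target file, and it deliberately leaves the token setting out.

```python
from pathlib import Path

# Settings mirroring the configuration lines above (values are examples).
SETTINGS = {
    "notebook_dir": "/path/to/your/working/directory",
    "port": 60000,
    "open_browser": False,
}


def write_jupyter_config(path, settings, section="NotebookApp"):
    """Write settings as `c.<section>.<name> = <value>` lines, replacing the file's contents."""
    lines = [f"c.{section}.{name} = {value!r}" for name, value in settings.items()]
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)  # ~/.jupyter may not exist yet
    path.write_text("\n".join(lines) + "\n")
    return path
```

For example, write_jupyter_config("~/.jupyter/jupyter_notebook_config.py", SETTINGS) produces lines such as c.NotebookApp.port = 60000.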
Server Side Execution
Once the configuration file is set up, running Jupyter on the server is simple. Running it with nohup is recommended, because Jupyter then keeps running in the background even after you log out. If you have admin privileges, you can instead set it up as a system service, which is easier to manage.
nohup jupyter notebook &
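For reference, what nohup ... & achieves can be approximated from Python with subprocess.Popen: start the process in a new session, so it is detached from the terminal and does not receive the hangup signal when the terminal closes, and send its output to a log file. This is a minimal POSIX-only sketch; the helper name start_detached and the log path are hypothetical.

```python
import subprocess
from pathlib import Path


def start_detached(cmd, log_path):
    """Start cmd in a new session (detached from this terminal), appending its output to log_path."""
    log_path = Path(log_path).expanduser()
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # setsid(): no controlling terminal, so no SIGHUP on logout
        )


# Usage on the server (hypothetical log path):
# start_detached(["jupyter", "notebook"], "~/jupyter.log")
```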
macOS
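You can also run Jupyter as a service on your Mac. Place a plist file named “org.jupyter.notebook.plist” in the $HOME/Library/LaunchAgents directory with the following contents (this example starts JupyterLab from a Homebrew installation; adjust the path in ProgramArguments to your own jupyter executable):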
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>org.jupyter.notebook</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/jupyter</string>
<string>lab</string>
<string>--no-browser</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardErrorPath</key>
<string>/tmp/org.jupyter.notebook.err</string>
<key>StandardOutPath</key>
<string>/tmp/org.jupyter.notebook.out</string>
</dict>
</plist>
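Then load the agent with the following command. Because of RunAtLoad and KeepAlive, launchd starts Jupyter when the agent is loaded (including at login) and restarts it if it exits, so a local Jupyter service is always available.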
launchctl load $HOME/Library/LaunchAgents/org.jupyter.notebook.plist
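Hand-editing XML is error-prone, and Python's standard plistlib module can generate the same agent file. This is a minimal sketch that reproduces the plist above; the function name jupyter_agent_plist is hypothetical, and the default jupyter path is the Homebrew one from the example. If you use the Miniconda install instead, pass the full absolute path to its jupyter executable, because launchd does not expand ~.

```python
import plistlib


def jupyter_agent_plist(jupyter_path="/opt/homebrew/bin/jupyter", label="org.jupyter.notebook"):
    """Return the launchd agent definition for a Jupyter service as plist (XML) bytes."""
    agent = {
        "Label": label,
        "ProgramArguments": [jupyter_path, "lab", "--no-browser"],
        "RunAtLoad": True,  # start as soon as the agent is loaded (e.g. at login)
        "KeepAlive": True,  # restart the process whenever it exits
        "StandardErrorPath": f"/tmp/{label}.err",
        "StandardOutPath": f"/tmp/{label}.out",
    }
    return plistlib.dumps(agent)
```

Writing the result with Path.home().joinpath("Library/LaunchAgents/org.jupyter.notebook.plist").write_bytes(jupyter_agent_plist()) and then running the launchctl load command gives the same service as the hand-written file.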
Client Side Execution
Connecting to the server's Jupyter service takes a bit more setup: it requires a technique called SSH port forwarding, or tunneling. I wrote another article about SSH tunneling between servers here.
ssh -L localhost:8080:localhost:60000 user@jupyter_server_ip
This forwards port 8080 on your local machine to port 60000 on the remote server, so you can open http://localhost:8080 in a browser and reach the remote Jupyter service through the tunnel.
What if you want a constant connection to the service? It turns out this is not difficult to set up, and a persistent connection removes a lot of friction when working on machine learning projects.
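The -L argument has the form [bind_address:]port:host:hostport, and host is resolved on the server, so localhost there means the Jupyter server itself. When a script starts the tunnel, building the argument list explicitly avoids shell-quoting mistakes. This is a minimal sketch; the function name tunnel_command is hypothetical, and -N (do not run a remote command) is a standard OpenSSH flag added here because the connection is only used for forwarding.

```python
def tunnel_command(destination, local_port, remote_port, remote_host="localhost",
                   bind_address="localhost", identity_file=None):
    """Build the argv for an OpenSSH local port forward that runs no remote command."""
    forward = f"{bind_address}:{local_port}:{remote_host}:{remote_port}"
    cmd = ["ssh", "-N", "-L", forward]
    if identity_file:
        cmd += ["-i", identity_file]
    cmd.append(destination)
    return cmd


# The command above, plus -N:
# tunnel_command("user@jupyter_server_ip", 8080, 60000)
# -> ['ssh', '-N', '-L', 'localhost:8080:localhost:60000', 'user@jupyter_server_ip']
```

Passing the list to subprocess.run blocks for as long as the tunnel stays open.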
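The core idea behind a constant connection is simple: run the tunnel in the foreground and start it again whenever it exits, for example after a network drop. The platform-specific services below do this for you; as a sketch of the idea, a small Python supervisor could look like this. The function name keep_alive, the delay, and the restart limit are hypothetical choices. The OpenSSH options ExitOnForwardFailure and ServerAliveInterval in the usage comment make ssh exit, and therefore get restarted, when the forward cannot be set up or the connection dies.

```python
import subprocess
import time


def keep_alive(cmd, retry_delay=5.0, max_restarts=None):
    """Run cmd and restart it each time it exits; return how many times it was started."""
    starts = 0
    while True:
        starts += 1
        result = subprocess.run(cmd)  # blocks while the tunnel is up
        print(f"Tunnel exited with code {result.returncode}")
        if max_restarts is not None and starts > max_restarts:
            return starts
        time.sleep(retry_delay)  # avoid a tight loop while the server is unreachable


# Hypothetical usage (runs until interrupted):
# keep_alive(["ssh", "-N", "-o", "ExitOnForwardFailure=yes", "-o", "ServerAliveInterval=30",
#             "-L", "localhost:8080:localhost:60000", "user@jupyter_server_ip"])
```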
Linux
macOS
I found that this macOS agent does not always work as expected: sometimes I have to restart the service before the tunnel works properly. So the next step is to wrap the commands in a Python script that restarts the service automatically; however, the communication between multiple processes could become a problem.
launchd can provide background services for tunneling or port forwarding connections, and there is an example[5] of using launchd to create and maintain such a connection. Below is my own example, which lets me connect to the remote Jupyter service without running a command each time. Place a plist file named “org.openssh.Remote_Jupyter.plist” in the $HOME/Library/LaunchAgents directory with the following contents:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<dict>
<key>NetworkState</key>
<true/>
</dict>
<key>Label</key>
<string>org.openssh.Remote_Jupyter</string>
<key>LimitLoadToSessionType</key>
<string>Aqua</string>
<key>ProgramArguments</key>
<array>
<string>/usr/bin/ssh</string>
<string>-L</string>
<string>8000:localhost:60000</string>
<string>-N</string>
<string>-n</string>
<string>-C</string>
<string>-o</string>
<string>ControlMaster=no</string>
<string>-i</string>
<string>ssh_key_location</string>
<string>user@remote_jupyter_server</string>
</array>
<key>StandardErrorPath</key>
<string>/tmp/org.openssh.Remote_Jupyter.err</string>
<key>StandardOutPath</key>
<string>/tmp/org.openssh.Remote_Jupyter.out</string>
</dict>
</plist>
Activate the service with the following command:
launchctl load $HOME/Library/LaunchAgents/org.openssh.Remote_Jupyter.plist
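Because the tunnel agent sometimes has to be restarted before it works again, as mentioned above, it can help to script the restart. launchctl kickstart -k stops the running instance of a loaded service and starts it again; the service target has the form gui/<uid>/<label>. This is a minimal sketch; the helper names kickstart_command and restart_agent are hypothetical, and the command is only meaningful on macOS.

```python
import os
import subprocess


def kickstart_command(label, uid=None):
    """Build the launchctl command that restarts a loaded per-user agent."""
    uid = os.getuid() if uid is None else uid
    return ["launchctl", "kickstart", "-k", f"gui/{uid}/{label}"]


def restart_agent(label):
    """Restart the agent; raises CalledProcessError if launchctl reports a failure."""
    subprocess.run(kickstart_command(label), check=True)


# restart_agent("org.openssh.Remote_Jupyter")
```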
Windows
Windows is perhaps the hardest platform to set up. You need to install an app named nssm[6]; use this link if you are on Windows 10 or newer. Then add the directory containing the nssm executable to the PATH environment variable.
To my surprise, after months of use, this method works very well as long as a Jupyter Notebook service is running on the server, so you can confidently use it to configure port forwarding services on your Windows machine.
A service can be created with nssm using the following command:
nssm install tunnel
nssm opens a GUI for the configuration. The Application tab is straightforward; fill in the following:
- Path: C:\Windows\System32\OpenSSH\ssh.exe
- Startup directory: C:\Windows\System32\OpenSSH\
- Arguments: -N -n -T -L localhost:8000:localhost:60002 -i "identity" user@server
The next step needs the most care: on the Log on tab, select “This account” and enter your current username and password, so that ssh runs under your account and uses your own SSH key and known_hosts file rather than those of the system account.
You can also set file paths for stdin, stdout, and stderr on the I/O tab, but they are optional.
Finally, once the service is configured, start it with the following command:
nssm restart tunnel
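The GUI steps can also be scripted, since nssm accepts the same settings on the command line (nssm install <service> <program> and nssm set <service> <parameter> <value>). This is a minimal sketch that only builds the commands; the helper name nssm_tunnel_commands is hypothetical, and you should check the parameter names (AppDirectory, AppParameters, ObjectName) against the nssm documentation for your version. Run the commands from an elevated prompt, and avoid keeping the password in the script itself.

```python
# Paths from the GUI steps above
SSH_EXE = r"C:\Windows\System32\OpenSSH\ssh.exe"
SSH_DIR = r"C:\Windows\System32\OpenSSH"


def nssm_tunnel_commands(service, ssh_arguments, username, password):
    """Build the nssm commands that mirror the GUI configuration (nothing is executed here)."""
    return [
        ["nssm", "install", service, SSH_EXE],                       # Application tab: Path
        ["nssm", "set", service, "AppDirectory", SSH_DIR],           # Application tab: Startup directory
        ["nssm", "set", service, "AppParameters", ssh_arguments],    # Application tab: Arguments
        ["nssm", "set", service, "ObjectName", username, password],  # Log on tab: This account
        ["nssm", "restart", service],
    ]
```

Each entry can be passed to subprocess.run(cmd, check=True) in order.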
Python Daemon
A bare ssh command is not versatile enough for everything I want: I want the connection to be established automatically, and I want to start the remote Jupyter service on demand without admin privileges on the server. Python provides many modules that make this kind of automation possible, and it is available on all platforms. Since I have already shown how to set up background services for the ssh command, setting up a Python script as a service is even easier: you only need the path to the Python interpreter and the path to the script.
#!/usr/bin/env python3
# Requires: pip install paramiko sshtunnel
# SSHTunnelForwarder comes from the sshtunnel package; paramiko itself has no such class.
import multiprocessing
import time

import paramiko
from sshtunnel import SSHTunnelForwarder


def create_ssh_tunnel(username, hostname, remote_port, local_port, ssh_key_path):
    """
    Forward local_port on this machine to remote_port on the server over SSH.
    """
    tunnel = SSHTunnelForwarder(
        (hostname, 22),
        ssh_username=username,
        ssh_pkey=ssh_key_path,
        remote_bind_address=('127.0.0.1', remote_port),
        local_bind_address=('127.0.0.1', local_port),
    )
    try:
        tunnel.start()
        print(f"Tunnel established at localhost:{local_port} -> remote:{remote_port}")
        while True:
            time.sleep(10)  # keep this process, and therefore the tunnel, alive
    except Exception as e:
        print(f"Error creating SSH tunnel: {e}")
    finally:
        tunnel.stop()


def check_and_start_jupyter(username, hostname, ssh_key_path):
    """
    Check if Jupyter is running on the server and start it if not.
    """
    client = paramiko.SSHClient()
    try:
        client.load_system_host_keys()
        client.set_missing_host_key_policy(paramiko.WarningPolicy())
        client.connect(hostname, username=username, key_filename=ssh_key_path)
        # Check if Jupyter is running; "[j]" stops pgrep from matching this command's own shell
        stdin, stdout, stderr = client.exec_command('pgrep -f "[j]upyter-notebook"')
        if stdout.channel.recv_exit_status() != 0:
            print("Jupyter is not running. Starting Jupyter...")
            # Start Jupyter in the background with nohup. Use the full path because a
            # non-interactive SSH shell may not have conda on its PATH, and redirect the
            # output so Jupyter does not depend on this SSH channel staying open.
            stdin, stdout, stderr = client.exec_command(
                'nohup ~/miniconda3/bin/jupyter notebook > ~/jupyter.log 2>&1 &'
            )
            stdout.channel.recv_exit_status()  # wait until the remote shell has launched it
            print("Jupyter started.")
        else:
            print("Jupyter is already running.")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        client.close()


if __name__ == "__main__":
    username = 'your_username'
    hostname = 'remote_server_ip'
    remote_port = 60000  # Remote Jupyter port
    local_port = 8000  # Local port for SSH tunnel
    ssh_key_path = '/path/to/your/ssh/key'
    # Start Jupyter on the remote server if not running
    check_and_start_jupyter(username, hostname, ssh_key_path)
    # Create a process for the SSH tunnel
    tunnel_process = multiprocessing.Process(
        target=create_ssh_tunnel,
        args=(username, hostname, remote_port, local_port, ssh_key_path),
    )
    tunnel_process.start()
    # You can add more code here if needed
    # ...
    # Wait for the tunnel process
    tunnel_process.join()
A more compact alternative authenticates with a password instead of a key and runs both the check and the tunnel in one child process:
import multiprocessing
import time

import paramiko
from sshtunnel import SSHTunnelForwarder  # pip install sshtunnel

# SSH credentials and server details
ssh_host = 'remote_server_ip'
ssh_port = 22
ssh_user = 'your_username'
ssh_password = 'your_password'  # or use a key file
# Remote and local ports for Jupyter Notebook
remote_jupyter_port = 60000
local_jupyter_port = 8000


def check_jupyter_running(ssh):
    """Check if Jupyter is running on the remote server."""
    # "[j]" stops pgrep from matching the shell that runs this command
    stdin, stdout, stderr = ssh.exec_command("pgrep -f '[j]upyter-notebook'")
    return bool(stdout.read())


def start_jupyter(ssh):
    """Start Jupyter Notebook in the background on the remote server."""
    # Full path: a non-interactive SSH shell may not have conda on its PATH.
    # Redirected output: Jupyter must not depend on this SSH channel staying open.
    stdin, stdout, stderr = ssh.exec_command(
        "nohup ~/miniconda3/bin/jupyter notebook > ~/jupyter.log 2>&1 &"
    )
    stdout.channel.recv_exit_status()  # wait until the remote shell has launched it
    print("Jupyter Notebook started on the remote server.")


def create_ssh_tunnel():
    """Start Jupyter if needed, then keep an SSH tunnel to it open."""
    with paramiko.SSHClient() as ssh:
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(ssh_host, ssh_port, ssh_user, ssh_password)
        if not check_jupyter_running(ssh):
            start_jupyter(ssh)
    # Set up the SSH tunnel
    tunnel = SSHTunnelForwarder(
        (ssh_host, ssh_port),
        ssh_username=ssh_user,
        ssh_password=ssh_password,
        remote_bind_address=('localhost', remote_jupyter_port),
        local_bind_address=('localhost', local_jupyter_port),
    )
    tunnel.start()
    print(f"Tunnel established at localhost:{local_jupyter_port}")
    try:
        while True:
            time.sleep(10)  # Keep the tunnel open
    except KeyboardInterrupt:
        print("Tunnel closed.")
    finally:
        tunnel.stop()


if __name__ == '__main__':
    # Run the SSH tunnel in a separate process
    process = multiprocessing.Process(target=create_ssh_tunnel)
    process.start()
    # You can add more code here that runs in parallel
    # For example, opening a web browser to the local Jupyter port
    process.join()
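The comments at the end of the second script suggest opening a web browser on the local Jupyter port. This is a minimal sketch of that step, assuming the tunnel is started elsewhere (for example by the process above): poll the local port with socket.create_connection until it accepts connections, then open it with the standard webbrowser module. The function name wait_for_port and the timeouts are hypothetical.

```python
import socket
import time
import webbrowser


def wait_for_port(port, host="localhost", timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, or False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # tunnel not up yet; try again
    return False


def open_when_ready(port):
    """Open the local Jupyter page in the default browser once the tunnel is up."""
    if wait_for_port(port):
        webbrowser.open(f"http://localhost:{port}")
    else:
        print(f"Nothing is listening on localhost:{port} yet.")


# open_when_ready(8000)  # same local port as in the scripts above
```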
Visual Studio Code
VS Code is widely used by developers and machine learning practitioners. Using Jupyter notebooks in VS Code can be much easier, because you do not need to handle the tunneling connection or maintain the Jupyter process on the server yourself.
This link describes how to use VS Code to connect to a remote server with better performance, and this link describes how to use Jupyter notebooks in VS Code. By combining the two techniques, you can run your machine learning code on the remote server seamlessly from your personal device.
Conclusion
This article laid out the basic workflow for using Jupyter Notebook, with an emphasis on automating the setup: configuration files, plus system services for each operating system. Once you spend some time on the setup, you can connect to the service at any time to work on your machine learning projects.