How to Organize a Self-contained Python Project

Harry Wang
5 min read · Feb 5, 2020

I often use and study open-source code from the Internet and run into the following situation:

I find a nice code repo on GitHub, clone it to my local computer, and then have to spend a lot of time just getting the code up and running, because I have to figure out the Python version, the required packages and their specific versions, the Jupyter configuration, and so on.

In this short tutorial, I show how I organize a self-contained Python project that can be up and running with minimal effort. I use a data science project with Jupyter as an example and store it on GitHub.

Prerequisite: you need to set up Python 3, Git, and (optionally) a code editor such as Atom; see How to Setup Mac for Python Development.

Key Steps:

  • Create a virtual environment
  • Create a requirements.txt file and specify the required packages and versions
  • Install packages within the virtual environment and run Python programs

First, let’s create a new repo on GitHub: make sure you choose the Python .gitignore file (if you don’t know what this is, see https://help.github.com/en/github/using-git/ignoring-files).

Clone the repo to your local Mac:

$ git clone https://github.com/harrywang/self-contained-project.git

Go inside the newly created folder $ cd self-contained-project/ and do the following:

Note: creating and activating the virtual environment works differently on Windows 10; please refer to How to Setup Python 3 on Windows 10.

  1. Create a Python 3 virtual environment in a folder named “venv”: $ python3 -m venv venv
    Note that you can change the virtual environment name to anything you like, but “venv” is a convention that is already included in the default GitHub Python .gitignore file, so you won’t accidentally push the virtual environment folder to GitHub.
  2. Activate the virtual environment: $ source venv/bin/activate
  3. Create a requirements.txt file and add the packages you need for the project.

Here I install Jupyter in the virtual environment, which saves the trouble of configuring a system-level Jupyter to work with different virtual environments. The drawback is that it adds about 200 MB to the virtual environment and makes package installation take slightly longer. However, if you prefer to install Jupyter once at the system level and configure it to use different virtual environments (that works perfectly well too), just remove jupyter from the requirements.txt file and follow the steps at the end of this tutorial.

I also added another package, pandas, for data analysis. Next, run $ pip install -r requirements.txt to install jupyter and pandas; the terminal output lists the versions that were just installed.
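If the pip output has already scrolled away, the installed versions can also be read back from inside the activated environment. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper names installed_version and report are made up for this illustration and are not part of the project.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages):
    """Build one 'name==version' (or 'name: not installed') line per package."""
    lines = []
    for name in packages:
        version = installed_version(name)
        lines.append(f"{name}=={version}" if version else f"{name}: not installed")
    return lines


if __name__ == "__main__":
    # Run this with the virtual environment activated.
    print("\n".join(report(["jupyter", "pandas"])))
```

The name==version lines it prints use the same form that requirements.txt accepts, so they can be copied into that file.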

If you don’t specify package versions in the requirements.txt file, the latest versions will be installed, which may or may not be what you want: packages keep evolving, and your code may break with future versions. Therefore, it is good practice to specify the version of each package explicitly in requirements.txt, using the form package==version. You can add other packages in the same way.
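A quick way to spot entries that are not pinned is to scan the file for lines without "==". This is only a rough sketch: the function name unpinned is hypothetical, the version number in the example text is an illustrative placeholder rather than the one used in this project, and the parsing deliberately ignores less common requirement syntax such as URLs or environment markers.

```python
def unpinned(requirements_text):
    """Return package lines that lack an exact '==' version pin."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            missing.append(line)
    return missing


# Illustrative content only; the version number is a placeholder.
example = """\
jupyter==1.0.0
pandas
"""

if __name__ == "__main__":
    print(unpinned(example))
```

An empty result means every package line carries an exact version.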

4. Start Jupyter to work on your project: $ jupyter notebook

5. You can now save the project, commit all changes, and push the code to GitHub:

$ git add .
$ git commit -am 'finished the tutorial'
$ git push

Now you have created a highly portable and self-contained Python project. To sum up, anyone who has Python 3 can get the code up and running by simply doing the following:

$ git clone https://github.com/harrywang/self-contained-project.git
$ cd self-contained-project
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ jupyter notebook
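The same setup can be scripted with Python's standard venv module, which is one way to sidestep the different activation commands on macOS and Windows. This is a sketch under a few assumptions: bootstrap and venv_python are hypothetical helper names, the environment folder is assumed to be called "venv", and requirements.txt is assumed to sit in the project root.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(project_dir, env_name="venv"):
    """Path to the interpreter inside the virtual environment."""
    env = Path(project_dir) / env_name
    if os.name == "nt":  # Windows keeps executables under Scripts\
        return env / "Scripts" / "python.exe"
    return env / "bin" / "python"


def bootstrap(project_dir, with_pip=True, install=True):
    """Create venv/ in project_dir and optionally install requirements.txt."""
    project = Path(project_dir)
    venv.create(project / "venv", with_pip=with_pip)
    requirements = project / "requirements.txt"
    if install and requirements.exists():
        # Calling the environment's own interpreter avoids needing "activate".
        subprocess.run(
            [str(venv_python(project)), "-m", "pip", "install",
             "-r", str(requirements)],
            check=True,
        )
    return venv_python(project)


# Example (from inside the cloned repo): bootstrap(".")
```

Running the interpreter returned by bootstrap with "-m jupyter notebook" should then start Jupyter from that environment, provided jupyter is listed in requirements.txt.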

How to configure Jupyter with different virtual environments

Assuming you have Jupyter installed at the system level, follow these steps to register a virtual environment as a Jupyter kernel:

  • go to your project folder and create a virtual environment named “venv”
  • activate the virtual environment
  • install ipykernel in the virtual environment
  • install the current virtual environment as a kernel for Jupyter. NOTE: you have to use a unique name for each project.
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install ipykernel
(venv) $ ipython kernel install --user --name=unique_project_name
  • List all kernels and remove a virtual environment if needed
$ jupyter kernelspec list
$ jupyter kernelspec uninstall unique_project_name
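Because each project needs its own kernel name, one option is to derive the name from the project folder. The sketch below only builds the command and does not run it; kernel_name and install_kernel_command are hypothetical helpers, and the command uses the module form python -m ipykernel install, which ipykernel documents for registering a kernel. Kernel names are limited to ASCII letters, digits, hyphens, periods, and underscores, so other characters are replaced.

```python
import re
import sys
from pathlib import Path


def kernel_name(project_dir):
    """Derive a kernel name from the project folder name."""
    name = Path(project_dir).resolve().name.lower()
    # Replace anything outside letters, digits, '-', '.', '_' with '-'.
    return re.sub(r"[^a-z0-9._-]+", "-", name).strip("-") or "project"


def install_kernel_command(project_dir):
    """Command that registers the current interpreter as a Jupyter kernel."""
    return [sys.executable, "-m", "ipykernel", "install",
            "--user", "--name", kernel_name(project_dir)]


if __name__ == "__main__":
    # Run inside the activated environment; pass the list to subprocess.run
    # (with check=True) to actually register the kernel.
    print(" ".join(install_kernel_command(".")))
```

Two different projects whose folders share the same name would still collide, so a manual name remains the safer choice in that case.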
