Tuesday, October 17, 2017

Customize your CERN SWAN Python analysis

The CERN SWAN service is quite recent, but it is very promising and is attracting a lot of attention, for good reasons. I personally started to use it to share some analyses based on pandas and matplotlib within a Jupyter notebook. The default SWAN settings will generally allow that, but I want to provide here some useful tricks I have tested to improve the SWAN setups normally available.

Just to introduce the issue: I was running the analysis on my laptop, using Python 3 and recent versions of the Python libraries. When I needed to share my results with colleagues, it was natural to move the notebook to SWAN, where I immediately ran into a small issue: the provided setup, even the most recent one, had older versions of the libraries, not matching the ones on my laptop. As a solution I could have back-ported my code, but this didn't sound right, and the most recent matplotlib has better colors... not scientific, but I liked them.

The example below shows the default versions of relevant packages distributed with the most recent (at the time of this post) software version.
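A version check of this kind can be done in a notebook cell along the following lines. This is only a minimal sketch: the helper name package_versions is hypothetical, and it assumes the packages expose a __version__ attribute, as pandas and matplotlib do.

```python
import importlib


def package_versions(names):
    """Return a dict mapping each package name to its version string."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not available in the current environment.
            versions[name] = "not installed"
            continue
        # Not every module defines __version__, so fall back to a marker.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(package_versions(["pandas", "matplotlib"]).items()):
        # str.format is used instead of f-strings so this also runs on Python 3.5.
        print("{}: {}".format(name, version))
```

Running the same cell before and after the customization makes it easy to compare the two environments.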


When opening SWAN, the user can choose the software stack on top of which the analysis will run. The stacks come from the LCG project: when a specific stack is selected to start the SWAN session, all the packages in that stack are available. However, SWAN anticipates that you may need special settings, and allows you to run a shell script (using bash) to provide a specific setup.

In the rest of this post I will show how this feature can be exploited to run SWAN with a set of custom packages that extend the LCG software stack.

Moving on to the step-by-step instructions, the procedure I followed requires three steps:

Step 1: open a shell with the basic notebook configuration


As a first step, log in to SWAN and start a session using the software stack that you prefer.

When the list of notebooks appears, open a new Terminal, using the New drop-down menu on the top-right corner of the list.

This will open a Linux console that uses your EOS home folder as base path, with an environment equivalent to that of the notebooks you would open during this session. This is also a trick to access EOS from a shell, without having to log in to lxplus and repeat the environment setup.

Step 2: install your packages


From the shell you can now use pip (pip3 if you are using Python 3) to install locally packages that are not included in the stack, or newer versions of those that are. To do that you should create a directory, for example under your EOS home directory, and install the packages there as follows (here for pandas):
$ mkdir customenv
$ pip3 install -I --prefix customenv pandas

You can find additional documentation about pip commands and options in the pip documentation (for example via pip3 help install), but in short:

  • --prefix: asks pip to install the packages under a specific directory and not in the directory where the Python environment is installed in the system, where you don't have write permission.
  • -I (optional): installing a package may pull in other packages as dependencies. This option makes pip ignore the versions already installed in the system and install the most recent versions in the local path.

Repeat this for every package you need that is either not provided with LCG or not at the version you want.

Step 3: provide the setup script and restart the session

After you have installed all the required packages, you can provide a setup script for future sessions. The script can look like the following one:
export PATH=$HOME/customenv/bin:$PATH
export PYTHONPATH=$HOME/customenv/lib/python3.5/site-packages:$PYTHONPATH

This shell script can be saved as customenv.sh in your home directory. Note that the python3.5 part of the path must match the Python version of the stack you selected. To create the file you can use the Jupyter interface, asking to create a new text file, or any editor you like that is available in the shell you used to install the new packages.
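If you are unsure which Python version directory to put in PYTHONPATH, a small sketch like the following can compute it from the running interpreter. The helper name site_packages_dir is hypothetical, and the lib/pythonX.Y/site-packages layout is an assumption taken from the script above; some systems use a different layout (for example lib64).

```python
import os
import sys


def site_packages_dir(prefix):
    """Build the site-packages path under a pip --prefix directory.

    Assumes the lib/pythonX.Y/site-packages layout used in the setup script.
    """
    version_dir = "python{}.{}".format(sys.version_info[0], sys.version_info[1])
    return os.path.join(prefix, "lib", version_dir, "site-packages")


if __name__ == "__main__":
    # customenv is the directory name used in this post.
    prefix = os.path.join(os.path.expanduser("~"), "customenv")
    print(site_packages_dir(prefix))
```

Run it in the SWAN terminal with the same Python you used for pip3, then check that the printed directory actually exists before copying it into customenv.sh.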

You are now ready to stop this session, go back to the control panel, and start a new session. When starting the new session, remember to set the Environment script field. Assuming you followed the suggestion in this example, the value should be: $CERNBOX_HOME/customenv.sh

The notebooks you open during this session will now combine the normal stack environment with the custom packages you installed, potentially providing you with a better experience.
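To check that a package is really picked up from the custom directory rather than from the stack, one option is to look at where the module file lives. The following is a minimal sketch; the helper name is_loaded_from is hypothetical, and the customenv path is the one assumed throughout this post.

```python
import importlib
import os


def is_loaded_from(module_name, prefix):
    """Return True if the module's file is located under the given directory."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    module_file = getattr(module, "__file__", None)
    if not module_file:
        # Built-in modules and namespace packages have no usable file path.
        return False
    # Resolve symlinks on both sides so the comparison is reliable.
    real_prefix = os.path.realpath(prefix)
    real_file = os.path.realpath(module_file)
    return real_file.startswith(real_prefix + os.sep)


if __name__ == "__main__":
    custom_prefix = os.path.join(os.path.expanduser("~"), "customenv")
    for name in ["pandas", "matplotlib"]:
        print("{}: {}".format(name, is_loaded_from(name, custom_prefix)))
```

If a package reports False, the usual suspects are a PYTHONPATH entry that does not match the Python version of the stack, or an Environment script field left empty.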

To show the result, this is a screenshot of the same portion of the notebook shown at the beginning of the post, printing the versions of some packages: as you can see, it was possible to use the most recent versions of all of them.
