“The good news about computers is that they do what you tell them to do. The bad news is that they do what you tell them to do.” - Ted Nelson
In the Section 12, we have covered some basic Linux commands. Now we are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but these are actually small programs.
Let’s start by going to data_demo/data/pdb
. Suppose we
want to print line 3-8
of menthol.pdb
, one way
to do this is to run:
Let’s copy the following line to a new file print.sh
to
create a shell script. Notice we are not running it as a command yet -
we just put the commands in a file.
Once we have saved the file, we can ask the shell (BASH) to execute
the commands it contains. Because our shell is called bash
,
so we run the following command:
Or change the file permission to 750
:
And then run it:
As you will find, the script’s output is exactly what we would get if we ran that pipeline directly.
What if we want to select lines from an arbitrary file? We could edit
print.sh
each time to change the filename, but that would
probably take longer than typing the command out again in the shell and
executing it with a new file name. Instead, let’s edit
print.sh
and make it more useful.
Now, replace the text menthol.pdb
with the special
variable called $1
:
Inside a shell script, $1
means “the first filename (or
argument) on the command line”. We can now run our script like this:
Or use a different file as the argument:
Currently, the shell script print.sh
takes only one
argument. We can modify the script to use more command-line arguments.
After the first command-line argument ($1
), each additional
argument that we provide will be accessible via the special variables
$1
, $2
, $3
, which refer to the
first, second, third command-line arguments, respectively. For example,
let’s change the content of print.sh
to be:
Can you guess what it does? Check the output like:
So you can see that, by changing the arguments to our command, we can change our script’s behavior.
The print.sh
works fine, but it may take the next person
who reads it a moment to figure out what it does. We can improve our
script by adding some comments at the top:
# Select lines from the middle of a file.
# Usage: bash print.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
A comment starts with a #
character and runs to the end
of the line. The computer ignores comments, but they’re invaluable for
helping people (including yourself in the future!) understand and use
scripts. Each time you modify the script, make sure you check that the
comment is still accurate. An explanation that sends the reader in the
wrong direction is worse than none at all.
The notes above are modified from the excellent online tutorial freely available on the Software Carpentry website.
Open the user manual of TaiYi; the instructor will go over it in Chinese.
Python
scriptHere we use a simple Python
script to demonstrate how to
submit a job to TaiYi. The script goes like:
# Import modules
import random
# Set an integer
N = 10
# For loop with range, starting from 0
for i in range(0,N):
temp = random.randint(0,10)
print(temp)
# Exit with a message
print('This is done with TaiYi!')
Copy the above lines to a new Python
script
t1.py
, and save it to your home directory.
TaiYi uses IBM load sharing
facility (LSF) for job management. Put is simply, you need to
prepare a LSF job file that clearly describes the job, including
computational resources, libraries, log files, commands, executable
files, and even more options or requirements. In our case, the LSF
(job.sh
) is written as:
#!/bin/sh
#BSUB -J demo ## job name
#BSUB -q debug ## queue name
#BSUB -n 1 ## number of total cores
#BSUB -W 00:10 ## walltime in hh:mm
#BSUB -e err.log ## error log
#BSUB -o job.log ## job log
# Load Python 3.7
module load python/3.7.0
# Run the script and save output to result.log
echo "--------------------"
echo "RUN t1.py:"
python3 t1.py >> result.log
#!/bin/sh
tells TaiYi this is a shell
script.
#BSUB
stands for bsub
with different
opinions as follow:
-J test
means the job name is
demo
.
-q debug
means we will submit this job to the
debug
queue.
-n 1
means we ask for 1
core (CPU) to
run our job.
-W 00:10
means we ask the computational resource for
00
hours and 10
minutes.
-e err.log
means the error message (if any) will be
written to err.log
.
-o job.log
means the job information will be written
to job.log
.
module load python/3.7.0
is to load
Python 3
on TaiYi.
Finally, we run t1.py
with
python3 t1.py
, and save the output to
result.log
.
Submit the LSF job job.sh
to TaiYi:
Once the job is submitted, it will be assigned with a job ID (e.g.,
2622287
). Notice that the job may not be run immediately,
depending on the availability of computational resources. You can check
the status of a job with:
Here <JOB ID>
is ID of a submitted job (e.g.,
2622287
).
Go to data_demo/data/pdb
, create a shell script
count.sh
with one argument to print the total line numbers
of a given file.
Write a shell script called longest.sh
that takes the
name of a directory and a filename extension as its arguments, and
prints out the name of the file with the most lines in that directory
with that extension.
[Hint: You may find wc
,
sort
, head
, or tail
useful.]
Write a Python
script t2.py
to generate two
lists of numbers and return the average of each list. Submit and run it
on TaiYi.