opengeodata.de

Matera - Day 1

2017-06-19

The summer school in Matera starts today. After the introduction we get to know Linux and bash, or try to learn more about them. For me it was partly a refresher and partly new …

Session 1 - bash basics

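pwd -> print the current directory  
man -k count -> search the man page names and descriptions for a keyword, here "count" (same as apropos)  
cd ../.. -> go up two directories
& (at end of command) -> run program in background, keep terminal usable
fg -> bring the most recently suspended or backgrounded job to the foreground
ps aux | grep evince -> get the PID of evince (pgrep evince prints the PID directly)
CTRL + L -> clear the screen, keeping the current command line at the top
CTRL + A -> go to the beginning of the command line
ll -> alias for a long listing on many distributions (roughly ls -l; e.g. ls -alF on Ubuntu)
more -> page through a text file one screen at a time
!! -> repeat the last command
du -hs * | sort -hr -> list files and directories in the current directory sorted by size, largest first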
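
To try the background-job commands without opening a GUI program, here is a minimal sketch that uses sleep as a stand-in for a long-running program such as evince. The variable names are made up for illustration. fg needs an interactive terminal, so the script uses wait instead, and kill -0 is used only to check that the PID still exists.

```shell
#!/usr/bin/env bash
# sleep stands in for a long-running program (assumption for this sketch).
sleep 30 &
bg_pid=$!          # $! holds the PID of the most recent background command

# kill -0 sends no signal; it only reports whether the PID exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    state="running"
else
    state="gone"
fi
echo "background job $bg_pid is $state"

# Stop the job and wait for it, so that no process is left behind.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null
echo "job finished with exit status $?"
```

In an interactive terminal the same idea would be: start the program with &, keep working, and type fg to bring it back.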

PCManFM supports tab completion in the path.

Session 2 - bash basics

Pattern matching (wildcards)

* - a string of zero or more characters -> ls /dev/tty*
? - exactly one character -> ls /dev/tty?
[ ] - exactly one of the listed characters (or of a range) -> ls /dev/tty[2-4]
{} - each of the listed strings in turn (brace expansion) -> ls /dev/tty{zd,zc}
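
A small sketch to try these patterns safely, assuming a scratch directory with made-up file names instead of the real /dev/tty* devices. It also shows one difference that is easy to miss: the wildcards only match existing files, while brace expansion in bash generates every listed string whether the file exists or not.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up file names (not the real /dev/tty* devices).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch tty tty1 tty2 tty3 tty10

star=$(echo tty*)            # zero or more characters after "tty"
question=$(echo tty?)        # exactly one character after "tty"
bracket=$(echo tty[2-3])     # exactly one character from the range 2-3
brace=$(echo tty{1,9})       # bash brace expansion: tty9 is printed although no such file exists

echo "tty*      -> $star"
echo "tty?      -> $question"
echo "tty[2-3]  -> $bracket"
echo "tty{1,9}  -> $brace"

# Return to the previous directory and clean up.
cd - > /dev/null || exit 1
rm -r "$demo_dir"
```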

Misc

find /home/user -size +5M -name "*.pdf" -print0 | xargs -0 du -sh
    find PDF files larger than 5 MiB and display their sizes (-print0 and xargs -0 keep file names with spaces intact)
seq 1 100 - generate the sequence of integers from 1 to 100
grep "2002 06" input.txt - match lines containing "2002 06", i.e. two adjacent space-separated columns in input.txt (here: June 2002)
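
The find filter can be tried on throwaway files first. This is a minimal sketch, assuming a temporary directory and invented file names. One name contains a space on purpose, which is why the output is passed on with -print0 and xargs -0.

```shell
#!/usr/bin/env bash
# Temporary directory with invented test files (assumption for this sketch).
demo_dir=$(mktemp -d)
head -c 6291456 /dev/zero > "$demo_dir/big report.pdf"   # 6 MiB, name contains a space
head -c 1024    /dev/zero > "$demo_dir/small.pdf"        # 1 KiB, too small
head -c 6291456 /dev/zero > "$demo_dir/big data.txt"     # large, but not a PDF

# Only files that are both larger than 5 MiB and named *.pdf are listed.
big_pdfs=$(find "$demo_dir" -type f -size +5M -name "*.pdf")
echo "$big_pdfs"

# NUL-separated names survive spaces on their way to du.
find "$demo_dir" -type f -size +5M -name "*.pdf" -print0 | xargs -0 du -sh

rm -r "$demo_dir"
```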

For-loop

var=$(grep "2007" input.txt | wc -l) - store the output of a command in a variable
for ((var=2005 ; var<=2007 ; var++)); do grep "$var" input.txt | wc -l; done - simple for-loop, prints the number of matching lines per year
for var in $(seq 2005 2007); do grep "$var" input.txt | wc -l; done - the same for-loop using seq
for var in $(seq 2005 2007); do echo "$(grep -c "$var" input.txt) $var"; done - the same for-loop, also printing $var next to the count
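
To see the loop produce numbers, here is a minimal sketch on invented sample data. The column layout (date, year, month, value, station) is an assumption, because the real input.txt is not shown in these notes. The year is matched with surrounding spaces so that only the year column counts.

```shell
#!/usr/bin/env bash
# Invented sample data; assumed columns: date, year, month, value, station.
input=$(mktemp)
cat > "$input" <<'EOF'
2005-06-01 2005 06 12.1 s1
2006-06-01 2006 06 13.4 s1
2006-07-01 2006 07 15.0 s2
2007-06-01 2007 06 11.8 s1
2007-07-01 2007 07 14.2 s2
2007-08-01 2007 08 16.9 s2
EOF

summary=""
for year in $(seq 2005 2007); do
    # grep -c counts matching lines, replacing grep | wc -l.
    count=$(grep -c " $year " "$input")
    echo "$year: $count lines"
    summary="${summary}${year}=${count} "
done

rm "$input"
```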

Session 3 - bash basics & AWK

AWK processes a file as a stream, line by line. It is most useful for data reduction. It can also be used for pre-processing (calculations before importing into other programs), which is sometimes more efficient.

awk  '{ print $5 , $2 }' input.txt   # print columns 5 and 2 (space separated)
awk  '{ print $5 "," $2 }' input.txt   # print columns 5 and 2 (comma separated)
awk  '{ print NF }'  input.txt       # print the number of columns (fields) of each line
awk  '{ print NR }' input.txt        # print the running line number; awk 'END { print NR }' gives the total row count

awk  '{ print substr($1,1,4) }' input.txt # string manipulation: first 4 characters of column 1
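
The built-in variables are easier to follow on a tiny file. This is a minimal sketch on invented sample data. The column layout (date, year, month, value, station) is an assumption and not the real input.txt.

```shell
#!/usr/bin/env bash
# Invented sample data; assumed columns: date, year, month, value, station.
input=$(mktemp)
cat > "$input" <<'EOF'
2005-06-01 2005 06 12.1 s1
2006-06-01 2006 06 13.4 s1
2006-07-01 2006 07 15.0 s2
2007-06-01 2007 06 11.8 s1
2007-07-01 2007 07 14.2 s2
2007-08-01 2007 08 16.9 s2
EOF

# NF: number of fields in the current line (here: only the first line).
fields=$(awk 'NR == 1 { print NF }' "$input")
# NR in the END block: number of lines read in total.
rows=$(awk 'END { print NR }' "$input")
# substr: first 4 characters of column 1, i.e. the year part of the date.
first_year=$(awk 'NR == 1 { print substr($1, 1, 4) }' "$input")

echo "columns: $fields, rows: $rows, first year: $first_year"
rm "$input"
```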

awk  -v name=value # pass a value (e.g. a shell variable) into the awk program as the variable name
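
A minimal sketch of -v in use, again on invented sample data with an assumed column layout (date, year, month, value, station). The shell variable threshold and the awk variable min are hypothetical names chosen for this example.

```shell
#!/usr/bin/env bash
# Invented sample data; assumed columns: date, year, month, value, station.
input=$(mktemp)
cat > "$input" <<'EOF'
2005-06-01 2005 06 12.1 s1
2006-06-01 2006 06 13.4 s1
2006-07-01 2006 07 15.0 s2
2007-06-01 2007 06 11.8 s1
2007-07-01 2007 07 14.2 s2
2007-08-01 2007 08 16.9 s2
EOF

threshold=14

# -v copies the shell value into the awk variable min before the program runs.
awk -v min="$threshold" '$4 >= min { print $5, $4 }' "$input"

# Same filter, but only count the matching lines (n + 0 prints 0 if none match).
above=$(awk -v min="$threshold" '$4 >= min { n++ } END { print n + 0 }' "$input")
echo "values >= $threshold: $above"

rm "$input"
```

Passing the value with -v avoids pasting shell variables into the quoted awk program text.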

Associative arrays are a powerful concept.

awk '{ Year[$2]++ } END { for (var in Year) print var, Year[var], "data points" }' input.txt   # count the lines per value of column 2
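
A minimal sketch of the associative-array count on invented sample data (assumed columns: date, year, month, value, station). The order in which for (... in ...) walks the array is not specified in awk, so the output is piped through sort here to get a stable listing.

```shell
#!/usr/bin/env bash
# Invented sample data; assumed columns: date, year, month, value, station.
input=$(mktemp)
cat > "$input" <<'EOF'
2005-06-01 2005 06 12.1 s1
2006-06-01 2006 06 13.4 s1
2006-07-01 2006 07 15.0 s2
2007-06-01 2007 06 11.8 s1
2007-07-01 2007 07 14.2 s2
2007-08-01 2007 08 16.9 s2
EOF

# Year[$2]++ uses the year as array key and increments its counter per line.
result=$(awk '{ Year[$2]++ } END { for (y in Year) print y, Year[y] }' "$input" | sort -n)
echo "$result"

rm "$input"
```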

Further reading on sed to be done (in case of string-only operations).