
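Split a large text file into smaller files with the same number of lines each

split -l 60 bigfile.txt prefix-
  • -l 60: put 60 lines in each output file; the last file gets whatever is left over
  • prefix-: output files are named prefix-aa, prefix-ab, prefix-ac, ...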
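To see what split actually produces, here is a small sketch. It assumes a hypothetical 150-line bigfile.txt generated with seq and works in a throwaway directory from mktemp, so nothing real is touched.

```shell
# Sketch: split a hypothetical 150-line file into 60-line chunks
# inside a throwaway directory, then report each chunk's line count.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 150 > bigfile.txt          # sample input: the numbers 1..150
split -l 60 bigfile.txt prefix-
for f in prefix-*; do
    # $(( )) strips the leading padding some wc implementations add
    lines=$(( $(wc -l < "$f") ))
    printf '%s %s\n' "$f" "$lines"
done
# prints prefix-aa 60, prefix-ab 60, prefix-ac 30 (one per line)
```

Two full 60-line chunks come out, and the remaining 30 lines land in the last file.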

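Loop through the lines of a file

# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and the || test still processes a final line that lacks a newline
while IFS= read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
done < /path/to/file.txt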
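A plain while read line loop silently alters some input. This sketch contrasts it with the more defensive IFS= read -r form on a hypothetical sample file that has leading spaces, backslashes, and no trailing newline; plain_loop and robust_loop are illustrative helper names, not standard commands.

```shell
# Sketch: compare a plain read loop with IFS= read -r on a hypothetical
# sample file (leading spaces, backslashes, no final newline).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '  indented line\npath\\with\\backslashes\nlast line, no newline' > "$sample"

plain_loop() {      # hypothetical helper: the naive form
    while read line; do
        printf '[%s]\n' "$line"
    done < "$1"
}

robust_loop() {     # hypothetical helper: the defensive form
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done < "$1"
}

echo "plain read:";   plain_loop "$sample"
echo "IFS= read -r:"; robust_loop "$sample"
```

The plain loop prints [indented line] and [pathwithbackslashes] and never prints the last line, because read fails at end of file even though it read some text. The IFS= read -r loop prints all three lines unchanged.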

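Use grep to find URLs in an HTML file

grep -Eo "(http|https)://[a-zA-Z0-9./?=_%:-]*" urls.html
  • grep -E: use extended regular expressions (same as the older egrep)
  • grep -o: print only the matching part of each line, one match per line
  • (http|https): either http OR https
  • a-zA-Z0-9: match any lowercase letter, uppercase letter, or digit
  • .: match a literal period (inside [...] it is not a wildcard)
  • /: match slash
  • ?: match a literal question mark
  • =: match equals sign
  • _: match underscore
  • %: match percent
  • :: match colon
  • -: match dash (placed last in [...] so it is not read as a range)
  • *: match zero or more characters from the [...] set; the match stops at the first character not in the set, such as a quote, space, & or #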
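Here is the regex run over a small hypothetical HTML sample, to show where each match starts and stops. The sample markup and the example.com/example.org URLs are made up for illustration.

```shell
# Sketch: run the URL regex over a small hypothetical HTML sample.
html=$(mktemp)
trap 'rm -f "$html"' EXIT
cat > "$html" <<'EOF'
<a href="https://example.com/docs?page=2">Docs</a>
<p>Mirror: http://example.org/files/a_b%20c.txt, updated daily</p>
<a href='https://example.com/search?q=shell&page=3#top'>Search</a>
EOF

# Collect every match, one per line
urls=$(grep -Eo "(http|https)://[a-zA-Z0-9./?=_%:-]*" "$html")
printf '%s\n' "$urls"
```

The third URL comes out as https://example.com/search?q=shell because & is not in the bracket expression. If your URLs use characters such as &, #, ~ or +, they could be added inside the brackets (before the final -).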

Use Awk to print the first line of ps aux output followed by each grepped line

For example, to find all cron processes with ps aux:

ps aux | awk 'NR<2{print $0;next}{print $0 | "grep cron"}' | grep -v "awk"
  • ps aux: BSD-style options, written without a dash. a lists processes of all users, not just the current user; u uses the user-oriented output format (columns such as USER, PID, %CPU, %MEM, VSZ, RSS, STAT, TIME, and COMMAND); x also includes processes that have no controlling terminal. On macOS/BSD ps -aux behaves the same, but on Linux (procps) ps -aux is parsed differently, so prefer ps aux. See man 1 ps.
  • awk 'NR<2{print $0;next}{print $0 | "grep cron"}': NR is the number of the current input record (line). The first rule fires only for line 1: it prints the line unchanged (the column header), and next skips to the following line so the second rule never sees it. The second rule has no pattern, so it runs for every remaining line and writes it to the command "grep cron". awk starts that command once and keeps the pipe open, so grep receives all of those lines and passes through only the ones containing "cron". The result is the header line followed by the matching process lines.
  • grep -v "awk": removes lines containing "awk". The awk process's own command line contains the string cron, so without this step it would appear in the results. Any other line that happens to contain "awk" is dropped as well.
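The same header-plus-matches result can also be had in awk alone, without piping into grep. This is a swapped-in alternative, not the command above: header_grep is a hypothetical helper name, and the sample text stands in for real ps aux output (all values are made up) so the result is predictable.

```shell
# Hypothetical helper: print line 1 (the header) plus every later line
# matching the regex given as $1. Note: -v interprets backslash escapes.
header_grep() {
    awk -v pat="$1" 'NR == 1 || $0 ~ pat'
}

# Fixed sample standing in for real ps aux output (values are made up)
sample='USER PID %CPU COMMAND
root 101 0.0 /usr/sbin/cron -f
alice 202 1.5 vim notes.txt
root 303 0.0 /usr/sbin/anacron'

printf '%s\n' "$sample" | header_grep cron
```

This prints the header, the cron line, and the anacron line (the match is a substring match, just like grep). On a live system something like ps aux | header_grep '[c]ron' should work without a trailing grep -v: the awk process's command line contains the text [c]ron, not cron, so it does not match its own pattern.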