Thursday, March 26, 2015

Bash CHR program

A Bash script that takes an ASCII value (a character, or a decimal, hexadecimal or octal code) and returns the corresponding representation:
Usage: chr.bsh [-D] <-d> <-h|-x> <-b> [-H] <char> [<-a> <code>]
 
       <char>   A valid ascii code value: character, decimal,
                hexadecimal, octal (0o000 or \000) 
OPTIONS:
          -d    Convert character to decimal 
          -a    Convert code (hex, octal) to ascii character 
       -h|-x    Convert character to hexadecimal 
          -b    Convert character to binary 
          -o    Convert character to octal 
          -H    Show help message
          -D    Turn on debug mode
          -?    Show usage
      
      ## Options may be duplicated and compounded in any order       
      $ chr.bsh A -d
        65
      $ chr.bsh A -d -b
        01000001
      $ chr.bsh A -d -b -x
        0x41
      $ chr.bsh A -d -b -x -o
        0o101
      $ chr.bsh A -d -b -x -o -b
        01000001
      $ chr.bsh A -d -b -x -o -b -h
        0x41
      $ chr.bsh A -d -b -x -o -b -h -a
        A
      ## Experimentally - it supports pipes and redirects! 
      $ echo A | chr.bsh -d -b
        01000001
      $ chr.bsh -d -b -x <<< A
        0x41
      ## Multiple characters from a string!  
      $ echo DEF | chr.bsh -d -b
        01000100
        01000101
        01000110
      $ chr.bsh -d -b -o <<< XYZ  | chr.bsh -a
        XYZ
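
For readers curious how such conversions can be done in plain shell, here is a minimal sketch built on printf. It is not the code from chr.bsh; to_codes and to_char are hypothetical helper names, and only single ASCII characters are assumed.

```shell
#!/usr/bin/env bash
# to_codes and to_char are hypothetical helpers, not part of chr.bsh.
to_codes() {
  local dec hex oct bin n i
  # A leading quote makes printf treat the argument as a character
  # and print its numeric code.
  dec=$(printf '%d' "'$1")
  hex=$(printf '0x%x' "$dec")
  oct=$(printf '0o%o' "$dec")
  # Build the 8-bit binary string by repeated division by two.
  bin=""
  n=$dec
  i=0
  while [ "$i" -lt 8 ]; do
    bin="$((n % 2))$bin"
    n=$((n / 2))
    i=$((i + 1))
  done
  printf '%s %s %s %s\n' "$dec" "$hex" "$oct" "$bin"
}

# Reverse direction: decimal code back to a character via an octal escape.
to_char() {
  printf "\\$(printf '%03o' "$1")\n"
}

to_codes A   # 65 0x41 0o101 01000001
to_char 65   # A
```

The output formats mirror the 0x.. and 0o.. styles shown in the examples above.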

https://github.com/AdamDanischewski/chr.bsh

Wednesday, March 25, 2015

Seed /dev/random Instead of Wasting Data on /dev/null

Instead of throwing data that you don't have any other use for into the proverbial bit grinder of /dev/null, why not redirect that data to /dev/random and seed the random data entropy pool?

This excerpt is from the IBM Knowledge Center regarding AIX - but it works similarly on Linux and other Unix variants - if you know better please leave a comment, thanks.

"Data written to either of the random devices will also contribute to the pool of stored random input and can influence the output, thus writing to these devices should be a privileged operation." (IBM Knowledge Center, "urandom and random Devices")

I took a quick backup of my data and then changed all of my scripts in one simple sed:

sed -i 's@/dev/null@/dev/random@g' * 
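
If you want to see what that substitution does before touching real scripts, something like the following sketch can be run in a scratch directory. The directory and demo.sh are hypothetical stand-ins; the -i.bak form keeps a per-file backup next to each edited file.

```shell
#!/usr/bin/env bash
# Scratch directory and demo.sh are hypothetical stand-ins for real scripts.
workdir=$(mktemp -d)
printf '%s\n' 'ls /nonexistent 2> /dev/null' > "$workdir/demo.sh"

# -i.bak edits in place and keeps the original as demo.sh.bak.
sed -i.bak 's@/dev/null@/dev/random@g' "$workdir/demo.sh"

cat "$workdir/demo.sh"       # ls /nonexistent 2> /dev/random
cat "$workdir/demo.sh.bak"   # ls /nonexistent 2> /dev/null
```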

I did this a few months ago, and I still recall how relieved I felt after making that change.

Sunday, March 22, 2015

Recursive Awk - Print size in human readable form

num2h.awk
function human(x) {
  x[1]/=1024; 
  if (x[1]>=1000) { 
      x[2]++; human(x); 
  }
}

 # main   
{  a[1]=$1; a[2]=1; 
   human(a); 
   print a[1],substr("kMGTPEZY",a[2],1) 
}

## alias num2h='awk -f /path/to/num2h.awk <<<'

## >$ num2h 500000
## 488.281 k
## >$ num2h 500000000
## 476.837 M
## >$ num2h 5000000000
## 4.65661 G
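
Since aliases are not expanded inside scripts, the same recursion can also be wrapped in a shell function. This is just a sketch: the function reuses the num2h name from the alias above, feeds its first argument to an inline copy of the awk program, and lists the prefixes in SI order (k, M, G, T, P, E, Z, Y).

```shell
#!/usr/bin/env bash
# num2h as a function: a sketch that inlines the awk program shown above.
num2h() {
  printf '%s\n' "$1" | awk '
    # Divide by 1024 until the value drops below 1000, counting the steps.
    function human(x) {
      x[1] /= 1024
      if (x[1] >= 1000) { x[2]++; human(x) }
    }
    { a[1] = $1; a[2] = 1
      human(a)
      print a[1], substr("kMGTPEZY", a[2], 1) }
  '
}

num2h 500000      # 488.281 k
num2h 500000000   # 476.837 M
```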

Wednesday, March 18, 2015

1980 Porter Stemmer in Awk

################################################################################ 
#  This is the Porter stemming algorithm, coded up in awk by Gregory Grefenstette
#  July 5, 2012
#  It follows the  algorithm presented in
#
#  Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
#  no. 3, pp 130-137,
#
# and more precisely the code for the ANSI C version found at
#
#  http://www.tartarus.org/~martin/PorterStemmer
#
# This encoding of the algorithm can be used free of charge for any purpose
#
# doublec(s) below is TRUE if the last two characters are a double consonant
#
# Temporary Modification 20150318 by AMDanischewski 
#  -- Changed step1ab to also clip -er and -est, this breaks many words 
#  but it fixes more than it breaks, so I put it on there. Prior to this 
#  words like lighter and lightest won't be stemmed.   
#
# Issues: -er, -est are not stemmed properly 
#         words ending in -y are brokenly returned as ending -i 
# 
################################################################################ 

function doublec(s)
{ if(substr(s,length(s),1) == substr(s,length(s)-1,1) && (substr(s,length(s),1)  !~ /[aeiou]/)) return 1;
                                                         else return 0 }

# cons(str,i) is TRUE for anything other than one of a, e, i, o, u; for "y" it
# checks if there is a preceding vowel, in which case it is considered a consonant

function cons(str,i)
{ if(substr(str,i,1) ~ /[aeiou]/) return 0 ;
  if(i==1) return 1;
  if(substr(str,i,1) == "y") { if(substr(str,i-1,1)  ~ /[aeiou]/) return 1; else return 0 }
  return 1
}

# cvc(str) is TRUE if the last three characters of str are consonant - vowel - consonant
# and also if the second c is not w,x or y. this is used when trying to
#   restore an e at the end of a short word. e.g.
#
#  cav(e), lov(e), hop(e), crim(e), but
#  snow, box, tray.

function cvc(str) {
  if(length(str) <= 2) return 0;
  if( str ~ /[wxy]$/ ) return 0;
  if ( cons(str,length(str)-2) && !cons(str,length(str)-1) &&  cons(str,length(str)) ) return 1;
  return 0
}

# m() measures the number of consonant sequences between k0 and j. if c is
# a consonant sequence and v a vowel sequence, and <..> indicates optional
#
#      <c><v>       gives 0
#      <c>vc<v>     gives 1
#      <c>vcvc<v>   gives 2
#      <c>vcvcvc<v> gives 3
#      ....
# this version stops counting once the value exceeds 2, so it returns at most 3

function m(str) {
  # skip initial consonants
  mreturns=0;
  mindex=1;
  while((mindex <= length(str)) && cons(str,mindex)) mindex++ ;
  while (1)
    { while(1)
        { if (mindex > length(str)) return mreturns;
          if( cons(str,mindex) ) break;
          mindex++
        }
      mindex++;
      mreturns++;
      if(mreturns > 2) return mreturns;
      while(1)
        { if (mindex > length(str)) return mreturns;
          if( ! cons(str,mindex) ) break;
          mindex++
        }
    }
}

# step1ab() gets rid of plurals and -ed or -ing. e.g.
#
#      caresses  ->  caress
#      ponies    ->  poni
#      ties      ->  ti
#      caress    ->  caress
#      cats      ->  cat
#
#      feed      ->  feed
#      agreed    ->  agree
#      disabled  ->  disable
#
#      matting   ->  mat
#      mating    ->  mate
#      meeting   ->  meet
#      milling   ->  mill
#      messing   ->  mess
#
#      meetings  ->  meet
function step1ab(str) {

if(str ~ /sses$/ || str ~ /ies$/ ) str=substr(str,1,length(str)-2) ;
else if(str ~ /ss$/) ;
else if (str ~ /s$/) str=substr(str,1,length(str)-1) ;

if(str ~ /eed$/) { if(m(substr(str,1,length(str)-3))>0 ) str=substr(str,1,length(str)-1) ; }
else {trunc=0;
    if (str ~ /[aeiouy].*ed$/) trunc=2;
      ## Begin temporary modification by AMDanischewski - 20150318  
      ##  Added the following two entries to fix est and er.  
      ##  Without this words like lightest and lighter won't be stemmed. 
      ##  So this needs to be tweaked further =) .. 
    if (str ~ /[aeiouy].*er$/) trunc=2;   
    if (str ~ /[aeiouy].*est$/) trunc=3; 
    if (str ~ /[aeiouy].*ing$/) trunc=3 ;
    if(trunc>0) { str=substr(str,1,length(str)-trunc) ;
    if(str ~ /(at|bl|iz)$/) str=str"e" ;
    else
     if (doublec(str)==1 && (str !~ /[lsz]$/)) {  str=substr(str,1,length(str)-1); }
     else if( m(str)==1 && cvc(str)) str=str"e";
    }}
    return str }

# step1c() turns terminal y to i when there is another vowel in the stem.

function step1c(str) {
if(str ~/[aeiouy].*y$/) str=substr(str,1,length(str)-1)"i" ;
 return str }

# step2() maps double suffices to single ones. so -ization ( = -ize plus
#   -ation) maps to -ize etc. note that the string before the suffix must give
#           m() > 0.

function step2(str) {
  if( str ~ /[aeiouy][^aeiouy].*ational$/ ) str=substr(str,1,length(str)-5)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*tional$/ ) str=substr(str,1,length(str)-2) ;
  else if ( str ~ /[aeiouy][^aeiou].*[ae]nci$/ ) str=substr(str,1,length(str)-1)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*izer$/ ) str=substr(str,1,length(str)-1) ;
  else if ( str ~ /[aeiouy][^aeiou].*bli$/ ) str=substr(str,1,length(str)-1)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*alli$/ ) str=substr(str,1,length(str)-2);
  else if ( str ~ /[aeiouy][^aeiou].*entli$/ ) str=substr(str,1,length(str)-2);
  else if ( str ~ /[aeiouy][^aeiou].*eli$/ ) str=substr(str,1,length(str)-2);
  else if ( str ~ /[aeiouy][^aeiou].*ousli$/ ) str=substr(str,1,length(str)-2);
  else if ( str ~ /[aeiouy][^aeiou].*ization$/ ) str=substr(str,1,length(str)-5)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*ation$/ ) str=substr(str,1,length(str)-3)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*ator$/ ) str=substr(str,1,length(str)-2)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*alism$/ ) str=substr(str,1,length(str)-3);
  else if ( str ~ /[aeiouy][^aeiou].*iveness$/ ) str=substr(str,1,length(str)-4) ;
  else if ( str ~ /[aeiouy][^aeiou].*fulness$/ ) str=substr(str,1,length(str)-4) ;
  else if ( str ~ /[aeiouy][^aeiou].*ousness$/ ) str=substr(str,1,length(str)-4) ;
  else if ( str ~ /[aeiouy][^aeiou].*aliti$/ ) str=substr(str,1,length(str)-3) ;
  else if ( str ~ /[aeiouy][^aeiou].*iviti$/ ) str=substr(str,1,length(str)-3)"e" ;
  else if ( str ~ /[aeiouy][^aeiou].*biliti$/ ) str=substr(str,1,length(str)-5)"le" ;
  else if ( str ~ /[aeiouy][^aeiou].*logi$/ ) str=substr(str,1,length(str)-1) ;
 return str }

# step3() deals with -ic-, -full, -ness etc. similar strategy to step2.
function step3(str) {
  if( str ~ /[aeiouy][^aeiouy].*icate$/ ) str=substr(str,1,length(str)-3) ;
  else if ( str ~ /[aeiouy][^aeiou].*ative$/ ) str=substr(str,1,length(str)-5);
  else if ( str ~ /[aeiouy][^aeiou].*alize$/ ) str=substr(str,1,length(str)-3) ;
  else if ( str ~ /[aeiouy][^aeiou].*iciti$/ ) str=substr(str,1,length(str)-3) ;
  else if ( str ~ /[aeiouy][^aeiou].*ical$/ ) str=substr(str,1,length(str)-2) ;
  else if ( str ~ /[aeiouy][^aeiou].*ful$/ ) str=substr(str,1,length(str)-3);
  else if ( str ~ /[aeiouy][^aeiou].*ness$/ ) str=substr(str,1,length(str)-4);
return str }

# step4() takes off -ant, -ence etc., in context <c>vcvc<v>.
function step4(str) {
  if( str ~ /al$/ ) { if ( m(substr(str,1,length(str)-2)) > 1 )  str=substr(str,1,length(str)-2) }
  else if ( str ~ /[ae]nce$/ ) { if ( m(substr(str,1,length(str)-4)) > 1) str=substr(str,1,length(str)-4) }
  else if ( str ~ /(er|ic)$/ ) { if (  m(substr(str,1,length(str)-2)) > 1 ) str=substr(str,1,length(str)-2) }
  else if ( str ~ /[ai]ble$/ ) { if (  m(substr(str,1,length(str)-4)) > 1 ) str=substr(str,1,length(str)-4) }
  else if ( str ~ /ant$/ ) { if (  m(substr(str,1,length(str)-3)) > 1 )  str=substr(str,1,length(str)-3) }
  else if ( str ~ /ement$/) { if(  m(substr(str,1,length(str)-5)) > 1 ) str=substr(str,1,length(str)-5) }
  else if ( str ~ /ment$/) { if (  m(substr(str,1,length(str)-4)) > 1 ) str=substr(str,1,length(str)-4) }
  else if ( str ~ /ent$/) { if (  m(substr(str,1,length(str)-3)) > 1 ) str=substr(str,1,length(str)-3) }
  else if ( str ~ /[st]ion$/) { if (  m(substr(str,1,length(str)-3)) > 1 )  str=substr(str,1,length(str)-3) }
  else if ( str ~ /ou$/) { if (  m(substr(str,1,length(str)-2)) > 1 )  str=substr(str,1,length(str)-2) }
  else if ( str ~ /(ism|ate|iti|ous|ive|ize)$/) { if (  m(substr(str,1,length(str)-3)) > 1 )  str=substr(str,1,length(str)-3)
}
return str}

# step5() removes a final -e if m() > 1, and changes -ll to -l if
#  m() > 1.
function step5(str) {
  if ( str ~ /e$/ && ( m(str)>1 || (m(str)==1  && !cvc(substr(str,1,length(str)-1)))))  str=substr(str,1,length(str)-1) ;
  if( str ~ /ll$/ && m(str)>1 ) str=substr(str,1,length(str)-1) ;
 return str }

function stem(str)
{ str=tolower(str);
  if(length(str)<=2) return str;
  str=step1ab(str);
  str=step1c(str);
  str=step2(str);
  str=step3(str);
  str=step4(str);
  str=step5(str);
return str
}

# main
{  printf("%s",stem($1));
   for(i=2;i<=NF;i++) printf("%s%s",FS,stem($i));
   print ""
}
If you save the script above to a file, you can add an alias like this:
alias stem='awk -f /path/to/stem.awk <<< '

Now you should be able to stem any word like this: 
~$ stem shopping
shop
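
The measure that m() computes is easier to picture with a small shell sketch: map every letter to V or C, squeeze repeated letters, and count the VC pairs. The helper name measure is hypothetical, input is assumed to be lowercase, and "y" is always treated as a consonant here, which is a simplification of cons() in the awk script.

```shell
#!/usr/bin/env bash
# measure is a hypothetical helper; "y" is always a consonant in this sketch.
measure() {
  local pattern count
  # Map vowels to V and everything else to C, then squeeze runs:
  # "trouble" -> CCVVCCV -> CVCV.
  pattern=$(printf '%s\n' "$1" | sed 's/[aeiou]/V/g; s/[^V]/C/g' | tr -s 'VC')
  # Each VC pair is one vowel sequence followed by a consonant sequence.
  count=$(printf '%s\n' "$pattern" | grep -o 'VC' | wc -l)
  echo $((count))
}

measure tree      # 0
measure trouble   # 1
measure private   # 2
```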

Monday, March 16, 2015

Who needs "Toilet", A Simple Bash Colorizer

#!/usr/bin/env bash 

## A.M.Danischewski 2015+(c) Free - for (all (uses and 
## modifications)) - except you must keep this notice intact. 

declare INPUT_TXT=""
declare    ADD_LF="\n" 
declare -i DONE=0
declare -r COLOR_NUMBER="${1:-247}"
declare -r ASCII_FG="\\033[38;05;"
declare -r COLOR_OUT="${ASCII_FG}${COLOR_NUMBER}m"

function show_colors() { 
   ## perhaps will add bg 48 to first loop eventually 
 for fgbg in 38; do for color in {0..255} ; do 
 echo -en "\\033[${fgbg};5;${color}m ${color}\t\\033[0m"; 
 (($((${color}+1))%10==0)) && echo; done; echo; done
} 

if [[ ! $# -eq 1 || ${1} =~ ^-. ]]; then 
  show_colors 
  echo " Usage: ${0##*/} <color fg>" 
  echo "  E.g. echo \"Hello world!\" | figlet | ${0##*/} 54" 
else  
 while IFS= read -r PIPED_INPUT || { DONE=1; ADD_LF=""; }; do 
  PIPED_INPUT=$(sed 's#\\#\\\\#g' <<< "${PIPED_INPUT}")
  INPUT_TXT="${INPUT_TXT}${PIPED_INPUT}${ADD_LF}"
  ((${DONE})) && break; 
 done
 echo -en "${COLOR_OUT}${INPUT_TXT}\\033[00m"
fi 
Thanks to FloZz' tutorial on ascii colors for the show_colors() logic.
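
For a single string, the core of the colorizer boils down to one printf call. A minimal sketch, assuming a 256-color terminal; colorize is a hypothetical helper name and is not part of the script above.

```shell
#!/usr/bin/env bash
# colorize is a hypothetical helper: colorize <0-255> <text>
colorize() {
  # 38;5;N selects foreground color N from the 256-color palette,
  # and the trailing 0m resets the terminal attributes.
  printf '\033[38;5;%dm%s\033[0m\n' "$1" "$2"
}

colorize 54 'Hello world!'
```

Because the text is passed as a %s argument, backslashes in it are printed as-is and need no extra escaping.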

Friday, March 13, 2015

Geany custom command - Parenthesize selected text

Here is a quick little Bash script that accepts piped input from STDIN and parenthesizes it.

I use it for a Geany Custom Command, to set it up from the Geany IDE:
Choose: -> Edit -> Format -> Send Selection To -> Set Custom Commands -> Add

Then put the full path of this script for the Command and provide a Label: Parenthesize.

If it is the first command it will automatically have the keybinding of Control+1, so now
whenever you select any block of text you can simply hit Control+1 and the selected text will be automatically parenthesized for you.

#!/usr/bin/env bash 
declare INPUT_TXT=""
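declare ADD_LF=$'\n' 
declare -i DONE=0

## IFS= and -r keep leading whitespace and backslashes in the selection intact.
while IFS= read -r PIPED_INPUT || { DONE=1; ADD_LF=""; }; do 
 INPUT_TXT="${INPUT_TXT}${PIPED_INPUT}${ADD_LF}"
 ((${DONE})) && break; 
done

## printf prints the text verbatim, without interpreting escape sequences.
printf '(%s)' "${INPUT_TXT}"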
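
A shorter variant of the same idea, offered only as a sketch: read all of STDIN with cat and wrap it with printf. The function name parenthesize is hypothetical. One difference worth knowing is that command substitution drops trailing newlines, so a selection that ends with a newline comes back without it.

```shell
#!/usr/bin/env bash
# parenthesize is a hypothetical function name for this sketch.
parenthesize() {
  # $(cat) reads all of stdin; trailing newlines are stripped.
  printf '(%s)' "$(cat)"
}

printf 'a + b\n' | parenthesize   # (a + b)
```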

Sunday, March 8, 2015

Bash Random ID Generator - Based on mouse input, time, /dev/urandom, dynamic/static text


Description: gen_uniq_id.bsh

This program generates a random md5sum based on the current time to nanoseconds 
and on mouse movement data and random data collected over .25 seconds, or over 
a user-specified floating-point timeframe. 

This program requires (to work to full capacity) by default:
md5sum, timeout, xinput, /dev/urandom

This program is self-modifying and intentionally sensitive to line number changes.

This program generates a random number based on the current time to the
nanosecond (date +%Y%m%d%H%M%S%N), combined with, by default, .25
seconds of mouse data and .25 seconds of random data.

This script currently outputs a random md5sum of a string consisting of
the current time, a block of text, mouse movement data and random data.
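
The general recipe can be sketched in a few lines of shell. This is not the code from gen_uniq_id.bsh: gen_id is a hypothetical helper, it reads a fixed 256 bytes from /dev/urandom instead of collecting for a time window, and it leaves out the xinput mouse data entirely.

```shell
#!/usr/bin/env bash
# gen_id is a hypothetical helper, not the script itself.
# Assumptions: fixed-size urandom read instead of timed collection, no xinput.
gen_id() {
  local text="${1:-some statement text}"
  {
    date +%Y%m%d%H%M%S%N        # current time down to nanoseconds
    printf '%s\n' "$text"       # static or dynamic statement text
    head -c 256 /dev/urandom    # a block of random bytes
  } | md5sum | cut -d' ' -f1    # keep only the 32-character hash
}

gen_id "my seed text"
```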

The default mouse device id currently set is 11, which is also the
default this script "ships" with. This is likely wrong for your
system, so you will need to change it to the appropriate value. You may
set a new default value using the -M option.  

The appropriate mouse device id to use can be determined by
running: xinput --list

The default random device currently set is /dev/urandom, which is also
the default the script "ships" with and usually exists on
most *nix boxes. If you would like to use a random device other than
the current default you may set a new default value using the -R option. 

The default text currently set is:
"I AM A SOVEREIGN EVERLASTING SENTIENT FROM THE NUMBER LINE WITHOUT A CREATOR" 

You can also supply your own text statement dynamically with the
-s option, this allows for further random seeding opportunity.

If you would like to use a different statement text you may set a new default
value using the -S option. 

WARNING: The -M, -R, -S options require that ${0} (now set to ./gen_uniq_id.bsh)
refer to the script you wish to modify, either as a fully qualified path
(/favorite/place/for/${0}) or as a path relative to your current
directory. These options further require that you have write
permission on ${0}.

The -M, -R, -S options should only need to be issued VERY INFREQUENTLY:
once they are run, the values become the new DEFAULT values. If you have
modified this script at all, or you think for any reason the line
numbers have changed, then these options will likely fail.

It is recommended that you make a copy to a sandbox directory and test 
it there before running it on your main copy of this script. The script
does make a backup of the original script but the backup will get
clobbered from repeated use. If these options are too scary, then I
recommend you modify this script and comment out the logic in the case 
statement of the option handler. From there you can modify this script
manually when necessary.
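
To give a feel for why line numbers matter, here is a sketch of a line-number based rewrite with a backup. The script's actual rewrite logic is not shown in this post, so target.sh, MOUSE_ID, the line number 2 and set_default_mouse_id are all hypothetical.

```shell
#!/usr/bin/env bash
# target.sh, MOUSE_ID and line 2 are hypothetical; the real script's
# variable names and line numbers are not shown in the post.
workdir=$(mktemp -d)
printf '%s\n' '#!/usr/bin/env bash' 'declare -i MOUSE_ID=11' 'echo "$MOUSE_ID"' \
  > "$workdir/target.sh"

set_default_mouse_id() {
  # $1 = script path, $2 = line number holding the default, $3 = new id
  # Refuse to edit if the expected line is not where we think it is.
  sed -n "${2}p" "$1" | grep -q '^declare -i MOUSE_ID=' || return 1
  # -i.bak keeps one backup, which is overwritten on every run.
  sed -i.bak "${2}s/.*/declare -i MOUSE_ID=${3}/" "$1"
}

set_default_mouse_id "$workdir/target.sh" 2 10
sed -n '2p' "$workdir/target.sh"   # declare -i MOUSE_ID=10
```

The check before the edit is the point of the sketch: if the file has shifted by even one line, the function returns an error instead of overwriting the wrong line.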

Usage: gen_uniq_id.bsh <-h> <-d> <-m=[0.00..]> <-r=[0.00..]> <-s="YOUR QUOTE">

OPTIONS:
      -h           Show this message
      -d           Turn on debug mode
      -m=0.00..    Mouse data collection time in seconds. 0 turns off
                   mouse data; any other float collects mouse movement
                   data for that long. Default is .25 seconds
      -r=0.00..    Random data collection time in seconds. 0 turns off
                   random data; any other float collects random data
                   for that long. Default is .25 seconds
      -s           String of text to incorporate into the random md5sum
      -M=0-20..    Change default value of mouse device id (that
                   corresponds to xinput --list mouse device id)
      -R="/dev/.." Change default value of the random device to use
      -S="U QUOTE" Change default value of the STATEMENT STRING
     
  E.g.
        ## To generate a random md5sum based on random data collected 
        ## over .5 seconds and 1 second of collected mouse movement data
      $ gen_uniq_id.bsh -r.5 -m1

        ## To generate a random md5sum based on random data collected 
        ## over 0.004353 seconds without any collected mouse movement data
      $ gen_uniq_id.bsh -r".000004353354e+03" -m0

        ## To turn off mouse collection, and random collection and 
        ## seed with the last 5 commands from history w/debug on to see 
      $ gen_uniq_id.bsh -d -m0 -r0 -s"$(history|tail -5)"

        ## To generate a random md5sum without random data collected 
        ## and without any collected mouse movement data and with a 
        ## custom text 
      $ gen_uniq_id.bsh -r0 -m0 -s="For my benefit only, I .."

        ## To debug and see all internal variables 
      $ gen_uniq_id.bsh -d 2>&1 | more

        ## To change the defaults for the mouse dev to device id 10, 
        ## random device to /dev/really_quick_random and 
        ## statement to "A DECLARATION OF SOVEREIGNTY NEED NO WITNESS"
      $ gen_uniq_id.bsh -M=10 -R="/dev/really_quick_random" \
                        -S="A DECLARATION OF SOVEREIGNTY NEED NO WITNESS"
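
The examples above use both the -m1 and the -m=1 spellings. One way to accept both with getopts is to strip an optional leading "=" from the argument. This is only a sketch: parse_times is a hypothetical helper, and the real option handler is not shown in this post.

```shell
#!/usr/bin/env bash
# parse_times is a hypothetical helper; the real option handler is not shown.
parse_times() {
  local opt mouse_secs=0.25 random_secs=0.25 text=""
  OPTIND=1   # reset so the function can be called more than once
  while getopts 'm:r:s:' opt; do
    case "$opt" in
      m) mouse_secs="${OPTARG#=}" ;;    # accept both -m1 and -m=1
      r) random_secs="${OPTARG#=}" ;;
      s) text="${OPTARG#=}" ;;
      *) return 1 ;;
    esac
  done
  printf 'mouse=%s random=%s text=%s\n' "$mouse_secs" "$random_secs" "$text"
}

parse_times -r.5 -m1            # mouse=1 random=.5 text=
parse_times -m=0 -s="my text"   # mouse=0 random=0.25 text=my text
```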