unisteg.py -- Hiding text in text using Unicode

I'm proudly presenting my latest little script: unisteg.py. It is a steganography tool that hides text inside Unicode-encoded text containing many diacritics. It exploits a feature of Unicode that lets a character with a diacritic be written either as a single precomposed code point or in a decomposed form, as a base character followed by combining marks. The two representations are visually indistinguishable, so the choice between them at each such character is where I hide the secret plaintext.
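To make the composed/decomposed trick concrete, here is a minimal Python 3 sketch of the idea. It is not the code from unisteg.py: the function names (hide, reveal), the bit convention (composed = 0, decomposed = 1), and the UTF-8 framing with trailing zero bytes stripped are all assumptions made for illustration.

```python
import unicodedata


def _units(text):
    """Yield (composed, decomposed, is_carrier) for each code point of NFC(text).

    A "carrier" is a code point whose canonical decomposition differs from
    itself, so it can be written in two visually identical ways.
    """
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        yield ch, decomposed, decomposed != ch


def _to_bits(message):
    # Assumption: the plaintext is sent as UTF-8, most significant bit first.
    return [(byte >> shift) & 1
            for byte in message.encode("utf-8")
            for shift in range(7, -1, -1)]


def _from_bits(bits):
    usable = len(bits) - len(bits) % 8  # drop an incomplete trailing byte
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, usable, 8))
    # Unused carriers stay composed (bit 0), which decodes to trailing NULs.
    return data.rstrip(b"\x00").decode("utf-8")


def hide(cover, message):
    """Hypothetical encoder: composed form = 0, decomposed form = 1."""
    bits = _to_bits(message)
    units = list(_units(cover))
    capacity = sum(1 for _, _, carrier in units if carrier)
    if len(bits) > capacity:
        raise ValueError("cover has %d carriers, need %d" % (capacity, len(bits)))
    remaining = iter(bits)
    out = []
    for composed, decomposed, carrier in units:
        # next() is only evaluated for carriers, so bits are not wasted.
        if carrier and next(remaining, 0):
            out.append(decomposed)
        else:
            out.append(composed)
    return "".join(out)


def reveal(stego):
    """Hypothetical decoder: walk the text alongside its NFC form."""
    bits = []
    pos = 0
    for composed, decomposed, carrier in _units(stego):
        if carrier and stego.startswith(decomposed, pos):
            bits.append(1)
            pos += len(decomposed)
        else:
            if carrier:
                bits.append(0)
            pos += 1
    return _from_bits(bits)


cover = "crème brûlée " * 10   # 30 carriers: room for 3 bytes
stego = hide(cover, "hi!")
recovered = reveal(stego)
```

Each carrier holds one bit, so the cover needs at least eight characters with diacritics per byte of plaintext. Note also that any tool that normalizes the text (to NFC or NFD) along the way will silently erase the hidden message.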
Usage: unisteg.py [options]  >

Prints output to stdout by default.

Options:
  -h, --help            show this help message and exit
  -s, --steg            Hide plaintext in covertext to produce cyphertext.
  --url-plain=URL_PLAIN
                        URL to retrieve plaintext from
  --url-cover=URL_COVER
                        URL to retrieve covertext from
  --file-plain=FILE_PLAIN
                        File to retrieve plaintext from
  --file-cover=FILE_COVER
                        File to retrieve covertext from
  -b, --binary          Use if the plaintext is a string of 1s and 0s
  -e ENCODING, --encoding=ENCODING
                        Encoding of the covertext, if not unicode. See Python
                        codecs module for possible values.
  -u, --unsteg          Derive plaintext from cyphertext.
  --url-steg=URL_STEG   URL to retrieve cyphertext from
  --file-steg=FILE_STEG
                        File to retrieve cyphertext from
  -o OUT, --out=OUT     Filename of output

To test:

  $ unisteg.py -s --url-cover "http://www.theholyquran.org/sura_print.php?kid=1&sid=2" -e latin5 -o steg.txt "this is a test"
  $ unisteg.py -u --file-steg steg.txt

This software is distributed under a BSD license with the endorsement restriction clause removed.

Exocortex Paper

I have finished my independent study course titled Exploring the Exocortex. I enjoyed it immensely and learned a lot while doing it, only some of which I was able to condense into the paper below. Some thanks:
  • Dan Grover -- for mentioning MontyLingua to me and speeding up the development process many-fold
  • Hugo Liu -- for MontyLingua
  • Steven Bird, Edward Loper, and Ewan Klein -- for NLTK
  • James Allen -- for providing the impetus to choose VerbNet over FrameNet thus saving me many headaches.
  • Timothy Hickey -- for advising the course and allowing such non-standard research to take place
The paper: "Exploring the Exocortex: An Approach to Optimizing Human Productivity" by Michael Katsevman (PDF)

I will publish the code as soon as I finish cleaning it up and packaging it.