
Mocking Eye

'Tis all in vain?

Moviesneak.py -- Find Contiguous Movies to Sneak Into!



Edit: Fix in the comments has been applied, and the whole app is now on BitBucket, so download it from there: http://bitbucket.org/anateus/moviesneak.py

I'm happy to announce the release of version 0.5 of moviesneak.py, a script for figuring out which movies come right after each other, such that you can go from one to the next on a single ticket.

This version follows an age-old tradition of BASIC programs by prompting you for information step by step: the date, zipcode, movie theater, and the movies you actually want to watch. Then it prints out a list of movie pairs (I doubt you're going to sit through more than two).

Currently a tolerance of 15 minutes is hardcoded, though it is fairly straightforward to modify. What this means is that if one movie starts up to 15 minutes before another ends, the script will still recommend them as a pair, what with previews and whatnot. In the opposite case, 15 minutes is not too suspicious a time to loiter about :>
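To make the tolerance rule concrete, here is a minimal sketch of the pairing check. It is not the code from moviesneak.py: the function name `sneak_pairs` and the shape of each showing (a dict with `title`, `start`, and `end` keys) are assumptions made up for this illustration.

```python
from datetime import datetime, timedelta
from itertools import permutations

# The 15-minute tolerance described above; change it here to taste.
TOLERANCE = timedelta(minutes=15)


def sneak_pairs(showings, tolerance=TOLERANCE):
    """Return (first_title, second_title, gap) for every ordered pair of
    showings where the second starts within `tolerance` of the first ending.

    A negative gap means the second movie starts before the first one ends
    (you miss some previews); a positive gap is time spent loitering.
    """
    pairs = []
    for first, second in permutations(showings, 2):
        if first["title"] == second["title"]:
            continue  # sneaking into the same movie twice is not the point
        gap = second["start"] - first["end"]
        if abs(gap) <= tolerance:
            pairs.append((first["title"], second["title"], gap))
    return pairs
```

Checking every ordered pair is quadratic in the number of showings, which should be fine for a single theater on a single day.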

The code is distributed under a BSD license. It requires BeautifulSoup which I've included in my distribution, so you just need Python!

Happy sneaking! Don't get caught :)

 

Download here:

moviesneak-0.5.zip

My Life Is Chaos Turned Miraculous



In Washington there isn't any plan
With "feeding David" on page sixty-four;
It must be accidental that the milk man
Leaves a bottle at my door.

It must be accidental that the butcher
Has carcasses arriving at his shop
The very place where, when I need some meat,
I accidentally stop.

My life is chaos turned miraculous;
I speak a word and people understand
Although it must be gibberish since words
Are not produced by governmental plan.

Now law and order, on the other hand
The state provides us for the public good;
That's why there's instant justice on demand
And safety in every neighborhood.

David Friedman in The Machinery of Freedom (free full text now available!)

Feynman on Science and Religion


The following quote is from a talk by physicist Richard Feynman called The Relation of Science and Religion, which I consider an apologia of sorts:
    I don't know the answer to this central problem – the problem of maintaining the real value of religion, as a source of strength and of courage to most men, while, at the same time, not requiring an absolute faith in the metaphysical aspects.
I feel that's the core of the conflict. Whatever aspects religion has, they are thoroughly dependent on an unquestioning acceptance of some metaphysical model, whether a minimalist one such as in Deism or one as expansive and involved as in Animism. This is the aspect that is antithetical to science, not just in science's conclusions (a deistic model is hard to falsify, and that is unlikely to happen any time soon) but fundamentally in its approach.

Google Wave and the Communication Facilitated by Operational Transformation


Operational Transformation (OT) is the theoretical secret juice behind Google Wave. It is the technique that allows concurrent editing, resolving conflicts and removing the need for edit locks and the like. At its core Google Wave is an extension of the XMPP protocol to support OT, with the sought-after invites being for Google's official Wave server and its attendant web client. However, multiple clients and servers are not only possible, but desirable and planned. There's already PyGoWave for those interested. On top of this core there are several other things, such as APIs for automated agents and the like.

Some people were immediately struck by Google Wave and a few of the demos. However, most people were a bit confused about what it was and in what ways it is superior to e-mail, instant messaging and wikis. It was touted by its creators as a replacement for all of these, and I'm not convinced that is the case.

The nature of Google Wave encourages the collaborative creation of prose in a dialectic style, once rather popular but, I think, a bit neglected these days. It allows the interactivity and, most importantly, the tight coupling with other participants that media such as IRC and instant messaging allow, but without the strong preference for terse expression. Google Wave, I hope, will lead to a resurrection of dialectical prose.
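For readers who have not met Operational Transformation before, here is a toy sketch of the core idea for plain-text inserts only. It is not Google Wave's actual protocol, which works on richer documents and operations; the `Insert` class, the `site` tie-breaker, and the function names are all assumptions invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text goes
    text: str  # text being inserted
    site: int  # hypothetical client id, used only to break ties


def apply_op(doc, op):
    """Apply an insert to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Rewrite `op` so it can be applied after the concurrent `other`.

    If `other` inserted text at or before `op`'s position, `op` must shift
    right by the length of that text. Equal positions are ordered by site id
    so that every client makes the same decision.
    """
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op
```

Two clients that start from the same text, each apply their own edit first, and then apply the transformed edit of the other end up with the same document, with no lock held at any point.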

Wondering Where I've Been?


I'm currently residing in Mountain View, California. I co-founded a startup called HighlightCam, which has been funded by Y Combinator, a seed-funding company that provides extensive counseling and instruction in addition to money. We launched a couple of weeks ago. Working on a startup is clearly what I was meant to do. Been enjoying it immensely. Will post more side-project type stuff when I have time to work on side-projects :)

MBTAsux - Mining the Zeitgeist


My latest personal project is MBTAsux. For those who don't know, the MBTA is Boston's public transportation authority, running subways, buses, commuter rail, and the like. To say the least, many people are unhappy with the way it is operated. So, as this is a subject near and dear to my heart, I decided to make MBTAsux. What it does: it grabs Twitter messages and posts them in a format that allows easy skimming, in addition to extracting some data from the text. The things I'm interested in:
  • Rudimentary sentiment analysis, i.e. how are people feeling about the MBTA right now?
  • Location tracking. I want to figure out where people complain the most. Control that for the "size" of the stations (Park and South Station would probably win the popular vote here).
Things I've yet to implement that I think are essential:
  • Submission form, and a mobile version of it. You know, for people who don't use twitter.
  • Map. Alas, people are not really mentioning their exact stops when they complain, so only a really small percentage of the incoming tweets would be mappable. This brings me to the next feature:
  • A nano-format for complaining about the MBTA on twitter and other media. Something like: s:kendall someone just played the Marseillaise on the hanging pipes #mbtasux
How I wrote it:
  • Python
  • Google App Engine
Latest version of the code will be released soon under a BSD license.

Enjoy!
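The nano-format proposed in the list above could be parsed along these lines. This is only a sketch, not MBTAsux's code: the function name `parse_complaint`, the returned dict, and the exact rules (a `s:` prefix followed by a station word, plus the `#mbtasux` hashtag) are assumptions read off the single example given.

```python
import re

# "s:" followed by a station word, at the start of the message or after a space.
STATION_TAG = re.compile(r"(?:^|\s)s:(\w+)")
HASHTAG = "#mbtasux"


def parse_complaint(message):
    """Return {'station': ..., 'text': ...} for a message carrying the
    #mbtasux hashtag, or None if the hashtag is missing.

    'station' is None when the message has no s:<station> tag.
    """
    if HASHTAG not in message.lower():
        return None
    match = STATION_TAG.search(message)
    station = match.group(1).lower() if match else None
    # Strip the station tag and the hashtag, then collapse leftover spaces.
    text = STATION_TAG.sub(" ", message)
    text = re.sub(re.escape(HASHTAG), " ", text, flags=re.IGNORECASE)
    return {"station": station, "text": " ".join(text.split())}
```

Messages with the hashtag but no station tag still come back, just with `station` set to `None`, so they can be listed even if they cannot be mapped.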

unisteg.py -- Hiding Text in Text Using Unicode


I'm proudly presenting my latest little script: unisteg.py. This is a steganography tool that can hide text within text that is unicode encoded, and has lots of diacritics. I'm exploiting a feature of unicode that allows characters with diacritics to be written either as a monolithic "composed" character that is a single symbol, or in a "decomposed" form in which the component symbols combine. These two different ways to represent the underlying characters are visually indistinguishable. This is where I'm hiding the secret plaintext.
Usage: unisteg.py [options]  >

Prints output to stdout by default.

Options:
  -h, --help            show this help message and exit
  -s, --steg            Hide plaintext in covertext to produce cyphertext.
  --url-plain=URL_PLAIN
                        URL to retrieve plaintext from
  --url-cover=URL_COVER
                        URL to retrieve covertext from
  --file-plain=FILE_PLAIN
                        File to retrieve plaintext from
  --file-cover=FILE_COVER
                        File to retrieve covertext from
  -b, --binary          Use if the plaintext is a string of 1s and 0s
  -e ENCODING, --encoding=ENCODING
                        Encoding of the covertext, if not unicode. See Python
                        codecs module for possible values.
  -u, --unsteg          Derive plaintext from cyphertext.
  --url-steg=URL_STEG   URL to retrieve cyphertext from
  --file-steg=FILE_STEG
                        File to retrieve cyphertext from
  -o OUT, --out=OUT     Filename of output
To test:

$ unisteg.py -s --url-cover "http://www.theholyquran.org/sura_print.php?kid=1&sid=2" -e latin5 -o steg.txt "this is a test"
$ unisteg.py -u --file-steg steg.txt

This software is distributed under a BSD license with the endorsement restriction clause removed.
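The composed/decomposed trick described at the top of this post can be sketched in a few lines of Python 3 using the standard `unicodedata` module. This is not the code of unisteg.py: the function names `hide` and `reveal`, the convention that a decomposed character means 1 and a composed one means 0, and the need to pass the number of bits when decoding are all assumptions made for this illustration. The input is a string of 1s and 0s, as with the `-b` option.

```python
import unicodedata


def _units(text):
    """Split text into units: a base character plus any combining marks."""
    units = []
    for ch in text:
        if units and unicodedata.combining(ch):
            units[-1] += ch
        else:
            units.append(ch)
    return units


def _forms(unit):
    """Return the composed (NFC) and decomposed (NFD) spellings of a unit."""
    return unicodedata.normalize("NFC", unit), unicodedata.normalize("NFD", unit)


def hide(bits, cover):
    """Hide a string of '0'/'1' in cover text: decomposed = 1, composed = 0."""
    queue = list(bits)
    out = []
    for unit in _units(cover):
        composed, decomposed = _forms(unit)
        if composed == decomposed:
            out.append(unit)          # no diacritic here, cannot carry a bit
        elif queue and queue.pop(0) == "1":
            out.append(decomposed)
        else:
            out.append(composed)      # a 0 bit, or padding once bits run out
    if queue:
        raise ValueError("cover text has too few accented characters")
    return "".join(out)


def reveal(stego, n_bits):
    """Read back the first n_bits hidden by hide()."""
    bits = []
    for unit in _units(stego):
        composed, decomposed = _forms(unit)
        if composed != decomposed:
            bits.append("1" if unit == decomposed else "0")
    return "".join(bits[:n_bits])
```

The sketch only handles accented letters built from a base character and combining marks. Scripts that decompose differently (Hangul, for example) would need a more careful way of grouping characters.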

Exocortex Paper


I have finished my independent study course titled Exploring the Exocortex. I enjoyed it immensely and learned a lot while doing it, only some of which I was able to condense into the paper below. Some thanks:
  • Dan Grover -- for mentioning MontyLingua to me and speeding up the development process many-fold
  • Hugo Liu -- for MontyLingua
  • Steven Bird, Edward Loper, and Ewan Klein -- for NLTK
  • James Allen -- for providing the impetus to choose VerbNet over FrameNet thus saving me many headaches.
  • Timothy Hickey -- for advising the course and allowing such non-standard research to take place
The paper: "Exploring the Exocortex: An Approach to Optimizing Human Productivity" by Michael Katsevman (PDF)

I will publish the code as soon as I finish cleaning it up and packaging it.

Processing User Goals and Narratives


In order to model a strategy to reach a goal, we need to parse some user input. A goal is a particular frame with particular arguments. Each step in a strategy is, in fact, also a goal! Some goals are stubs, certainly. This feature means that the system understands the underlying details better and better over time. If "make a salad" is once specified as a step in "make a dinner", and later "make a salad" is narrated, then the next time "make a dinner" is undertaken, the details of salad-making can be taken into account.

So, how does one undertake processing a narrative? Each sentence is examined separately. It is an underlying assumption of the system that sentences will be kept simple. So, the goal is one statement, and each sentence in the narrative is a statement. I am adopting the method described in "A Maximum Entropy Approach to FrameNet Tagging" (2008) by Michael Fleischman and Eduard Hovy. According to that model, the MaxEnt classifier (I'll be using an NLTK implementation) will take these features:
  • Phrase type: PP, NP, etc.
  • Voice
  • Position: position in the sentence
  • Grammatical function: external argument, object argument, etc.
  • Head word: the verb in question
And decide which frame element (Agent, Cause, etc.) each word in the sentence is. In addition to those features, an n-gram model may be applied, wherein each subsequent word processed is supplied with the classification of some of the previous words, since once one word is classified as an Agent, another one is unlikely to also be one.

So, a user tells a simple story, and what do we get? We get a frame tagged with the head word. That is, a Motion frame, for example, would also include the particular verb lemma:
    The boy walked to school
  • Theme: "the boy"
  • Direction: "to"
  • Goal: "school"
  • Head: "WALK"
The space of strategies is basically a graph of frames. As each frame gets defined in terms of possible subsequent frames, a Hidden Markov Model of narratives is generated. Then, a wide variety of techniques is available for leveraging HMMs to get us better strategies!
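As a rough sketch of the data involved, a tagged sentence can be stored as a small record, and the "which frame tends to follow which" statistics can be counted from narratives. The code below is an illustration only: the `Frame` class and `transition_probabilities` function are made-up names, and it estimates a plain first-order Markov chain over observed head words, which is a simplification of the Hidden Markov Model mentioned above (there are no hidden states here).

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str             # frame name, e.g. "Motion"
    head: str             # verb lemma, e.g. "WALK"
    elements: tuple = ()  # (frame element, text) pairs


# "The boy walked to school", tagged as in the example above.
walked = Frame(
    name="Motion",
    head="WALK",
    elements=(("Theme", "the boy"), ("Direction", "to"), ("Goal", "school")),
)


def transition_probabilities(narratives):
    """Estimate P(next head | current head) from narratives, where each
    narrative is a list of Frame objects in the order they were told."""
    counts = defaultdict(Counter)
    for narrative in narratives:
        for current, following in zip(narrative, narrative[1:]):
            counts[current.head][following.head] += 1
    return {
        head: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for head, followers in counts.items()
    }
```

Fed a handful of "make a salad" narratives, the resulting table would say which step most often follows each step, which is the raw material a strategy selector could work from.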

Douglas Engelbart - Augmenting Human Intellect: A Conceptual Framework


In my paper reports I focus on materials that are relevant to my goals, rather than a general and exhaustive overview of what the papers discussed. I will concentrate on presenting the pertinent ideas I have gleaned from these sources. I will include asides by myself---i.e. comments on the material---within blockquotes. As one of my initial papers I chose a very important work by one of the luminaries of human-computer interaction Douglas Engelbart---best known for inventing the computer mouse. Augmenting Human Intellect: A Conceptual Framework is a fairly hefty research report describing an approach to augmenting human intellectual capabilities.
  • Engelbart follows the common model of human cognition as a sensory-mental-motor complex. Inputs are provided by the senses, processed via some mental system, and then various motor functions output the results back into the world.
  • Problems are approached by humans by creating solutions that are broken down into many processes and subprocesses. These process collections are called process hierarchies.
    These are what I have chosen to call strategies, and each (sub)process is essentially equivalent to a frame.
  • Different process capabilities of an individual---i.e. the actions the individual may perform---form that individual's repertoire hierarchy.
  • Goals/problems are named in general terms that stand for general solutions to them, e.g. "memorandum" would represent the sequence of actions involved in writing a memo.
    It seems that the goals, as described by Engelbart, are similar to the concept of prototypes.
  • Engelbart provides a figure illustrating a fun experiment he conducted. In order to figure out how one may augment a human further, one must understand better how we have been augmenting ourselves up to now. So, this experiment has to do with "de-augmenting" an individual. First, the subject wrote "Augmentation is fundamentally a matter of organization" using a typewriter, taking only a few seconds. Then, the subject produced the statement in cursive, doing it much more slowly. Then the experiment of "de-augmenting a human by attaching a brick to a pen" proceeded. With a brick attached to the pen, writing in cursive took markedly longer and the quality of the product dropped as well.
    Although the nature of the product itself had not changed much, the efficiency as well as the convenience of the activity was greatly reduced, first by eliminating augmenting tools, and then by actively reducing the capability of the remaining tools. This shows that the statement to be written, "Augmentation is fundamentally a matter of organization", is truly a key point. The organization of the writing procedure into typing improves overall productivity greatly.
  • Augmenting capabilities does not hinge on a particular mental theory, since it is only the selection and efficiency of capabilities that is affected. The exact nature and process of the capabilities is of secondary importance.
  • Then, Engelbart refers to Vannevar Bush's seminal 1945 article in the Atlantic Monthly, "As We May Think". He quotes extensively from it, describing Bush's Memex system (a major inspiration for the World Wide Web). He goes on to note that the Memex has but an added benefit of speed and convenience over a traditional filing system.
    That is, no new capabilities were truly added. Only that instead of walking through a hall of filing cabinets, recall is fast. Much like a phone call is a mere spatial surrogate of talking in person.
    One of the reasons that Bush's "predictions" (perhaps self-fulfilling since many inventors and developers were inspired by this article) are so apt is that little technological development remains that is not just an externalization of faculties (i.e. capabilities) that were previously performed less efficiently or maybe wholly internally.
Engelbart lays the foundations of my approach to helping humans achieve goals. I want to derive process hierarchies and repertoire hierarchies by annotating strategy narratives using FrameNet, so that the system may select an optimal process hierarchy for each goal (at each point in time, the optimal strategy may most certainly change based on further input).

References:
  • D. C. Engelbart, "Augmenting human intellect: A conceptual framework," Stanford Research Institute, Tech. Rep., October 1962.