MUSIC STATISTICS
A Pascal program, Travesty, can quickly fabricate pseudo-text from any input text (Kenner and O’Rourke, 1984). To clarify what Travesty does, a discussion of music statistics and what they imply follows.
In a piece of classical Western tonal music composed in G major, the most frequently used note will most likely be the pitch “G”. Either “F#” or “D” will rank second. The composer did not make those decisions; the musical grammar did. Musical grammar appears to make many of the composing decisions within the classical Western tonal idiom. Not only do the pitches observe preferred frequencies, they also keep preferred company. A familiar example in G major involves the pitch “F#”: the pitch that follows “F#” is most frequently “G”.
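The two kinds of statistic described above, how often each pitch occurs and which pitch tends to follow it, can be tallied in a few lines. The following is a minimal sketch in Python; the melody is a hypothetical toy example invented for illustration, not an excerpt from any real score, and the variable names are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy melody in G major (invented for illustration only).
melody = ["G", "A", "B", "G", "D", "E", "F#", "G", "B", "A", "F#", "G", "D", "G"]

# How often each pitch occurs.
pitch_counts = Counter(melody)

# For each pitch, how often each other pitch comes immediately after it.
followers = defaultdict(Counter)
for current, nxt in zip(melody, melody[1:]):
    followers[current][nxt] += 1

most_common_pitch = pitch_counts.most_common(1)[0][0]
after_f_sharp = followers["F#"].most_common(1)[0][0]

print("most frequent pitch:", most_common_pitch)
print("most frequent pitch after F#:", after_f_sharp)
```

In this toy melody both answers are “G”, but only because the melody was written that way; a real corpus would be needed to support the claim for actual tonal music.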
If probability compels the choice of the note after a single pitch, what follows an interval is even more restricted. The probability is high that the pitch following “EF#” will be “G”. Pairs of notes like “EF#” have frequencies, like pairs of letters (e.g., “TH”). The most common English pair of letters is “HE”; one can find it three times in the sentence you are reading now and nine times in the paragraph. In addition, the probabilities that govern the next character grow more rigorous as one moves up from single-letter to three-letter patterns. The question therefore arises: has the author or composer any choice at all?
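One rough way to see the choices narrowing is to count, for each context of one, two, or three characters, how many different characters ever follow it. The sketch below does this in Python; the function names and the sample sentence are assumptions made for illustration, and a sample this short can only hint at the trend.

```python
from collections import defaultdict

def follower_sets(text, order):
    """Map each `order`-character context to the set of characters that follow it."""
    table = defaultdict(set)
    for i in range(len(text) - order):
        table[text[i:i + order]].add(text[i + order])
    return table

def mean_choices(text, order):
    """Average number of distinct followers per context of the given length."""
    table = follower_sets(text, order)
    return sum(len(s) for s in table.values()) / len(table)

# Hypothetical sample text, invented for illustration.
sample = "the theme of the other theory is that the author has rather little choice"

for order in (1, 2, 3):
    print(order, round(mean_choices(sample, order), 2))
```

The same functions work on a sequence of pitches if each pitch is first encoded as a single character.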
Significant statistics derive from the personal idiosyncrasies of authors (e.g., James or Joyce) or composers (e.g., Beethoven or Bach). For example, each of these composers has his own way of using three-note, four-note, and five-note patterns, a habit that lies somehow beneath the level of consciousness.
This line of reasoning brings us to the unexpected claim that essentially random nonsense can preserve many personal characteristics of a literary or musical source. Travesty, a program suitable for small systems, scans a sample text and generates, from the sample’s n-gram statistics, a “nonsense” imitation through which the original text, and even its authorship, is recognizable. The connection of the output to the source can be stated exactly: for an order-n scan, every n-character sequence in the output occurs somewhere in the input, and at about the same frequency.
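The scan-and-generate idea can be sketched briefly. What follows is a minimal Python illustration of the property just stated, not a port of the published Pascal program: the names build_table and travesty, the seed parameter, and the choice to stop when a context has no recorded follower are all assumptions of this sketch.

```python
import random
from collections import defaultdict

def build_table(source, order):
    """Map each (order - 1)-character context to the characters that follow it.

    Followers are kept with repetition, so a uniform pick from the list
    approximates the frequencies found in the source.
    """
    context_len = order - 1
    table = defaultdict(list)
    for i in range(len(source) - context_len):
        table[source[i:i + context_len]].append(source[i + context_len])
    return table

def travesty(source, order, length, seed=None):
    """Generate up to `length` characters of order-`order` pseudo-text."""
    if order < 2 or len(source) < order:
        raise ValueError("order must be at least 2 and no longer than the source")
    rng = random.Random(seed)
    context_len = order - 1
    table = build_table(source, order)
    output = source[:context_len]          # start with the opening of the source
    while len(output) < length:
        choices = table.get(output[-context_len:])
        if not choices:
            # Assumption of this sketch: stop if the context has no follower,
            # which can happen only at the very end of the source.
            break
        output += rng.choice(choices)
    return output

# Hypothetical sample text, invented for illustration.
source = "the theme of the other theory is that the author has rather little choice at all "
print(travesty(source, order=4, length=120, seed=1))
```

Because each new character is drawn from those that actually follow the current context in the source, every window of `order` characters in the output also appears in the source. Applied to music, the same routine could run over a string in which each character stands for one pitch.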
Reference
Kenner, Hugh and Joseph O’Rourke.
“A Travesty Generator for Micros,”
BYTE, 11 (1984): 449-469.