Squawk Boxes and NLP

Posted on Wednesday 31 August 2005

As I was writing A peek at Natural Language Processing I excluded the “spoken language” component of NLP to simplify certain arguments surrounding this sub-field in AI. A couple of hours later I remembered a discussion I had with a friend many years ago. He had been looking at new services via the web that gave you a Squawk Box (not CNBC’s Squawk Box, but the name invokes the same instantaneous information flow). A Squawk Box is a super high-tech cutting edge device that delivers (get this) live audio from the trading floor. Okay, all kidding aside, a Squawk Box is basically a connection to someone calling out the bids and the asks from a pit or trading floor - it is that simple. Even real-time quotes are not as up-to-date (by seconds, not much more) as the information flowing from a Squawk Box, as the quotes need to be entered into a system before they are distributed.

Listen to this clip from the Crude Oil pits. Better yet, check out this clip from the S&P pits (darn it! The clip is now missing - I will try and find another one…) when the Fed announces a rate cut. Listen to the whole clip (or imagine you are listening to it…) - it is a bit boring at the beginning, but about half way through it gets very exciting. Are you listening to what the announcer is saying? Or to how he is saying it?

From an NLP or speech recognition perspective it would be an interesting exercise to attempt to capture the quotes as they flow from the Squawk Box. There are all kinds of challenges to overcome, especially the noisy environment that the speech would need to be extracted from. The end result would be a stream of quotes that should effectively mirror your real-time quotes (from the pits - obviously action in the electronic markets would not be reflected). That is a lot of additional effort for quotes that are (perhaps) a second or two faster than real-time quotes.

Real-time quotes are not the main value of the Squawk Box to a trader who is not on the floor. Listen to the emotion in the announcer’s voice - especially after the Fed rate cut had been announced. Better yet, listen to the background noise of the traders in the pit. Back to the discussion I had with my friend. Take one of these web based Squawk Box feeds and use it to get a handle on near-term direction of the market. So, we are again dismissing the speech recognition field (and in this case dismissing NLP also - it had just been the key to reminding me of this conversation) and going for something else: pulling information from unstructured data. In this case we don’t have the luxury of text or even speech - I am thinking of extracting voice inflection, speed of speech, stutters, volume, intensity - you name it. The question is how to extract the emotions of either the announcer or of the entire pit.
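To make "volume" and "intensity" a little more concrete, here is a minimal sketch (my own illustration, not something from the original conversation) of two very basic per-frame measurements on an audio signal: RMS energy as a stand-in for loudness, and zero-crossing rate as a crude stand-in for how high-pitched or noisy the sound is. The function names, the frame size and the synthetic test tones are all assumptions made for illustration; a real Squawk Box feed would need decoding, noise handling and far richer features.

```python
import math

def frame_features(samples, frame_size=400):
    """Return a list of (rms, zero_crossing_rate), one pair per full frame.

    samples: mono audio as floats in [-1.0, 1.0] (hypothetical input format).
    """
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # RMS energy: a rough proxy for volume / intensity.
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Fraction of adjacent sample pairs where the sign flips.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((rms, crossings / (frame_size - 1)))
    return features

def synthetic_tone(freq_hz, amplitude, n_samples, sample_rate=8000):
    """Generate a sine wave as stand-in audio (assumption: no real feed here)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]

# A quiet, low tone versus a loud, higher tone.
calm = synthetic_tone(200, 0.2, 800)
excited = synthetic_tone(600, 0.8, 800)

calm_rms, calm_zcr = frame_features(calm)[0]
loud_rms, loud_zcr = frame_features(excited)[0]
print(calm_rms < loud_rms, calm_zcr < loud_zcr)
```

Tracking how numbers like these change from one frame to the next is one possible starting point for the "emotion of the pit" idea; whether they carry any predictive value is exactly the open question.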

How useful would this information be? Would it be predictive or is it just descriptive (describing the events taking place)? Not sure, but this concept is something that has been in the back of my mind for a while. I didn’t do any research for this article, so there may already be some investigation into this, so I will take another look at it once I am done going through NLP.

FYI - here are some services that provide Squawk Box functionality over the web:

www.realtimefutures.com
www.los.net
www.xsquawk.com - no longer around…

Administrator @ 12:11 am
Filed under: Artificial Intelligence and Psychology/Sentiment and Trading
A peek at Natural Language Processing

Posted on Tuesday 30 August 2005

NLP - NLU - Computational Linguistics - Text Mining - Automated Understanding - Information Extraction - This field has many names.

First of all - we are not talking about Neuro-Linguistic Programming! Sorry…

The World Wide Web really made obvious how difficult it is to understand unstructured data. Information (let us not even think of knowledge at the moment) is very difficult to pull out of unstructured textual data. Before the web, pretty much all data that people were working with on computers was in databases and generally the data was structured. Obviously, there were pockets of textual data in electronic format out there, but realistically most textual data was on paper or in WordPerfect (remember WordPerfect or even WordStar?) documents residing on unconnected PCs. So, a good deal of AI research really concentrated on STRUCTURED data - and who can blame them?

Then came the web.

More unstructured textual data than we could have ever imagined all accessible via the web. Now capabilities that would allow computers to extract information from this vast source of unstructured data became really important. Okay, so what is Natural Language Processing (NLP)? I like this definition from http://nlp.shef.ac.uk/: NLP “…is the use of computers to process written and spoken language for some practical, useful, purpose: to translate languages, to get information from the web on text data banks so as to answer questions, to carry on conversations with machines…”

Notice the inclusion of “spoken language” as well as written language in this definition of NLP - that is a whole other discussion. Let’s just assume that we are talking about unstructured text that has been captured electronically. Just capturing human speech and converting it into text (let alone automatically understanding it) is a field by itself, so just to simplify, we make the above assumption.

What types of research or applications are going on in NLP?

Textual Search - Gasp! This isn’t NLP, is it? Well, yes and no. In the spectrum of understanding unstructured data, simple key word searching is on the low-tech end of the spectrum. Is Google using NLP? I am sure they would tell you that they are and they are at the edge of the Information Extraction field by virtue of sorting search results by PageRank, but again on the low end of the scale. What Google has on their side is a massive number of documents (over 8.1 Billion), indexed to quickly find keywords, ten thousand machines brute forcing this search and an innovative (at least it was back in 1998) way to rank documents based on key word relevance. Not to beat a dead horse, but the whole idea of AI is to do things smarter, not harder - Google is wonderful, but technology from 1998 is not going to be able to carry it too much longer.

Information Extraction - Pulling relevant and specific information from text. For instance, culling your vast store of emails for very specific information. People - Places - Things - Dates - Times - these are the types of things that would be useful for an office worker. A process that automagically extracts this information from text, such as emails or even instant messages. So, here we have a process that is generally immutable - a predefined type of information - a person - a place - an event - is being extracted from text. This differs from keyword searching because it constrains the user - in text search the user can search for any keyword; however, the results may not be useful for what the user was trying to find out. IE makes the question much more clearly defined: Give me all the NAMES of PEOPLE in this document. Give me all the NAMES of LOCATIONS in this web page. Give me all the PRICES of COMPUTERS in this web site. Specific questions, specific answers.
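As a toy illustration of "specific questions, specific answers", here is a minimal sketch that pulls two predefined kinds of information (prices and dates) out of free text with regular expressions. This is only the simplest possible approach and an assumption on my part, not how any particular IE system works; the patterns, the `extract` function and the sample sentence are made up for the example, and real extraction of names of people or places needs much more than pattern matching.

```python
import re

# Dollar amounts such as $1,299.00 or $45 (illustrative pattern only).
PRICE_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

# Dates written like "30 August 2005" (one format among many).
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)

def extract(text):
    """Return every price and date found in the text, in order of appearance."""
    return {
        "prices": PRICE_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

sample = (
    "The laptop was listed at $1,299.00 on 30 August 2005, "
    "down from $1,499.00 on 1 July 2005."
)
result = extract(sample)
print(result["prices"])
print(result["dates"])
```

Notice how the question is fixed in advance: the user cannot ask for anything, but what comes back is exactly the type of thing that was asked for.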

Just another set of tools to add to your toolbox for the analysis of the markets. How do you do it? How do you do it effectively?

Administrator @ 2:50 am
Filed under: Artificial Intelligence
Meta Genetic Programming to help cure a disease

Posted on Monday 29 August 2005

Michael Gospatrick wants help, specifically to further the research area of meta genetic programming. I found his web site - Ideas for Genetic Programming - when I was browsing Technorati’s Artificial Intelligence tagged articles.

I am all for genetic programming, but I am a little concerned that he is addressing his problem (find a cure for a disease) in the wrong order. Like many techniques, genetic programming seems really cool, but breaks down on general problems. For instance, evolutionary based algorithms - such as genetic programming - don’t do so well on problems that have very flat solution spaces. An example would be password discovery when breaking into a machine. There is no feedback on how close you are to the solution each time you enter a password - there is only one solution. If part of finding a cure to a disease is just the pure brute force of finding a combination of genes or proteins - then it is a flat solution space and genetic programming is not going to make any difference.
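The password example can be sketched in a few lines. What follows is not genetic programming; it is a much simpler mutate-and-keep-if-not-worse search, used here only to show the effect of feedback. The target string, the two fitness functions and the step budget are all hypothetical choices for the illustration. With a graded fitness (how many characters are correct) the search climbs steadily; with a flat fitness (right or wrong, nothing in between) the same search just wanders.

```python
import random
import string

TARGET = "opensesa"  # hypothetical secret, for illustration only
ALPHABET = string.ascii_lowercase

def graded_fitness(candidate):
    """Feedback on how close we are: number of matching characters."""
    return sum(1 for c, t in zip(candidate, TARGET) if c == t)

def flat_fitness(candidate):
    """Password-style feedback: all or nothing."""
    return len(TARGET) if candidate == TARGET else 0

def evolve(fitness, max_steps, seed):
    """Mutate one character at a time, keeping mutants that are not worse.

    Returns the number of mutations used, or None if the budget ran out.
    """
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in TARGET]
    best = fitness("".join(current))
    for step in range(max_steps):
        if best == len(TARGET):
            return step
        mutant = list(current)
        mutant[rng.randrange(len(mutant))] = rng.choice(ALPHABET)
        score = fitness("".join(mutant))
        if score >= best:
            current, best = mutant, score
    return max_steps if best == len(TARGET) else None

graded_steps = evolve(graded_fitness, max_steps=20000, seed=1)
flat_steps = evolve(flat_fitness, max_steps=20000, seed=1)
print("graded:", graded_steps, "flat:", flat_steps)
```

The graded search typically needs a few hundred mutations, while the flat one would need something on the order of 26^8 guesses, which is just brute force with extra steps.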

What Michael may want to do is write about the disease and try to explore the problems surrounding finding a cure.

Administrator @ 11:31 pm
Filed under: Machine Learning
A glance at Distributed Computing

Posted on Monday 29 August 2005

There are a lot of difficult problems out in the real world which require massive computational power to solve: problems in genomics, scheduling, drug design, weather and the stock market. IBM refers to the class of tools required to address these complex problems as Deep Computing. Basically, Deep Computing is the intelligent application of computational power to these problems. Distributed computing is a way of gathering the raw processing power for these solutions.

A popular example is SETI@home. SETI is the Search for Extra-Terrestrial Intelligence, which is looking at radio telescope data for signs of intelligent life beyond our planet. Radio telescopes produce a bunch of data and crunching through these signals looking for indicators of ET would take any one computer (even the most powerful supercomputers) hundreds of years to process. So, you break the problem into smaller pieces and distribute these pieces to thousands of computers for them to solve and report back their answers to a central server. Instant raw processing power!
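The break-it-up-and-report-back pattern looks roughly like the sketch below. To keep it runnable on one machine, a local thread pool stands in for the thousands of volunteer computers, and the "analysis" is just finding the largest value in a chunk; both are assumptions for illustration and have nothing to do with how SETI@home actually works internally.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, chunk_size):
    """Break the full data set into independent work units."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def analyze_chunk(chunk):
    """Hypothetical work unit: report the strongest reading in one chunk."""
    return max(chunk)

def run_job(data, chunk_size=4, workers=3):
    """Hand chunks to workers, then combine their answers centrally."""
    chunks = split_into_chunks(data, chunk_size)
    # The thread pool plays the role of the remote machines here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(analyze_chunk, chunks))
    # The "central server" step: merge the partial answers.
    return max(partial_results)

signal = [3, 9, 1, 4, 12, 7, 2, 8, 5, 11, 6, 10]
strongest = run_job(signal)
print(strongest)
```

The important design point is that each chunk can be processed without knowing anything about the others; problems that split this cleanly are the ones that benefit most from being distributed.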

So, how are people using distributed computing for the stock market? Well, at least one public project, MoneyBee, is trying to leverage distributed computing to tackle the stock market. They have a screen saver that you can download and it will use your computer when it is idle to process jobs. These jobs are training neural networks to predict the movements of individual stocks and indexes. Okay, so I am not a big fan of neural networks, but we will have the PRO and CON debate about them in the future. The point here is MoneyBee is gathering large amounts of processing power and focusing it on a problem -> predicting individual stock movements. There are not many other public projects out there, although Intelligent Broker appeared a couple years ago (and is now gone - this link points to an archive in archive.org), but never really seemed to do anything. It seems the concept was to have an open platform for developers to use the processing power of hundreds of machines to create stock market related trading systems that could then be sold to traders interested in the systems. A cool idea, but it seems to be dead now.

So, think about distributed computing when you happen upon a problem that requires a good amount of processing power to solve. There are plenty of distributed computing platforms that allow you to develop applications to take advantage of it. Another great resource that lists other public non-market related projects is the Internet-based Distributed Computing Projects web page.

Administrator @ 8:46 pm
Filed under: Distributed Computing and Parallel Processing
Semiologic - AI blogger

Posted on Monday 29 August 2005

I have been doing a lot of searching recently for information on WordPress and consistently have run into a useful blog called Semiologic written by Denis de Bernardy. The title is actually Semiologic - Denis de Bernardy on Artificial Intelligence, which should be a good clue about what many of the postings are about. There is a strong leaning toward Natural Language Processing - for instance a short blurb about Unsupervised learning of natural languages.

More important than writing about AI is implementing AI. And Denis walks the walk. I have not had time to play with it yet, but Denis implemented a terms extraction plugin for WordPress based on Yahoo! Term extraction. He has plenty of other WordPress AI projects he has been working on. Should be one of the plugins I try this week.

NLP - natural language processing - can really have an impact (has had an impact) on the usefulness of the web. More importantly to our discussion of the markets, NLP can be a major advantage in the analysis of unstructured data - text. Since SO much is written about the markets and since most everything else also can have an effect on the markets - being able to analyze text certainly makes the NLP field very interesting.

Administrator @ 3:30 am
Filed under: Artificial Intelligence and Blogs and Links
The Stock Market and Artificial Intelligence

Posted on Sunday 28 August 2005

Exploring advanced techniques and technologies for the analysis of and trading in the stock market.

Heavy.

Okay, so I am using the term AI, as in Artificial Intelligence, to describe the broad set of tools and techniques that help enhance human intelligence. I want to highlight methods drawn from a wide spectrum of research areas that can help in the analysis of the financial (stock, bonds, options, commodities, etc) markets. We are going to get pretty out there sometimes, so be prepared to stretch your mind.

For a little taste of what is out there, take a look at Webmind AI Engine in this article about a dot com era company that built an AI system for Market Prediction. A Wired article details the demise of Intelligenesis, the company created to develop the Webmind AI Engine.

Alas, Intelligenesis is in bankruptcy and Ben Goertzel has moved on to a new venture, the AGI Research Institute. The AGIRI team is now working on another AI Engine and we will keep an eye on their progress.

Check out the many links. It seems that the Market Predictor system was using many of the techniques in the AI/machine learning tool chest: free text-analysis, neural networks, evolutionary algorithms, distributed processing, and pattern recognition. It is difficult to determine how much success the tools really had in the prediction of market movements, but they claimed that initial tests showed good performance in their “non-text-based analysis.” No insight on the free text-analysis (NLP) performance.

Administrator @ 3:11 am
Filed under: Artificial Intelligence and Stock Market
Creative experimentation in stock market analysis

Posted on Saturday 27 August 2005

I am no expert in creativity, but most people that I speak with about the analysis of the stock market seem to have a very narrow sense of the possibilities. Perhaps it is the tools and software that have a heavy concentration on all the “standard” indicators - from the world of technical analysis - that help to create this analytical blindness. Playing with the multitude of technical indicators that are part of your typical stock market software is a lot of fun - no question! I have spent hours - days - months - years playing with the MetaStocks and the TradeStations of the analytical world, but honestly the best tool I had was Lotus 123. Actually, the very first software I used to play with stock price data was dBase III, but I didn’t get too far with it at the time. Where I really first felt the creative juices flow was when I got Lotus 123 (this was in DOS) working with a bunch of data I downloaded from Dow Jones - downloaded using my sweet 2400 baud modem that would never connect above 1200…

Before I continue to go on and on and on about walking to school uphill and in the snow and without shoes….

The point I want to make is - the first time I played around with stock price data was in a spreadsheet program. I didn’t have CCI, MACD, RSI or any other conventional technical indicator canned and ready for use. I explored and played with the data in the unconstrained (at least to my technical ability at the time - once I learned BASIC better, I grew out of Lotus quickly) environment that a spreadsheet can provide. Not that access to technical tools “constrains” the user, but the ease and ready access to these indicators can slow the creativity process. In fact, I imagine most technical analysis tools these days give the user a great deal of flexibility in designing new indicators of unlimited complexity. How many people take advantage of that capability? I imagine very few people go that route. I should say, take advantage in a creative and innovative way.

So, that being said - I am looking forward to using DeepMarket as a really big whiteboard to brainstorm interesting ideas, creative analysis and innovative tools. To the point of ridiculousness. Some ideas may seem (or will be) silly. That’s okay - this is all about the imaginative and cool new ways to look at the data.

Administrator @ 12:01 am
Filed under: Analysis and Creativity
Prophecy Self-fulfillment

Posted on Friday 26 August 2005

How much of today’s stock market conventional wisdom is actually self-fulfilling prophecy? The study of price action in charts is an activity that permeates all aspects of the markets - stocks - bonds - commodities. Do the “rules” that traders extract from the study of charts evolve from continual positive feedback - prophecy self-fulfillment? Or are chartists discovering natural patterns that exist in prices?

By the way - I am asking a question - not setting you up for an answer.

For instance, a quip like “Fill the Gap” is an expectation that a stock (or anything trading) that gaps up or down will eventually reverse direction and fill the gap. If most traders feel that this market wisdom is valid, then it will most likely affect how they trade a stock that has gapped in one direction or the other. Will this analysis actually affect the market - rather than just be an observation of the market?

For the moment, let’s not worry about quantum theory.

Okay, so the practical question is - can these rules be quantified? Tested? Are they useful? That is what I am interested in - removing the mystery that surrounds conventional wisdom, going beyond normal analysis and creating new analysis techniques. Will they be useful? I will leave that to the reader - I will develop the experiments - publish the results - let you determine the usefulness.
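As a taste of what "quantified" could mean for "Fill the Gap", here is a minimal sketch under some loudly stated assumptions: a gap up is an open above the previous bar's high, a gap down is an open below the previous bar's low, and a gap counts as filled if price trades back to that previous high or low within a fixed number of bars (the gap bar included). Other definitions are just as defensible. The function name, the horizon and the price bars are all made up for illustration, not real market data.

```python
def gap_fill_stats(bars, horizon=5):
    """Count gaps and how many were filled within `horizon` bars.

    bars: list of (open, high, low, close) tuples, oldest first.
    Returns (number_of_gaps, number_filled).
    """
    gaps = filled = 0
    for i in range(1, len(bars)):
        prev_high, prev_low = bars[i - 1][1], bars[i - 1][2]
        open_price = bars[i][0]
        window = bars[i:i + horizon]  # the gap bar plus the bars after it
        if open_price > prev_high:  # gap up
            gaps += 1
            if any(low <= prev_high for _, _, low, _ in window):
                filled += 1
        elif open_price < prev_low:  # gap down
            gaps += 1
            if any(high >= prev_low for _, high, _, _ in window):
                filled += 1
    return gaps, filled

# Made-up bars: one gap up that fills the next day, one gap down that does not.
bars = [
    (10.0, 10.5, 9.8, 10.2),
    (11.0, 11.4, 10.9, 11.2),   # opens above the previous high: gap up
    (11.1, 11.3, 10.4, 10.6),   # trades back down through 10.5: gap filled
    (10.0, 10.2, 9.9, 10.1),    # opens below the previous low: gap down
    (10.1, 10.3, 10.0, 10.2),
    (10.2, 10.3, 10.1, 10.25),
]
print(gap_fill_stats(bars))  # (2, 1)
```

Run over real history, the interesting comparison would be the fill rate against how often price revisits an arbitrary level anyway; without that baseline the number alone says little about whether the rule is wisdom or just folklore.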

Heck, the fun is in the journey - not the destination. If you are more interested in the destination - move on. This is not where you will find the Holy Grail.

Administrator @ 2:52 am
Filed under: Markets and Psychology/Sentiment and Trading
Calculated Bets

Posted on Thursday 25 August 2005

I recently finished an interesting book: Calculated Bets - Computers, Gambling, and Mathematical Modeling to Win by Steven Skiena. It is a look at (does the title give you a clue?) managing money in the face of uncertainty. Gambling and the financial markets have a good deal of similarity when trying to decide what to do with your money, so as an exercise in cross-pollination I picked up this book. Dr. Skiena tells the story (and it is a narrative, rather than a text book) of his interest in Jai-Alai, of a process for automatically betting on Jai-Alai matches, and of how to manage your bets. It is a quick read - check it out if you have a free afternoon.

Administrator @ 3:24 am
Filed under: Books and Trading