James Hollingworth’s Adventures in Code

Muney: Organizing a student's finances


This is the first in a series of articles about my dissertation, a personal financial management application for students. The first task was to organize a user's finances by importing their financial information and then correctly naming & categorizing it.

A financial application is useless without a user's financial information, but manually adding individual transactions quickly becomes monotonous. Most online banks let you download your financial information in a variety of financial document formats, so this was the obvious solution to the problem. One of the most popular formats is Open Financial Exchange (OFX), a widely implemented specification used for communicating financial instructions & data between financial institutions & their customers. Unfortunately there were no open source OFX parsers for C#, so part of my project was to develop one. I have blogged about my development of this component before and, after quite a lot of interest, I will be open sourcing the parser at some point soon.

One problem I found was that, although importing financial information via OFX documents was one hell of a lot quicker than doing it manually, I still had to heavily edit all of the transaction info after the initial import. This was because most companies add lots of extra metadata to the transaction name (e.g. company name, transaction ID, date, etc.), as shown in Figure 1. Personally, I found it quite difficult to understand at a glance what each transaction was actually for. I therefore needed a method of inferring the correct name for a transaction.

1271 12DEC07 0000, CO-OP GROUP 310630, ERLEIGH
Figure 1 Example transaction description

Wesabe’s solution to the problem was to record the changes users make to transactions and then apply those changes to other transactions encountered with the same transaction name. The problem is that, since many transactions contain dynamic metadata (e.g. branch ID, transaction date), transaction names are rarely identical.

To combat this, I had to develop a method of splitting the transaction name up into pertinent substrings and then mapping the substrings to the correct names. To achieve this, a modified version of the Lempel-Ziv-Welch (LZW) lossless compression algorithm was used. The algorithm maintains a dictionary of strings, each with a corresponding code, initialised with every alphanumeric character. It takes the first two letters of the text to compress and checks whether that combination is in the dictionary. If it is, it outputs the corresponding code and then concatenates the next letter to see whether that longer combination has been seen before. If the dictionary does not contain the combination of characters, it adds it to the dictionary and then moves on to the next two characters in the text. The essence of the algorithm is that it identifies common substrings within a string. If the dictionary’s state is persisted between imports, the algorithm can be used to identify the common substrings within transaction names, e.g. it would identify the common substring “CO-OP” within “CO-OP 310630, ERLEIGH”, “12 Dec CO-OP” and “43943 CO-OP, READING”.
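A minimal Python sketch of the dictionary-building step may help here. It follows the textbook LZW parse rather than the modified version used in the project, whose details are not given in this post, so the class name `SubstringDictionary`, the JSON persistence and the `known_substrings` helper are all assumptions made for illustration.

```python
import json
import string


class SubstringDictionary:
    """LZW-style phrase dictionary whose state can be kept between imports."""

    def __init__(self, phrases=None):
        # Seed with every alphanumeric character unless restoring saved state.
        if phrases is not None:
            self.phrases = set(phrases)
        else:
            self.phrases = set(string.ascii_letters + string.digits)

    def feed(self, text):
        """Parse one transaction name; return the phrases learned from it."""
        learned = []
        current = ""
        for ch in text:
            candidate = current + ch
            if candidate in self.phrases:
                current = candidate          # seen before: keep extending
            else:
                self.phrases.add(candidate)  # new combination: remember it
                learned.append(candidate)
                current = ch                 # restart from this character
        return learned

    def known_substrings(self, text, min_len=3):
        """Hypothetical helper: learned phrases of min_len or more found in text."""
        return sorted(p for p in self.phrases if len(p) >= min_len and p in text)

    def dumps(self):
        """Serialise the dictionary so it can be stored between imports."""
        return json.dumps(sorted(self.phrases))

    @classmethod
    def loads(cls, data):
        """Restore a dictionary previously saved with dumps()."""
        return cls(json.loads(data))


dictionary = SubstringDictionary()
for _ in range(4):
    dictionary.feed("CO-OP")
print(dictionary.known_substrings("43943 CO-OP, READING", min_len=5))  # ['CO-OP']
```

Note that a phrase grows by at most one character each time it is seen, so in this sketch a five character substring such as “CO-OP” only enters the dictionary after it has been encountered several times. That is why persisting the state between imports matters.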

Although this solved the problem of identifying the common substrings within a transaction name, it created a new one. Now that the transaction name is composed of more than one substring, it is no longer a simple one-to-many mapping to the correct transaction name but a many-to-many mapping. The problem therefore is: how do you know which name is best? The solution was to use a single layer neural network (Figure 2), with the input layer being the substrings of the transaction name and the output layer being the correct names. The network is trained any time a user corrects a transaction name and is then used to compute a best guess for the correct transaction name.
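To make the idea concrete, here is a small Python sketch of a single layer network that maps substrings to names. The post does not say which training rule was used, so the perceptron-style update, the `learning_rate` parameter and the class name `NameNetwork` are assumptions rather than the project's actual implementation.

```python
from collections import defaultdict


class NameNetwork:
    """Single layer of weights: input substring -> output name -> weight."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate  # assumed value, not from the post
        self.weights = defaultdict(lambda: defaultdict(float))

    def scores(self, substrings):
        """Sum the weights from every active substring to each candidate name."""
        totals = defaultdict(float)
        for s in substrings:
            for name, weight in self.weights.get(s, {}).items():
                totals[name] += weight
        return dict(totals)

    def predict(self, substrings):
        """Return the highest scoring name, or None if nothing is known yet."""
        totals = self.scores(substrings)
        if not totals:
            return None
        return max(totals, key=totals.get)

    def train(self, substrings, correct_name):
        """Called when a user corrects a name (perceptron-style update)."""
        guess = self.predict(substrings)
        for s in substrings:
            self.weights[s][correct_name] += self.learning_rate
            if guess is not None and guess != correct_name:
                # Weaken the links that produced the wrong guess.
                self.weights[s][guess] -= self.learning_rate


network = NameNetwork()
network.train(["CO-OP", "ERLEIGH"], "Co-op")
network.train(["TESCO", "READING"], "Tesco")
print(network.predict(["CO-OP", "ERLEIGH"]))  # Co-op
```

Because every substring votes for every name it has been linked to, a transaction made of several substrings still resolves to a single best guess, which is the many-to-many mapping described above.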

Figure 2 Single layer neural network for transaction name inference

One problem with this solution is that, if a single neural network is used for all users, the results are not personalized and thus not very accurate. However, if each user has their own neural network, the results will be accurate only for transactions similar to those the user has previously trained the network on; for anything else the solution will not produce relevant results. To solve this, the concept of branching was developed. The premise of branching is to have both a global network and a user-specific network. The global network is trained every time any user imports a transaction, while the user-specific one is only trained on the user’s personal data. If the application encounters an input not found in the user’s network, it uses the connections & outputs from the global network instead. This means the user has the best of both worlds: the solution is personalized if they have stated a preference, yet will still produce results otherwise.
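The fallback rule can be sketched in a few lines of Python. The weight layout (a dict of substring to name to weight), the function names and the example weights are all hypothetical; the only part taken from the description above is that an input missing from the user's network is looked up in the global network instead.

```python
def branched_scores(substrings, user_weights, global_weights):
    """Score candidate names, preferring the user's connections per input."""
    totals = {}
    for s in substrings:
        # Use the user's network if it knows this input, else the global one.
        source = user_weights if s in user_weights else global_weights
        for name, weight in source.get(s, {}).items():
            totals[name] = totals.get(name, 0.0) + weight
    return totals


def branched_predict(substrings, user_weights, global_weights):
    """Return the best guess, or None if neither network knows the inputs."""
    totals = branched_scores(substrings, user_weights, global_weights)
    return max(totals, key=totals.get) if totals else None


# Made-up example weights, purely for illustration.
global_weights = {
    "CO-OP": {"Co-operative Food": 1.0},
    "AMAZON": {"Amazon": 1.0},
}
user_weights = {
    "CO-OP": {"Weekly shop": 1.0},  # this user renamed their Co-op transactions
}

print(branched_predict(["CO-OP"], user_weights, global_weights))   # Weekly shop
print(branched_predict(["AMAZON"], user_weights, global_weights))  # Amazon
```

The choice is made per input rather than per transaction, so a name containing one substring the user has corrected and one they have never seen draws on both networks at once.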

Testing found that the solution can accurately (≈ 90%) identify the correct transaction name & category once the user has used the application for a short period of time. Harnessing the collective intelligence of the users also meant that users only encountered a 10% reduction in accuracy for unseen content.

This was only the first (and probably simplest) part of my project. Next I shall discuss managing a student's finances!

Written by jhollingworth

May 26, 2008 at 11:02 pm
