James Hollingworth’s Adventures in Code


Muney: Organizing a student's finances

This is the first in a series of articles about my dissertation, a personal financial management application for students. The first task was to organize a user's finances by importing their financial information and then correctly naming & categorizing it.

A financial application is useless without a user's financial information, but manually adding individual transactions quickly becomes monotonous. Most online banks let you download your financial information in a variety of financial document formats, so this was the obvious solution to the problem. One of the most popular formats is Open Financial Exchange (OFX), a widely implemented specification for communicating financial instructions & data between financial institutions & their customers. Unfortunately there were no open source OFX parsers for C#, so part of my project was to develop one. I have blogged about the development of this component before and, after quite a lot of interest, I will be open sourcing the parser at some point soon.
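For readers who have not seen the format: OFX 1.x files are SGML-like, with aggregates such as STMTTRN closed explicitly and leaf elements such as NAME usually left unclosed. The sketch below is not the parser described in this post. It is a minimal illustration that pulls the leaf fields out of each STMTTRN block with a regular expression. The class name OfxSketch, the method ReadTransactions and the sample fragment (including the amount and FITID) are hypothetical and hand-written, not real bank data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not the dissertation's parser.
public static class OfxSketch
{
    // Hand-written sample fragment; the amount and FITID are invented.
    public const string Sample = @"<STMTTRN>
<TRNTYPE>DEBIT
<DTPOSTED>20071212
<TRNAMT>-12.71
<FITID>0001
<NAME>1271 12DEC07 0000, CO-OP GROUP 310630, ERLEIGH
</STMTTRN>";

    // Returns one tag -> value map per STMTTRN aggregate found in the text.
    public static List<Dictionary<string, string>> ReadTransactions(string ofx)
    {
        var transactions = new List<Dictionary<string, string>>();
        foreach (Match block in Regex.Matches(
            ofx, @"<STMTTRN>(.*?)</STMTTRN>", RegexOptions.Singleline))
        {
            var fields = new Dictionary<string, string>();
            // A leaf element is an opening tag followed by text up to the
            // next tag or line break; closing tags are optional in OFX 1.x.
            foreach (Match leaf in Regex.Matches(
                block.Groups[1].Value, @"<(\w+)>([^<\r\n]+)"))
            {
                fields[leaf.Groups[1].Value] = leaf.Groups[2].Value.Trim();
            }
            transactions.Add(fields);
        }
        return transactions;
    }
}
```

Calling OfxSketch.ReadTransactions(OfxSketch.Sample) gives a single map whose NAME entry holds the raw transaction description. A real parser also has to deal with headers, character encodings, account aggregates and malformed files, none of which is attempted here.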

One problem I found was that, although importing financial information via OFX documents was one hell of a lot quicker than doing it manually, I still had to heavily edit all of the transaction info after the initial import. This was because most companies add lots of extra metadata to the transaction name (e.g. company name, transaction ID, date, etc.), as shown in Figure 1. Personally, I found it quite difficult to understand at a glance what each transaction was actually for. I therefore needed a method of inferring the correct name for a transaction.

1271 12DEC07 0000, CO-OP GROUP 310630, ERLEIGH
Figure 1 Example transaction description

Wesabe’s solution to the problem was to record the changes users make to transactions and then apply those changes to other transactions encountered with the same transaction name. The problem is that, since many transactions contain dynamic metadata (e.g. branch ID, transaction date), transaction names are rarely alike.

To combat this, I had to develop a method of splitting the transaction name up into pertinent sub-strings and then mapping those sub-strings to the correct names. To achieve this, a modified version of the Lempel-Ziv-Welch (LZW) lossless compression algorithm was used. The algorithm maintains a dictionary of strings, initialised with every alphanumeric character, each with a corresponding code. It takes the first two letters of the text to compress and checks whether that combination is in the dictionary. If it is, the algorithm outputs the corresponding code and then concatenates the next letter to see whether the longer combination has been seen before. If the dictionary does not contain the combination of characters, the algorithm adds it to the dictionary and then moves on to the next two characters in the text. The essence of the algorithm is that it identifies common sub-strings within a string. If the dictionary’s state is persisted between imports, the algorithm can be used to identify the common sub-strings within transaction names, e.g. it would identify the common sub-string “CO-OP” within “CO-OP 310630, ERLEIGH”, “12 Dec CO-OP” and “43943 CO-OP, READING”.
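As a rough illustration of the idea, here is a minimal sketch of textbook LZW dictionary building with the dictionary kept alive between calls. It is not the modified version used in the project, whose details are not given here. The class name PhraseDictionary and its members are hypothetical, and single characters are added on demand rather than pre-loaded, which is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: an LZW-style phrase dictionary whose state
// survives across calls, standing in for state persisted between imports.
public class PhraseDictionary
{
    private readonly Dictionary<string, int> codes = new Dictionary<string, int>();

    public bool Contains(string phrase)
    {
        return codes.ContainsKey(phrase);
    }

    private void Add(string phrase)
    {
        if (!codes.ContainsKey(phrase))
        {
            codes[phrase] = codes.Count; // next free code
        }
    }

    // Runs one LZW pass over the text. Returns the phrases emitted (the
    // longest already-known matches, in order) and learns one new, longer
    // phrase each time a match can no longer be extended.
    public List<string> Process(string text)
    {
        var emitted = new List<string>();
        string current = "";
        foreach (char c in text)
        {
            Add(c.ToString()); // assumption: single characters are always known
            string candidate = current + c;
            if (codes.ContainsKey(candidate))
            {
                current = candidate; // keep extending the match
            }
            else
            {
                emitted.Add(current);    // longest phrase seen before
                Add(candidate);          // remember the extended phrase
                current = c.ToString();  // restart from this character
            }
        }
        if (current.Length > 0)
        {
            emitted.Add(current);
        }
        return emitted;
    }
}
```

In this sketch a phrase grows by one character per encounter, so recurring text is learned gradually: feeding the bare string "CO-OP" to the same PhraseDictionary five times returns it as a single phrase on the fifth pass. With real transaction names the surrounding metadata changes where matches start and stop, so how quickly a merchant name emerges depends on the data.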

Although this solved the problem of identifying the common sub-strings within a transaction name, it created a new problem. Now that the transaction name is composed of more than one sub-string, it’s not a simple one-to-many mapping to the correct transaction name but rather a many-to-many mapping. The problem therefore is, how do you know which name is best? The solution was to use a single layer neural network (Figure 2), with the input layer being the sub-strings of the transaction name and the output layer being the correct names. The network is trained any time a user corrects a transaction name and is then used to compute a best guess for the correct transaction name.

Figure 2 Single layer neural network for transaction name inference
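To make the shape of such a network concrete, here is a minimal sketch in which every sub-string is an input node, every corrected name is an output node, and each connection carries a weight. The training rule shown is a plain multiclass perceptron update, which is an assumption on my part; the update rule actually used is not described here. The class name NameNetwork and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single layer network: weights[input][output].
public class NameNetwork
{
    private readonly Dictionary<string, Dictionary<string, double>> weights =
        new Dictionary<string, Dictionary<string, double>>();
    private const double LearningRate = 1.0; // assumed value

    // Sums the weights from the active inputs to each output and returns
    // the highest scoring name, or null if nothing scores above zero.
    public string Predict(IEnumerable<string> inputs)
    {
        var scores = new Dictionary<string, double>();
        foreach (string input in inputs)
        {
            Dictionary<string, double> row;
            if (!weights.TryGetValue(input, out row)) continue;
            foreach (KeyValuePair<string, double> link in row)
            {
                double sum;
                scores.TryGetValue(link.Key, out sum);
                scores[link.Key] = sum + link.Value;
            }
        }
        string best = null;
        double bestScore = 0.0;
        foreach (KeyValuePair<string, double> score in scores)
        {
            if (score.Value > bestScore)
            {
                best = score.Key;
                bestScore = score.Value;
            }
        }
        return best;
    }

    // Called when a user corrects a name. Perceptron-style update: if the
    // current guess is wrong, strengthen links to the correct name and
    // weaken links to the wrongly predicted one.
    public void Train(IEnumerable<string> inputs, string correctName)
    {
        var active = new List<string>(inputs);
        string predicted = Predict(active);
        if (predicted == correctName) return;
        foreach (string input in active)
        {
            Adjust(input, correctName, LearningRate);
            if (predicted != null) Adjust(input, predicted, -LearningRate);
        }
    }

    private void Adjust(string input, string output, double delta)
    {
        Dictionary<string, double> row;
        if (!weights.TryGetValue(input, out row))
        {
            row = new Dictionary<string, double>();
            weights[input] = row;
        }
        double weight;
        row.TryGetValue(output, out weight);
        row[output] = weight + delta;
    }
}
```

For example, after Train(new[] { "CO-OP", "ERLEIGH" }, "Co-op"), a later Predict(new[] { "CO-OP", "READING" }) returns "Co-op", because the shared "CO-OP" input is the only one with a learned connection.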

One problem with this solution is that, if a single neural network is used for all users, the results will not be personalized and thus not very accurate. However, if each user has their own neural network, the results will be accurate but the solution will not produce relevant results unless the user has previously trained the network on similar transaction names. To solve this, the concept of branching was developed. The premise of branching is to have both a global network and a user-specific network. The global network is trained every time any user imports a transaction, while the user-specific one is only trained on the user’s personal data. If the application encounters an input not found in the user’s network, it uses the connections & outputs from the global network instead. This gives the user the best of both worlds: the solution is personalized if they have stated a preference, yet it will still produce results otherwise.

Testing found that the solution can accurately (≈ 90%) identify the correct transaction name & category once the user has used the application for a short period of time. Harnessing the collective intelligence of the users, however, meant that users only encountered a 10% reduction in accuracy for unseen content.

This was only the first (and probably simplest) part of my project. Next I shall discuss managing a student's finances!
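The fallback at the heart of branching can be sketched as follows. For each input sub-string, the predictor uses the user's own connections if the user's network has seen that input, and otherwise borrows the connections from the global network. The class names WeightTable and BranchingPredictor are hypothetical, and training is reduced to a simple weight increment, which is an assumption made only to keep the example small.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weight table: input sub-string -> (candidate name -> weight).
public class WeightTable
{
    private readonly Dictionary<string, Dictionary<string, double>> rows =
        new Dictionary<string, Dictionary<string, double>>();

    public bool TryGetRow(string input, out Dictionary<string, double> row)
    {
        return rows.TryGetValue(input, out row);
    }

    // Simplified training (assumption): strengthen one input -> name link.
    public void Reinforce(string input, string name)
    {
        Dictionary<string, double> row;
        if (!rows.TryGetValue(input, out row))
        {
            row = new Dictionary<string, double>();
            rows[input] = row;
        }
        double weight;
        row.TryGetValue(name, out weight);
        row[name] = weight + 1.0;
    }
}

public class BranchingPredictor
{
    private readonly WeightTable globalTable;
    private readonly WeightTable userTable;

    public BranchingPredictor(WeightTable globalTable, WeightTable userTable)
    {
        this.globalTable = globalTable;
        this.userTable = userTable;
    }

    // Returns the best scoring name, or null if no input is known anywhere.
    public string Predict(IEnumerable<string> inputs)
    {
        var scores = new Dictionary<string, double>();
        foreach (string input in inputs)
        {
            Dictionary<string, double> row;
            // Prefer the user's connections; branch to the global ones
            // only when the user's network has never seen this input.
            if (!userTable.TryGetRow(input, out row) &&
                !globalTable.TryGetRow(input, out row))
            {
                continue;
            }
            foreach (KeyValuePair<string, double> link in row)
            {
                double sum;
                scores.TryGetValue(link.Key, out sum);
                scores[link.Key] = sum + link.Value;
            }
        }
        string best = null;
        double bestScore = 0.0;
        foreach (KeyValuePair<string, double> score in scores)
        {
            if (score.Value > bestScore)
            {
                best = score.Key;
                bestScore = score.Value;
            }
        }
        return best;
    }
}
```

With a global table that maps "CO-OP" to one name and a user table that maps it to another, the user's name wins. A brand new user with an empty table gets the global name instead.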

Filed under: dissertation, muney, ofx

Muney: Financial Management for Students

Over the past year, I have spent quite a long time on my dissertation. I’m pretty proud of it and I have had quite a few positive comments from various lecturers (I’ve even been asked to write a paper on it for a journal). I’ve decided to write a few articles about my work; hopefully they will help someone out. If you’re interested, my final report can be found here.

My project was a personal financial management application for students (the actual title was FAST: Financial Analysis for STudents, although my final application was called Muney). Essentially, I was having problems with my finances a while ago and, being the good programmer that I am, I had a look at what software was available. To be honest, from a student's perspective, Microsoft Money & Quicken are pretty terrible. Users are required to have a significant amount of financial knowledge to use them effectively. In their defense, these apps aren’t really aimed at the student demographic.

This was obviously a known problem, since I found a few web applications, such as Wesabe & Buxfer, which were developed to solve just this problem. Although these applications are much more student friendly, they were really basic and did not offer solutions for tasks students are commonly pretty poor at performing (e.g. bill management, budgeting).

So, based on this, I decided to develop a financial application which automates important monetary tasks in a way that is easy for a student with no prior financial knowledge to understand and use. The application was written using C# & Castle Project’s MonoRail framework; a screenshot is shown below.

The application had a variety of features, including:

  • Bulk importing of financial information via an OFX parser (I am currently open sourcing the parser I had to write to achieve this)
  • Automatically renaming and categorizing transactions
  • Bill management (including automatically discovering new bills and recognizing transactions as payments for bills via clustering techniques)
  • Automatic budgeting (including time series forecasting to predict a user's expenditure)

Since I covered quite a few topics developing these features, I’m going to split this blog into a series of articles. The application can be split up into three tasks: organizing, managing and planning a user's finances.

Here are the articles discussing these tasks:

  • Organization
  • Management (coming soon…)
  • Planning (coming soon…)

Filed under: .net, c#, castleproject, dissertation, monorail, muney, ofx

C# OFX Importer

I have put the code up on Google Code.

I didn’t have much to do yesterday, so I wrote an OFX importer. I used Jason’s sharpCash as a starting point but ended up doing my own implementation (always easier to rewrite than to understand someone else’s code!). It’s pretty compliant with the main bulk of the OFX specification, although it only supports credit card and bank account types at the moment.

It was written in C# and I have compiled it to a class library; the DLL for it is here. Because this is for my dissertation, I don’t really know where I stand with releasing the code. I will speak to my project supervisor. Worst case scenario, I will release it early next year, but if anyone wants the source, just email me.

To use it, you simply do:

    OFX ofxFile = new OFX(filePath);

Hope this saves someone a couple of days of trouble!

**Update**

Sorry, I hadn’t realized that the link had expired. I have put it in my university web space, which you can get to here. I will be releasing the code as soon as possible, but it may be a month or two before that’s possible. I have one request for anyone using it: could you drop me an email with the names of the banks you are using and, if you are feeling very generous, any performance metrics you happen to have collated? I need to show that this component works properly on a range of banks but can only personally get hold of OFX documents for a couple of banks. Any help would be greatly appreciated.

Filed under: c#, dissertation, ofx

Importing OFX troubles

For part of my project, I need to import data in the OFX format. The trouble is that no one has really done much, apart from one library (http://www.nsoftware.com/ibiz/ofx/), and from what I heard it wasn’t very good. I found this article, http://www.west-wind.com/WebLog/posts/10491.aspx, where some guy seems to be having similar troubles. I had a look around and he’s started a project called sharpCash, where he has done an implementation of OFX import. I spent the day poking around and it seems to be a good starting point. I’m currently working on my own implementation and will keep you posted on any discoveries I make!

Filed under: c#, dissertation, ofx
