James Hollingworth’s Adventures in Code

Exam Results

Posted in Uncategorized by jhollingworth on June 11th, 2008

Just found out I got a 1st for my degree as well as the Sulivan Award for best project for the year!

Muney: Organizing a student’s finances

Posted in dissertation, muney, ofx by jhollingworth on May 26th, 2008

This is the first of a series of articles about my dissertation, a personal financial management application for students. The first task was to organize a user’s finances by importing their financial information and then correctly naming & categorizing it.

A financial application is useless without a user’s financial information; however, manually adding individual transactions quickly becomes monotonous. Most online banks allow you to download your financial information in a variety of financial document formats, so this was the obvious solution to the problem. One of the most popular formats is Open Financial Exchange (OFX), a widely implemented specification used for communicating financial instructions & data between financial institutions & their customers. Unfortunately there were no open source OFX parsers for C#, so part of my project was to develop one. I have blogged about my development of this component before and, after quite a lot of interest, I will be open sourcing the parser at some point soon.

One problem I found was that, although importing financial information via OFX documents was one hell of a lot quicker than doing it manually, I still had to heavily edit all of the transaction info after the initial import. This was because most companies add lots of extra metadata to the transaction name (e.g. company name, transaction ID, date, etc.), as shown in Figure 1. Personally I found it quite difficult to understand, at a glance, what each transaction was actually for. I therefore needed a method of inferring the correct name for a transaction.

1271 12DEC07 0000, CO-OP GROUP 310630, ERLEIGH
Figure 1 Example transaction description

Wesabe’s solution to the problem was to record the changes users make to transactions and then apply those changes to other transactions encountered with the same transaction name. The problem is that, since many transactions contain dynamic metadata (e.g. branch ID, transaction date), transaction names are rarely identical, so a recorded change seldom matches the next transaction from the same company.

To combat this, I had to develop a method of splitting the transaction name up into pertinent substrings and then mapping those substrings to the correct names. To achieve this, a modified version of the Lempel-Ziv-Welch (LZW) lossless compression algorithm was used. The algorithm maintains a dictionary of strings, initialised with every alphanumeric character, each with a corresponding code. It reads the text to compress one character at a time, extending the current string for as long as the result is still in the dictionary. When the extended string is not in the dictionary, it outputs the code for the longest string it did recognise, adds the new combination to the dictionary, and starts again from the current character. The essence of the algorithm is that it identifies common substrings within a string. If the dictionary’s state is persisted between imports, the algorithm can be used to identify the common substrings within transaction names, e.g. it would identify the common substring “CO-OP” within “CO-OP 310630, ERLEIGH”, “12 Dec CO-OP” and “43943 CO-OP, READING”.
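The post does not say how LZW was modified, so the following is only a minimal sketch of the unmodified dictionary-building pass, with the dictionary kept alive between calls so that later imports benefit from earlier ones. The class and method names are hypothetical and are not taken from Muney.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of LZW-style phrase discovery; not the Muney implementation.
public class SubstringDictionary
{
    // Phrase -> code. Kept between calls, which stands in for persisting it between imports.
    private readonly Dictionary<string, int> codes = new Dictionary<string, int>();

    public bool Contains(string phrase)
    {
        return codes.ContainsKey(phrase);
    }

    // Runs one LZW pass over the text and returns the phrases it emitted.
    public List<string> Tokenize(string text)
    {
        var phrases = new List<string>();
        string current = "";
        foreach (char c in text)
        {
            string candidate = current + c;
            // Single characters are always treated as known (the initial alphabet).
            if (candidate.Length == 1 || codes.ContainsKey(candidate))
            {
                current = candidate;
            }
            else
            {
                phrases.Add(current);            // emit the longest known phrase
                codes[candidate] = codes.Count;  // learn the new, longer phrase
                current = c.ToString();          // restart from the current character
            }
        }
        if (current.Length > 0) phrases.Add(current);
        return phrases;
    }
}
```

Each pass can only extend a known phrase by one character, so a recurring token such as CO-OP is recognised as a single phrase only after it has been seen several times.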

Although this solution solved the problem of identifying the common substrings within a transaction name, it created a new problem. Now that the transaction name is composed of more than one substring, it’s no longer a simple one-to-many mapping to the correct transaction name but rather a many-to-many mapping. The problem therefore is: how do you know which name is best? The solution was to use a single-layer neural network (Figure 2), with the input layer being the substrings of the transaction name and the output layer being the correct names. The network is trained any time a user corrects a transaction name, and is then used to compute a best guess for the correct transaction name.

Figure 2 Single layer neural network for transaction name inference
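As a rough illustration of such a network, the sketch below keeps one weight per substring/name pair, strengthens those weights whenever a user corrects a name, and guesses by summing the weights that feed each output. The additive training rule, the class name and the method names are all assumptions; the post does not describe the actual training scheme.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a single-layer network: substrings in, candidate names out.
public class NameNetwork
{
    // weights[substring][name] = strength of the link between an input and an output node.
    private readonly Dictionary<string, Dictionary<string, double>> weights =
        new Dictionary<string, Dictionary<string, double>>();

    // Called when a user corrects a name: strengthen every input -> correctName link.
    public void Train(IEnumerable<string> substrings, string correctName, double rate = 1.0)
    {
        foreach (string s in substrings)
        {
            Dictionary<string, double> outputs;
            if (!weights.TryGetValue(s, out outputs))
                weights[s] = outputs = new Dictionary<string, double>();
            double w;
            outputs.TryGetValue(correctName, out w); // w stays 0 for a new link
            outputs[correctName] = w + rate;
        }
    }

    // Sums the weights feeding each output node and returns the strongest, or null.
    public string Guess(IEnumerable<string> substrings)
    {
        var scores = new Dictionary<string, double>();
        foreach (string s in substrings)
        {
            Dictionary<string, double> outputs;
            if (!weights.TryGetValue(s, out outputs)) continue; // unseen input
            foreach (var pair in outputs)
            {
                double total;
                scores.TryGetValue(pair.Key, out total);
                scores[pair.Key] = total + pair.Value;
            }
        }
        return scores.Count == 0
            ? null
            : scores.OrderByDescending(p => p.Value).First().Key;
    }
}
```

Because every substring votes for the names it has previously been associated with, a shared substring such as a town name cannot outvote a more specific one that has been corrected more often.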

One problem with this solution is that, if a single neural network is used for all users, the results will not be personalized and thus not very accurate. However, if each user has their own neural network, the results will be accurate only once the user has trained the network on similar transaction names; until then it will not produce relevant results. To solve this, the concept of branching was developed. The premise of branching is to have both a global network and user-specific networks. The global network is trained every time any user imports a transaction, while the user-specific one is only trained on that user’s personal data. If the application encounters an input not found in the user’s network, it uses the connections & outputs from the global network instead. This gives the user the best of both worlds: the solution is personalized if they have stated a preference, yet will still produce results otherwise.
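The fallback at the heart of branching can be sketched as a per-input lookup: use the user’s connections where they exist and borrow the global ones otherwise. The weight maps, class name and scoring rule below are assumptions made for illustration, not the structures used in Muney.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of branching: per-input fallback from user weights to global weights.
public static class Branching
{
    // Both maps have the shape: substring -> (candidate name -> weight).
    public static string Guess(
        IEnumerable<string> substrings,
        Dictionary<string, Dictionary<string, double>> userWeights,
        Dictionary<string, Dictionary<string, double>> globalWeights)
    {
        var scores = new Dictionary<string, double>();
        foreach (string s in substrings)
        {
            Dictionary<string, double> outputs;
            // Prefer the user's own connections; borrow the global ones only
            // when this input has never been seen in the user's network.
            if (!userWeights.TryGetValue(s, out outputs) &&
                !globalWeights.TryGetValue(s, out outputs))
                continue;
            foreach (var pair in outputs)
            {
                double total;
                scores.TryGetValue(pair.Key, out total);
                scores[pair.Key] = total + pair.Value;
            }
        }
        return scores.Count == 0
            ? null
            : scores.OrderByDescending(p => p.Value).First().Key;
    }
}
```

In this sketch a user’s own preference always wins for an input they have trained on, however strong the global weight for that input is.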

Testing found that the solution identifies the correct transaction name & category with roughly 90% accuracy once the user has used the application for a short period of time. For content a user had not seen before, harnessing the collective intelligence of the other users meant they only encountered a 10% reduction in accuracy.

This was only the first (and probably simplest) part of my project. Next I shall discuss managing a student’s finances!

Muney: Financial Management for Students

Posted in .net, c#, castleproject, dissertation, monorail, muney, ofx by jhollingworth on May 26th, 2008

Over the past year, I have spent quite a long time on my dissertation. I’m pretty proud of it and I have had quite a few positive comments from various lecturers (I’ve even been asked to write a paper on it for a journal). I’ve decided to write a few articles about my work; hopefully it will help someone out. If you’re interested in it, my final report can be found here.

My project was a personal financial management application for students (the actual title was FAST: Financial Analysis for STudents, although my final application was called Muney). Essentially, I was having problems with my finances a while ago and, being the good programmer that I am, I had a look at what software was available. To be honest, from a student’s perspective, Microsoft Money & Quicken are pretty terrible. Users are required to have a significant amount of financial knowledge to use them effectively. In their defense, these apps aren’t really aimed at the student demographic.

This was obviously a known problem, since I found a few web applications, such as Wesabe & Buxfer, which were developed to solve just this problem. Although these applications are much more student friendly, they were really basic and did not offer solutions for tasks students are commonly pretty poor at performing (e.g. bill management, budgeting).

So based on this, I decided to develop a financial application which automates important monetary tasks and does so in a way which is easy for a student, with no prior financial knowledge, to understand and use. The application was written using C# & Castle Project’s Monorail framework; a screenshot is shown below.

The application had a variety of features, including:

  • Bulk importing financial information via an OFX parser (I am currently open sourcing the parser I had to write to achieve this)
  • Automatically renaming and categorizing transactions
  • Bill management (including automatically discovering new bills and recognizing transactions as payments for bills via clustering techniques)
  • Automatic budgeting (including time series forecasting to predict a user’s expenditure)

Since I covered quite a few topics developing these features, I’m going to split this blog into a series of articles. The application can be split up into three tasks: organizing, managing and planning a user’s finances.

Here are the articles discussing these tasks:

  • Organization
  • Management (coming soon…)
  • Planning (coming soon…)

SubSonic-like NHibernate Query Generator button in Visual Studio

Posted in NHibernate, c#, castleproject, visual studio, visual studio 2008 by jhollingworth on March 28th, 2008

One thing I love about SubSonic, in fact the reason I switched, was how awesome the querying is. I don’t know why, but something like new UserCollection().Where("UserID", id).Where("Password", password) just does it for me. Going from that back to either HQL or ICriterion was a rather painful experience. A very clever guy called Oren has come up with a solution to this called NHibernate Query Generator (NQG). You can do some rather nifty things like FindOne(WHERE.User.Name == "James" && WHERE.User.Password == "NotGoingToTellYou"). Many (including Oren) believe that you shouldn’t have strings in your code, and thus the code to create all these cool queries is autogenerated.

The problem I found was that I had to constantly switch back & forth between the command line & VS to update the code. To solve this I borrowed a trick from SubSonic: I created a little button in VS (see below) which runs the query compiler for you.

Nhibernate Button

To do this, first download the latest version of NQG, then copy the exe file NHQG.exe to a safe location, e.g. “c:\Program Files\NHibernate\”. Next, open up Visual Studio, click Tools -> External Tools and then click Add. Give it whatever name you want, e.g. NHibernate, and then for the command enter the location of NHQG.exe, in my case “c:\Program Files\NHibernate\NHQG.exe”. In the arguments field, you have a few choices:

  • /Lang: the language to generate, either cs or vb
  • /InputFilePattern: location of either the NHibernate mapping file or the dll (if you’re using Castle Project’s ActiveRecord)
  • /OutputDirectory: the directory you want the files to go to, e.g. ./Models/Queries
  • /BaseNamespace: the base namespace for queries, e.g. Priority.Queries

So a complete example would look like “/Lang:cs /InputFilePattern:bin/Priority.dll /OutputDirectory:Models/Queries /BaseNamespace:Priority.Queries”

For Initial directory you just add $(ProjectDir), which sets the starting location to be the root of the project directory. Finally, make sure that it is the first in the menu (it will become clear why in a second).

externaltools2.png

Once you click OK, you will be able to run the application by clicking on Tools -> NHibernate. To create a button for this, click Tools -> Customize -> Toolbars & then click New; a new toolbar should appear. Next, switch back to Commands, select Tools and then scroll down until you see all the External Command x entries. Drag External Command 1 onto your new toolbar and then click Close. The text should change from External Command 1 to NHibernate (or whatever name you gave it) and you’re done!

Hope this helps someone!


Monorail: Get client’s browser

Posted in .net, castleproject, monorail, nvelocity by jhollingworth on March 2nd, 2008

For future reference, to get the client’s browser:

    $Context.UnderlyingContext.Request.Browser.Browser

This is in fact accessing the HttpRequest.Browser property, so you can then get all the usual info, like the browser version:

    $Context.UnderlyingContext.Request.Browser.MajorVersion

Hope this helps someone!

Visual Studio syntax highlighting for Monorail views

Posted in .net, castleproject, monorail, nvelocity, visual studio, visual studio 2008 by jhollingworth on February 24th, 2008

I was just having a look through the options in Visual Studio 2008 and noticed a new section for selecting the default editor for non-standard files, such as Monorail views (.vm). Up to now I’ve either had to make do with the basic text editor or mess around with the registry, neither of which is much fun. The dialog is under Options -> Text Editor -> File Extension; you can then choose which editor you want to associate with each file extension. Hope this is helpful to someone!

Adding monorail views to vs 2008

Get the currency symbol for the ISO 4217 Currency Code (C#)

Posted in .net, c#, linq, ofx by jhollingworth on December 9th, 2007

Had a bit of a problem today. After importing a bunch of transactions I needed to display the corresponding currency symbol. The problem was that when you import OFX transactions, the currency is in the ISO 4217 three-letter format. Basically, the little bit of LINQ below will search through all the culture infos until it finds one with the correct ISO currency code. It’s little things like this that make me love LINQ!

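// Requires: using System.Globalization; using System.Linq;
// SpecificCultures skips neutral cultures, for which RegionInfo throws.
RegionInfo regionInfo = (from c in CultureInfo.GetCultures(CultureTypes.SpecificCultures)
let r = new RegionInfo(c.Name)
where r.ISOCurrencySymbol == "GBP"
select r).First();
string symbol = regionInfo.CurrencySymbol; // the symbol to display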

Hope this is helpful to someone!
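If the lookup is needed in more than one place, it could be wrapped in a small helper. The sketch below is one way to do that, assuming it is acceptable to fall back to the three-letter code when no installed culture uses the currency; the class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper wrapping the culture search in a reusable method.
public static class CurrencyLookup
{
    // Returns the display symbol for an ISO 4217 code, or the code itself
    // when no installed culture reports that currency.
    public static string SymbolFor(string isoCode)
    {
        foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
        {
            RegionInfo region;
            try { region = new RegionInfo(culture.Name); }
            catch (ArgumentException) { continue; } // culture without usable region data

            if (string.Equals(region.ISOCurrencySymbol, isoCode,
                              StringComparison.OrdinalIgnoreCase))
                return region.CurrencySymbol;
        }
        return isoCode;
    }
}
```

Unlike First(), which throws when nothing matches, this version degrades to showing the code, which may be preferable when an imported file contains a currency the machine does not know about.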

Web Hosting Workaround for monorail

Posted in castleproject, monorail by jhollingworth on November 30th, 2007

So I recently needed to put a web app I wrote with Monorail online. I ended up buying hosting through re-invent, who have so far been very good. The one big problem I soon realized was that they (like pretty much every web hosting company) don’t support Castle Project, and in particular Monorail. Basically, whenever I tried to access a page it would just come up with a 404. After many hours of hair pulling, I was about to give up and face a very painful rewrite in ASP.NET. Luckily I had another search on the forums and found out that most web hosting companies block any non-standard file extensions, e.g. .castle. So, using the URL rerouting in Monorail, I just added the following rule:


<routing>
<rule>
<pattern>(.+)(\.aspx)</pattern>
<replace><![CDATA[ $1.castle ]]></replace>
</rule>
</routing>
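To see what the rule does to an incoming URL, the same pattern and replacement can be tried with .NET’s Regex class. This is only an illustration: the class name is hypothetical, the dot is written escaped so that it matches a literal “.aspx”, and how Monorail applies the rule internally is not covered here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the pattern/replace pair used by the routing rule.
public static class RoutingRuleDemo
{
    public static string Rewrite(string url)
    {
        // $1 is everything before ".aspx"; the extension is swapped for ".castle".
        return Regex.Replace(url, @"(.+)(\.aspx)", "$1.castle");
    }
}
```

With this, a request for /home/index.aspx is rewritten to /home/index.castle, while URLs that do not contain .aspx pass through unchanged.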

After I added that, I found that everything worked perfectly. There are some problems with it, namely that methods like RedirectToAction() will still add the .castle extension. As a current workaround I have just added the following to the base controller:


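// Hides the built-in RedirectToAction so redirects use the .aspx extension
// that the host allows; the routing rule then maps it back to .castle.
protected new void RedirectToAction(string action)
{
    Redirect(action + ".aspx");
}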

I know it’s not the most elegant solution, but it’s better than nothing. Hope this helps some people!

NVelocity: Get Current Position In ForEach Loop

Posted in c#, monorail, nvelocity by jhollingworth on November 16th, 2007

Just a reminder for myself: to get the current iteration number in a foreach loop, use $velocityCount.

Problems with Linq2Sql

Posted in .net, c#, linq by jhollingworth on November 13th, 2007

I have just been reading Scott Guthrie’s post on the new ASP.NET MVC framework. Firstly, for the most part I am really amazed with what Microsoft have come up with, and I am really looking forward to having a play with it. I am worried, though, about the methods Scott uses for creating models for data access. In the examples he creates a Linq2Sql model for data access and then uses the database context class to encapsulate all the business logic. Anyone see any problems with this? Say you have a User table in the database: Linq2Sql will create a User object on which you can perform all the basic CRUD operations. Say you want to add a method to check if a given user has a valid username & password (basic business logic); which of these would you think is logical?

DataBaseContext db = new DataBaseContext();
db.IsUserValid(username, password);

or

User.IsValid(username, password);

IMHO the second is far better. Firstly, when writing code you don’t need to be aware of other objects (i.e. the database context) to access the business logic of that object. Secondly, having all the business logic encapsulated in the actual object enforces code separation, so it is easier to maintain since there will only ever be one file which contains all the code relating to the object.

So what should you do instead of encapsulating the business logic in the context? Well, there are 2 methods which would allow you to encapsulate all your business logic, the first being partial classes. Using the User example again, say Linq2Sql has created a User object; you would add another file in the Model directory and create a partial class in which you can then include all the business logic, as shown below:


partial class User
{

public static bool IsValidLogin(string userName, string password)
{

//Some business logic
return true;

}

}

This method allows separation of code, and you don’t need any knowledge of extra classes to use it. Unfortunately this method is soooo .Net 2.0, so for everyone who needs an excuse to use the new extension methods, here it is (note that, unlike the static method above, this version is called on a User instance):


public static class UserBusinessLogic
{

public static bool IsValidLogin(this User user, string userName, string password)
{

//Some business logic

return true;

}

}

Both methods work fine and will enforce code separation. It’s a shame that Microsoft decided on having a single .dbml file for the models and a Context file for the business logic. I would have thought having an individual class for each table in the db (with a partial class containing all the auto-generated code) would be more logical and scalable.

There are also more fundamental flaws with Linq2Sql. For example, how do you apply rules to properties (e.g. ensuring that a password is > 7 characters)? Currently this sort of business logic is enforced at the database end, which is a bit crap, since I thought the whole point of Linq2Sql or any ORM was that the developer doesn’t have to deal with the database!

While I am critical of Linq2Sql, it is still early days and I’m sure as time passes they will rectify the problems highlighted here. Until then, I hope the methods shown here will help you write maintainable code!
