
Text File parsing with Flatworm and Substring Hacks

February 23rd, 2010 | No Comments | Posted in Code

I’m going to start posting more code here, finally. I don’t consider myself a rockstar programmer or anybody who should be telling others how to program, but when I do something I find interesting, hopefully it’ll also be interesting to others. I’m always looking for better ways of doing things so if you have suggestions, I’m all ears.

Recently I was implementing a new project at work where I had to read and write a bunch of fixed-width files for communicating with a new vendor. My first thought was that I would have to use Java’s String.substring() method to pull out the individual fields. This is ugly because you end up with stuff like

final int FIELD1_START_POS = 10;
final int FIELD1_END_POS = 20;

When you have a few hundred fields, this is exceptionally ugly. Luckily for my code and my sanity, I asked how best to do this on StackOverflow and was pointed to the great little library Flatworm. Flatworm allows you to create an XML descriptor for the file you need to parse, then it reads it into a plain Java bean for you. It also takes care of parsing the data if needed, casting into the correct type, stripping unwanted characters, etc. Very, very useful indeed.
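To make the "describe the layout once, then let a loop do the cutting" idea concrete, here is a rough sketch in plain Java. It is not Flatworm's API or its XML format; the Field class, the LAYOUT list, and the field names are all invented for this example, and the layout lives in code rather than in a descriptor file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FixedWidthSketch {

    // One field of a fixed-width record: a name plus begin/end columns
    // (0-based, end exclusive, the same convention as String.substring()).
    static final class Field {
        final String name;
        final int begin;
        final int end;

        Field(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    // Hypothetical layout, made up for this example.
    static final List<Field> LAYOUT = List.of(
            new Field("accountId", 0, 5),
            new Field("name", 5, 15),
            new Field("amount", 15, 22));

    // Cut every field out of the line and trim the padding around it.
    static Map<String, String> parse(String line, List<Field> layout) {
        Map<String, String> record = new LinkedHashMap<>();
        for (Field f : layout) {
            record.put(f.name, line.substring(f.begin, f.end).trim());
        }
        return record;
    }

    public static void main(String[] args) {
        // Build a 22-character sample line: 5 + 10 + 7 columns.
        String line = String.format("%-5s%-10s%7s", "12345", "Matt", "0123.45");
        System.out.println(parse(line, LAYOUT));
    }
}
```

The point is only that the positions sit in one table instead of a few hundred START/END constants. A real library does much more on top of this (type conversion, validation, writing files back out).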

Aside: Of course you could also use regular expressions to pull out the data but I don’t see where that would give any advantage. You still have to encode what the field looks like, where it starts, something of that nature. It just feels even more brittle to me. There’s also the more complete route of using lexers, parsers, etc. I don’t know enough about that process to see how that would benefit me in this particular case. Maybe it would, I don’t know for sure. But from what I do know, it seems like overkill and not a big benefit.

Fast forward to now: I have another project where I need to read and write text files. “Ah ha!” I say. “I’ll use Flatworm again.” “Nope,” says the universe. Unfortunately the file I need to read runs into a limitation of Flatworm. The file has lines where the data starts on column 10, but then it’s a name that could be any length. Instead of padding out the line to its full width, the line ends after the name. Flatworm has no way of handling this. I considered hacking Flatworm to handle this condition (and I still might do this as I think it’s useful) but I wanted to try something else first. What I ended up with was better than my first example, I think, but not quite as cool as Flatworm.

Here’s a mockup of the file I’m working with for reference (. is a blank space)

.....12345......................................02/01/2010.....$123.45.....
.....One hundred twenty-three and forty-five cents
..........Matt Grommes
..........98765..........1 Test St..............123.45

Here’s the first version of the parser code I had

check.setCheckDate( new Date(checkLines[0].substring(91, checkLines[0].indexOf("$"))) );
check.setCheckTotal( checkLines[0].substring( checkLines[0].indexOf("$")+1, checkLines[0].length()) );
check.setAmountWords( checkLines[1].substring(10, checkLines[1].length()) );
check.setPayee( checkLines[3].substring( 10, checkLines[3].length()) );

This isn’t optimal because there is a ton of duplicated code, plus there are problems with the same lines that tripped up Flatworm. I ended up making a new function called getLineValue()

    private static String getLineValue(String line, int beginIndex, int endIndex) {
 
        String value = "";
 
        // endIndex 0 is just a shortcut to EndOfLine
        if(endIndex == 0)
            endIndex = line.length();
 
        if(line.length() != 0 && line.length() >= endIndex)
            value = line.substring(beginIndex, endIndex).trim();
 
        return value;
    }

This is of course just a wrapper around String.substring() but it lets me do some extra checks and have extra logic like using 0 for endIndex to indicate “go to end of line”.
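One thing the wrapper makes easy is adding more guards later. As a sketch only (this is a variant, not the code from the project), the version below also returns an empty string when indexOf() comes back with -1 or when the line stops before beginIndex, two cases where a bare substring() call would throw. The class name LineValueSketch and the sample lines are made up for the example.

```java
class LineValueSketch {

    // Variant of getLineValue() that returns "" instead of throwing when the
    // requested range does not fit the line. As in the original, an endIndex
    // of 0 means "read to the end of the line".
    static String getLineValue(String line, int beginIndex, int endIndex) {
        if (line == null) {
            return "";
        }
        if (endIndex == 0) {
            endIndex = line.length();
        }
        // Covers a negative endIndex (e.g. indexOf() found nothing) and
        // lines that end before beginIndex or endIndex.
        if (beginIndex < 0 || beginIndex > endIndex || endIndex > line.length()) {
            return "";
        }
        return line.substring(beginIndex, endIndex).trim();
    }

    public static void main(String[] args) {
        // Ten columns of padding followed by a name of any length.
        String payeeLine = String.format("%10s%s", "", "Matt Grommes");
        System.out.println(getLineValue(payeeLine, 10, 0));

        // A line that is too short, and a line with no "$" in it.
        System.out.println("[" + getLineValue("abc", 10, 0) + "]");
        String noDollar = "no dollar sign here";
        System.out.println("[" + getLineValue(noDollar, 3, noDollar.indexOf("$")) + "]");
    }
}
```

Whether silently returning an empty string is the right call depends on the file. For something like a check amount it may be better to fail loudly, so treat the guard as a starting point.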

Here’s the modified version, using the new function.

check.setCheckDate( new Date(getLineValue(checkLines[0], 91, checkLines[0].indexOf("$"))) );
check.setCheckTotal( getLineValue(checkLines[0], checkLines[0].indexOf("$")+1, 0) );
check.setAmountWords( getLineValue(checkLines[1], 10, 0) );
check.setPayee( getLineValue(checkLines[3], 10, 0) );

This is a lot better to my eye, not as much extra code cluttering things up. It’s a lot clearer what I’m doing since you don’t have to pay attention to a bunch of substring() and length() calls. This is only about 1/5 of the total lines of parsing code so hopefully you can see how much better this looks over the course of the whole method. See the Aside above for thoughts on some other ways of doing this.

This wasn’t a big project and there may be better ways of going about it but I was pretty happy how this ended up. I like seeing less code so when there are ways of cutting extra things out, it’s a win.

Thanks to the couple of redditors that made comments about this post. I’m always looking to get better at this so constructive criticism is welcome.

Frequently Forgotten Fundamental Facts about Software Engineering

December 3rd, 2009 | 2 Comments | Posted in Code

Here are the most frequently forgotten fundamental facts about software engineering. Some are of vital importance—we forget them at considerable risk.

via Frequently Forgotten Fundamental Facts about Software Engineering.

Very interesting list of easily forgotten ideas. I hesitate to outright call them “facts” since there’s so little real research in programming (and even the author says they might be figments of his imagination). One of the most annoying things about the computer field is how much we reinvent things and forget old lessons, so lists like this and discussions on the topics are always valuable if they keep things from being forgotten.

Random Football Picks in Groovy

September 25th, 2009 | 1 Comment | Posted in Code

My friend runs an informal football picks contest every week and I thought of a way to participate in a very geeky way, a program that would make my picks for me. I decided it would be funny to write a program that would randomly make my picks for me since everybody else spends an inordinate amount of time thinking over their picks. Plus, I don’t pay attention to football so my picks would have been essentially random in any case.

The contest is simple; pick who you think is going to win in all the games. The person with the most correct picks wins. You also guess how many points the next Monday Night Football game is going to have in total so if there’s a tie, whoever is closest to that number wins.

Here’s the code for my program:

def matchups = [1 : ['Atlanta','Carolina'],          2 : ['Detroit','Minnesota'],
 3 : ['Green Bay','Cincinnati'],      4 : ['Jacksonville','Arizona'],
 5 : ['Kansas City','Oakland'],       6 : ['New York(NYJ)','New England'],
 7 : ['Philadelphia','New Orleans'],  8 : ['Tennessee','Houston'],
 9 : ['Washington','St. Louis'],     10 : ['Buffalo','Tampa Bay'],
 11 : ['San Francisco','Seattle'],   12 : ['Chicago','Pittsburgh'],
 13 : ['Denver','Cleveland'],        14 : ['San Diego','Baltimore'],
 15 : ['Dallas','New York(NYG)'],    16 : ['Miami','Indianapolis']]
 
def rand = new Random(System.currentTimeMillis() + Runtime.runtime.freeMemory())
matchups.each() { key, value -> println("${value[rand.nextInt(2)]}") }
 
def score1 = rand.nextInt(5) * 7;
def score2 = rand.nextInt(5) * 3;
println("${score1} + ${score2} = ${score1 + score2}")

I’m a Groovy newbie, which is one of the reasons I wanted to use the language for this little thing, so I don’t know if this is the best way of doing this, but it turned out to be pretty quick and easy. The hash/array data structure I used for the matchups variable makes it very easy to pick the various winners with the matchups.each one-liner (line 11 of the script). I went with a recommendation I found for super-extra randomness when seeding Random (line 10), just to make sure.
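For comparison, here is roughly the same idea as a Java sketch. It is not a translation anyone asked for, just an illustration of the two moving parts: one random pick per two-team matchup, and a tie-breaker guess built from a multiple of 7 plus a multiple of 3. The class and method names are made up, and only three matchups are listed to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomPicksSketch {

    // Pick one team at random from each two-team matchup.
    static List<String> pickWinners(String[][] matchups, Random rand) {
        List<String> picks = new ArrayList<>();
        for (String[] game : matchups) {
            picks.add(game[rand.nextInt(2)]);
        }
        return picks;
    }

    // Tie-breaker guess, built the same way as in the Groovy script:
    // (0..4) * 7 plus (0..4) * 3, so the result is between 0 and 40.
    static int guessTotalPoints(Random rand) {
        return rand.nextInt(5) * 7 + rand.nextInt(5) * 3;
    }

    public static void main(String[] args) {
        String[][] matchups = {
            {"Atlanta", "Carolina"},
            {"Detroit", "Minnesota"},
            {"Green Bay", "Cincinnati"}
        };
        // The no-argument constructor chooses its own seed.
        Random rand = new Random();
        pickWinners(matchups, rand).forEach(System.out::println);
        System.out.println("Total points guess: " + guessTotalPoints(rand));
    }
}
```

Passing the Random in as a parameter is a small design choice: it lets a fixed seed be used when the picks need to be repeatable, for example in a test.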

In the end, my random picks weren’t all that accurate. I think I had 5 right out of 16. There’s randomness for you, so unreliable. :) I also learned that when a random number generator makes your picks, you can’t take the blame for making bad picks but you also can’t take credit for the ones you got right.

Feel free to steal this code if you want to make your own picks. You have to put in all the matchups for the week but I tried to format the code so I could make the list in UltraEdit using its column-editing mode and just drop the teams in. If you have a more Groovy-ish way I could have done the matchups I’d love to hear it in the comments.

I do like my profession, I don’t like my job

September 14th, 2009 | No Comments | Posted in Business, Code, Work

To only a fraction of the human race does God give the privilege of earning one’s bread doing what one would have gladly pursued free, for passion. I am very thankful.

The Mythical Man Month, p. 291

via CLOSED-LOOP: The passionate developer: I do like my profession, I don’t like my job.

This is great stuff. I’ve always felt the way Fred Brooks talks about in that quote and this post captures a lot of how I feel about my job as well. Well worth reading.

If architects had to work like software developers

September 8th, 2009 | No Comments | Posted in Business, Code

Dear Mr. Architect:

Please design and build me a house. I am not quite sure of what I need, so you should use your discretion. My house should have somewhere between two and forty-five bedrooms. Just make sure the plans are such that the bedrooms can be easily added or deleted. When you bring the blueprints to me, I will make the final decision of what I want. Also, bring me the cost breakdown for each configuration so that I can arbitrarily pick one.

Monochrome Blog – If architects had to work like software developers.

Painfully true. Very painfully.

I’m trying to decide if sending this to our product owner would be informative or insulting.

My Long Walk

September 3rd, 2009 | No Comments | Posted in Code

Awhile back I was inspired by a post on a local Albuquerque group blog called Duke City Fix about a guy who walked across town, taking pictures along the way. I’ve always liked walking and thought it would be cool to do a similar walk. Every place I’ve worked at I’ve taken my lunch hours and walked around, sometimes taking pictures but mostly just exploring. You find a lot of neat stuff walking since you’re moving slow and you’re close to the ground. Even if you’re in an office park or urban area I’d encourage you to try walking around and seeing what you see. You might be surprised.

For my walk I decided to go on Montgomery since it goes basically from one side of Albuquerque to the other. It also goes across the river and through our Bosque / North Valley area which is by far my favorite walking area in town. I started at Tramway, the east side of the city, and walked all the way to Coors on the west side. It’s 10.9 miles according to Google Maps and with a couple of small detours I made I think I pretty much did exactly 11 miles. This is far longer than I’ve ever walked before but I did it. :)

My Long Walk Route


Google said it was going to take 3 1/2 hours, which is about 20 minutes per mile. I thought this was doable but I didn’t factor in the heat. It was 81 degrees an hour or so after I started but it got up past 91 a few hours in. This meant I needed to rest and refill my water bottle more often than I anticipated (thank you McDonalds for having cold water, air conditioning, and 3 locations along my route!). It ended up taking me 4 1/2 hours with rest breaks.

I did a sort-of live tweeting of the walk, which you can find on my Twitter stream. The tweeting was fun for me, and helpful with the nice encouragements I got from my friends on there. That’s another nice thing about walking, you can do other stuff at the same time. It’s hard to tweet from a bike. :)

The other part of the walk was taking pictures. I decided against taking my regular camera with me on this first walk since I was already carrying a water bottle, so I took some pictures with my iPhone camera instead. It’s cool to be able to upload the pics to Flickr while walking too. The whole set can be found at Flickr if you’re interested.

I’m very glad I did this walk, even with the heat and pain my poor legs felt later. I’m already thinking of how I would do a similar walk going North/South across town in fact.

For now though, here’s a picture of why the North Valley of Albuquerque is my favorite walking area. Right on the other side of this wall is one of the city’s busiest streets and you’d never know it.

The valley is my favorite walking area

Book Review: Release It!

August 14th, 2009 | No Comments | Posted in Books, Code

Get Release It! on Amazon

A few weeks ago I heard an interview on Software Engineering Radio with Michael Nygard, author of a book I hadn’t heard of called Release It! My wife had been reading Ship It! and I had heard good things about Manage It! so I was happy to hear about this new book. Over the course of the hour or so interview, Mr. Nygard made one heck of a case both for the book and for his way of thinking about writing software.

Mr. Nygard is an operations guy. That is to say his job is to help big companies maintain the software they use. The focus of the book is pointing out ways developers can engineer their software to work better with operations and be more maintainable. It’s an unfortunately seldom seen topic in programming but at least now we have a fairly thorough book to reference on the topic.

The book starts out with a pretty scary tale of a post-mortem the author did on a huge outage at a major airline. It’s a very interesting look at a huge failure that ended up being caused by a pretty small programming error that any of us could make. He also talks here about getting a thread dump of a Java process to find out where it’s having trouble which I had occasion to use in real life right after I finished the book.

The structure of the book is to introduce a topic, then do a section on Patterns and Anti-Patterns around that topic. The first section is Stability. He talks about different types of failure, and defines stability in the first place which ends up being harder than you’d anticipate. Having spent most of my professional career so far writing internal corporate applications, this was the first place where the book veered off from being specifically applicable to my life. Not to say we corporate developers don’t have to worry about customers or uptime but it’s a different set of concerns. Nobody is going to switch to another billing system because the one we work on is down. But still, it’s useful stuff.

The 2nd section of the book is Capacity. Admittedly, I skimmed this section since I’m not working on anything right now that requires accounting for massive amounts of users or fine-tuning my Ajax requests. I will revisit this section for sure when I get onto something more relevant.

The 3rd section is General Design Issues; split into sections on Network, basic Security, Availability, and Administration. Section 4 is Operations. Both of these are very valuable. Just about everything is illustrated with real examples and specific recommendations, which I like to see.

I like reading about Anti-Patterns because I’m always on the lookout for not only ways to do things but ways not to do things. The Patterns are, of course, good things to keep in mind whether you’re developing a website or a corporate integration program. In fact the Patterns in this book are probably the highlight. Things like using Timeouts, Circuit Breakers, and Connection Pooling are timeless and useful all over, hallmarks of really being Patterns and not just quick fixes and bandaids.

Overall if you’re developing any kind of serious software that’s going to have to serve users and be maintained over time, this book should really be on your bookshelf. It’s the rare book that works first as a read-through and then as a reference to be returned to later. Especially if you’re not the one who has to maintain your code, the focus on Operations is a very valuable way of thinking. If you’ve read the book I’d be interested in hearing your thoughts in the comments.


The Coefficient of User Innovation Friction

August 3rd, 2009 | No Comments | Posted in Code, Geekery

In February 2007, Mike Adams, who had recently joined Automattic, the company that makes WordPress, decided on a lark to endow all blogs running on WordPress.com with the ability to use LaTeX, the venerable mathematical typesetting language.

<snip>

Since then, as reported by observer/participant Michael Nielsen (1, 2), Tim Gowers, Terence Tao, and a bunch of their peers have been pioneering a massively collaborative approach to solving hard mathematical problems.

via Jon Udell, who is The Man

This story is cool in at least 2 ways. First, it warms the cockles of my hacker heart to hear that someone decided “on a lark” to add LaTeX to WordPress. I never used LaTeX for anything only because I’m not a math person and I didn’t make it far enough in school to go beyond plain text. But deciding you’re going to add support for a beloved but extremely niche typesetting language to the blog software you work on is an impressive thing no matter what.

The main reason this story is cool is the collaborative project that emerged due to this niche feature. Sure, mathematicians could have, and I’m sure did, collaborate on sites before this but from what I read in the comments, adding formulas into websites previously was time-consuming at best. A long time ago there was talk about an addition to HTML called MathML to do just this but I’m not sure what happened to that, and in any case LaTeX is an accepted standard people are used to. So having support for this kind of thing is just the perfect reduction in friction that can help something new emerge. Having to learn a new standard or go through a whole process to display a formula is enough trouble that most people won’t participate. If people can re-use existing skills in a new place, more people can contribute and do new things.

When Mike Adams added this feature, I’m sure he thought he was helping a few mathematicians add formulas to their blogs and that was it. But the important thing was the removal of friction. If you can remove just a little friction from a social tool that a lot of people use, you’re opening it up to allow people to create new things you never thought of. When a new tool like Twitter or Google Wave comes out, I never pay much attention to the uses the creators come up with. What I really watch out for are the things the users come up with. It cost nothing for users to add hashtags to Twitter, but it’s incredibly useful and cool and will probably end up being part of how they make money. Whenever Google Wave comes out, the important things will be the ones people add later. If the friction is low enough.

Debugging By The Numbers

July 25th, 2009 | No Comments | Posted in Code

The other day my team had an all-day meeting to try to debug a very weird, ugly problem with some accounts in the new billing system we’re finishing up implementing. During the process of trying to figure out what the root cause of the problem was, I went through a process I’ve been through a few times and thought I’d share.

The issue was with how some money was distributed to the account. Sometimes people get money back and it has to be used in a specific way. In this case it appeared it wasn’t being spread out the way it should have been, and the numbers looked very strange. In the main example case we were using, there was a number on the account’s invoice that didn’t match anything. When you’re debugging, these mysterious numbers can be very useful. While everyone else was looking at other stuff, I took a little while to try to find where that number was coming from. Code (hopefully) doesn’t just invent numbers so it had to come from somewhere and I’ve had good luck in the past figuring out big problems just by figuring out the numbers.

In this case, the important numbers on their invoice were

  • A payment of $562
  • A credit of $629
  • A big refund of $2438
  • A check issued to the account for $1348
  • A credit balance of $562

First, we don’t usually do credit balances at all. It should have been $0. Then, $1348 didn’t immediately jump out as having any relation to the other numbers. Our non-technical project owner’s first inclination was to believe the program was making things up but I usually go on the assumption that this isn’t the case. :)

The first thing I figured out was that the $2438 had been split into 2 chunks of $1219 on 2 invoices. Since I didn’t know that this shouldn’t have happened (score another one for ignorance), I accepted it and figured out that $629 was $1219 – $562. So this was half the refund minus their payment, which is what should happen. Good.

I then saw that $1348 did have a relation to the other numbers: it was $629 * 2. I started going over this out loud for everybody (also an extremely valuable debugging technique) and it all fell into place. What I finally saw was that the $2438 had been split up over 2 invoices. Then both months’ payments had been taken out of the refund -> $2438 – ($562 * 2) = $1348. The system had accounted for all the money the account would owe us, taken it out and refunded them the rest. It then held on to the $562 for next month’s invoice in order to pay it off then. Whew. Only took about 2 hours.

So going over this math with a clear head and no expectation of what the system should have done, I found the underlying problem. Everything should have collapsed onto one invoice and done everything at once. The refund shouldn’t have been split up and both $562 payments should have been made at once, one of them shouldn’t have been held onto til next month. This is a big issue that makes people’s invoices look weird but the important thing for a billing system is that no money is missing. People had originally thought maybe we were over-paying accounts but that isn’t the case thankfully. Now we need to figure out how to fix it going forward but that’s a job for the billing people.

In the end, once again ignorance saves the day. I didn’t know about some of the particular workings of refunds in this case so I wasn’t making assumptions about that. I know I don’t know all the bits and pieces of how the invoices work so I went through the exercise of finding where the mysterious numbers came from and that led me to the answer. If you’re debugging numbers, writing it down and going through them all with a calculator is immensely helpful. Add them up, subtract them from each other, try to find where the differences are. And talk it out. Maybe you’re going down the wrong path or your lack of knowledge about something is something basic you do need to know. It’s debugging, it’s hard. There’s no map. But don’t let those mysterious numbers float out there, they might be the key to the answer.
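The "add them up, subtract them from each other" step can also be handed to a few lines of code when there are a lot of numbers on the page. The sketch below is only an illustration: the explain() helper is hypothetical, it tries just a handful of simple relationships (doubling or tripling, sums, differences, and one amount minus a multiple of another), and the amounts in main() are made up rather than taken from the invoice.

```java
import java.util.ArrayList;
import java.util.List;

class MysteryNumberSketch {

    // Try a few simple relationships between the known amounts and collect
    // every one that produces the target value.
    static List<String> explain(int target, int[] known) {
        List<String> hits = new ArrayList<>();
        for (int a : known) {
            // a doubled or tripled
            for (int k = 2; k <= 3; k++) {
                if (a * k == target) {
                    hits.add(a + " * " + k);
                }
            }
            for (int b : known) {
                if (a == b) {
                    continue;
                }
                // a < b so each sum is only reported once
                if (a < b && a + b == target) {
                    hits.add(a + " + " + b);
                }
                if (a - b == target) {
                    hits.add(a + " - " + b);
                }
                // a minus two or three times b
                for (int k = 2; k <= 3; k++) {
                    if (a - b * k == target) {
                        hits.add(a + " - (" + b + " * " + k + ")");
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up amounts for illustration only.
        int[] known = {500, 700, 2400};
        System.out.println(explain(1400, known));
    }
}
```

A hit only says the arithmetic lines up, not that the system actually computed the number that way, so it is a lead to talk through with the team rather than an answer.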

Scope

July 23rd, 2009 | 1 Comment | Posted in Code

Scope slide 4

Scope at reboot11, Matt Webb, S&W

This is a brilliant presentation about design and how it can affect us. Extremely worth reading.