I had a conversation yesterday with someone who said he needed to hire a computer scientist. I replied that actually he needed to hire someone who could program, and that not all computer scientists could program. He disagreed, but I stood by my statement. I’ve known too many people with computer science degrees, even advanced degrees, who were ineffective software developers. Of course I’ve also known people with computer science degrees, especially advanced degrees, that were terrific software developers. The most I’ll say is that programming ability is positively correlated with computer science achievement.
The conversation turned to what it means to say someone can program. My proposed definition was someone who could write large programs that have a high probability of being correct. Joel Spolsky wrote a good book last year called Smart and Gets Things Done about recruiting great programmers. I agree with looking for someone who is “smart and gets things done,” but “writes large correct programs” may be easier to explain. The two ideas overlap a great deal.
People who are not professional programmers often don’t realize how the difficulty of writing software increases with size. Many people who wrote 100-line programs in college imagine that they could write 1,000-line programs if they worked at it 10 times longer. Or even worse, they imagine they could write 10,000-line programs if they worked 100 times longer. It doesn’t work that way. Most people who can write a 100-line program could never finish a 10,000-line program no matter how long they worked on it. They would simply drown in complexity. One of the marks of a professional programmer is knowing how to organize software so that the complexity remains manageable as the size increases. Even among professionals there are large differences in ability. The programmers who can effectively manage 100,000-line projects are in a different league than those who can manage 10,000-line projects.
(When I talk about a program that is so many lines long, I mean a program that needs to be about that long. It’s no achievement to write 1,000 lines of code for a problem that would be reasonable to solve in 10.)
Writing large buggy programs is hard. To say a program is buggy is to imply that it is at least of sufficient quality to approximate what it’s supposed to do much of the time. For example, you wouldn’t say that Notepad is a buggy web browser. A program has got to display web pages at least occasionally to be called a buggy browser.
Writing large correct programs is much harder. It’s even impossible, depending on what you mean by “large” and “correct.” No large program is completely bug-free, but some large programs have a very small probability of failure. The best programmers can think of a dozen ways to solve any problem, and they choose the way they believe has the best chance of being implemented correctly. Or they choose the way that is most likely to make an error obvious if it does occur. They know that software needs to be tested and they design their software to make it easier to test.
If you ask an amateur whether their program is correct, they are likely to be offended. They’ll tell you that of course it’s correct because they were careful when they wrote it. If you ask a professional the same question, they may tell you that their program probably has bugs, but then go on to tell you how they’ve tested it and what logging facilities are in place to help debug errors when they show up later.
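As a rough illustration of that professional habit, here is a minimal Python sketch. The function name `average_order_value`, the logger name `"orders"`, and the rule that totals must be non-negative are all hypothetical, invented for this example. It shows one way to make an error obvious when it occurs (fail loudly on bad input), keep the logic easy to test (a small function with no hidden state), and leave a logging trail for later debugging.

```python
import logging

# Hypothetical logger name; configure handlers and levels however your project does.
logger = logging.getLogger("orders")


def average_order_value(order_totals):
    """Return the mean of a non-empty sequence of non-negative order totals.

    Invalid input raises ValueError immediately rather than producing a
    plausible-looking wrong answer that surfaces much later.
    """
    if not order_totals:
        raise ValueError("order_totals must not be empty")
    for total in order_totals:
        if total < 0:
            raise ValueError(f"negative order total: {total!r}")

    result = sum(order_totals) / len(order_totals)
    # Debug-level logging costs little and helps when a bug shows up later.
    logger.debug("averaged %d orders, result=%s", len(order_totals), result)
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(average_order_value([10, 20, 30]))
```

Because the function depends only on its argument, a test can call it directly with ordinary and deliberately bad inputs, with no setup beyond the call itself.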
I’m having so much fun browsing around your thoughts. I can remember my old days programming in BASIC and being disappointed when my code was only a few KB, and I also remember quite distinctly when my programs started going over BASIC’s limits and I was scrambling for third-party tools to extend memory. Thank God for QBasic.
I’ve been programming for about 10 years now and a lot of your logic seemed intrinsic to me. It made a lot of my Computer Science classes extremely boring, but it was pretty cool to learn different techniques to create more efficient code. I think your analysis, or rather your paradigm, effectively illustrates the gap that exists between small programs and large ones. I’m just putting the finishing touches on my web toolkit and your words are fresh in my ears.
I’ve written larger programs before but I went all out on this one. I’ve never done so much unit testing before and I’m starting to suspect that I’m paranoid. It’s so crazy how the implementation of some seemingly benign piece of code can complicate a project. The most important thing I’ve learnt, however, is that shortcuts are to my detriment. Knowing exactly why certain systems create errors is more rewarding than patching them up.
John,
I’ve read two of your posts regarding programming: this one, obviously, and the programming productivity vs pay article. Both were enjoyable and, in my experience, entirely correct. The good programmers write fewer lines of better code. They also plan better solutions and understand the ramifications of those design decisions better than less capable programmers. I work with both; one is great fun to work with, the less capable is far less fun. We then spend too much time correcting errors and tracking down bugs.
Exactly to the point. The biggest problem probably lies in the fact that amateurs who’ve written 100-liners have absolutely no idea of the design and architectural requirements for such a vast project, and thus fail, because the backend, the design, is not robust and easy enough to manage.
I wonder how complex a 100k-liner would be; the biggest I’ve tackled so far is just around 30k.
A real professional adapts to the size of the program. You’d be shocked how many people futz with name spaces in 10,000 line programs and don’t use global variables in 1000 line programs.
Do you have any tips for an aspiring programmer? Are there any specific books or even methodologies you can recommend if one wanted to become a developer who writes large and correct programs with ease?
To clarify, when I said “large programs that have a high probability of being correct” I had in mind large programs that have a high probability of producing correct output. Large programs have almost zero probability of being entirely correct for all inputs, but good programs very often produce correct results.
Johan: I recommend that every aspiring programmer read Code Complete for starters.
ConnGator: This post was picked up by Hacker News today. The extra traffic probably explains your 500 error.
I was part of a small team that created a web app with 137k of Java code and 66k of JavaScript code (not to mention 30k of SQL/stored proc code). Took 3 years and had thousands of bugs, but now it is in production and has been stable for over six months without patches.
It is very much not easy. The only way it worked was with very experienced developers, good management and a dedicated QA team.
BTW when I tried to post this comment I got this:
Error 500 – Internal server error
An internal server error has occured!
Please try again later.
To echo what others have said, I always enjoy what you write.
I agree that there are large differences between the complexity of programs of various sizes. Unfortunately, at least at my college, we never wrote programs over ~1000 lines of code, and thus were not really exposed to the sort of architectural decisions that you’d be forced to make in a larger project.
Fortunately I was on the school’s robotics team (http://robocup.bowdoin.edu/) and was exposed to a much larger code base (tens of thousands of lines of C++, Python, and Java), and was able to immerse myself in a project much closer to “real world” size.
How can universities encourage students to work on much larger projects? Should they offer credit for contributing to open source projects? e.g. extra credit if you can get a patch submitted to project X?
Nick: Thanks. Yes, I think contributing to open source projects would be a great way for students to get some experience. For one thing, it may be the first experience maintaining code they didn’t write from scratch.
Greg Wilson had some sort of program to teach students what they’d need in order to contribute to open source projects. He’s left academia now, but he’s finishing up a book on open source architectures. Maybe his new book would help.
The key is to modularize. I’ve worked on 2 large codebases so far. One was modularized, and I could comfortably live within the boundaries of the module I was working on, only occasionally foraying briefly outside of it. The other was not, and it’s a pain.
Great articles. With every profession, there are always multiples of casual observers who remain on the periphery simply because there’s an esoteric haze covering it. Everywhere you look there’s huge lines of jargon and specialized language. It’s refreshing to read about a profession put in relatable terms.
I remember when I was younger and was just learning about programming (Turbo Pascal), and reflected on the most complex program I wrote being about 150 lines. And then some guys started talking about how the operating system (Windows) was made up of hundreds of thousands of lines of code, and then came hyperboles of how much a single programmer (Bill Gates) contributed to it. I think at that moment I lost interest in programming. I thought there couldn’t have been a place for me if there’s such a skill difference between me and the best in the field.
I always suspected that there were many programmers out there who struggle with mounting complexity. It’s interesting to hear an insider comment on the widely differing talents within the programming community.
Hi John,
Your post totally struck a chord with me! I hadn’t thought about qualifying developers this way, but it completely makes sense. I used to think more along the lines of whether the next person can make sense of what you wrote without having to read lots of documentation or talk to you. In other words, can a new developer make sense of your code? It’s along the same lines, but your metric of a developer is much more measurable. And it’s much more classifiable. Plus my previous metric didn’t help differentiate developers for larger systems.
I’ve seen many developers who can write small modules, etc., but when the programs get bigger they get overwhelmed and the code becomes a mess. I have definitely noticed the difference between a 1000, 10,000, and 100,000 line program in terms of organization and structure between developers. I just hadn’t realized why until today.
This is especially true of WTF systems. Lots of copy/paste. Coupling everywhere. No one really knows where to make a change. There’s no architecture or overall framework. Things are all over the place. Basically what worked in a system of 1000 no longer works in a 10,000 system. And for a 100,000 system, well, you can’t just trudge through the mess anymore; it’s too much. You can get away with a lot more BS in a 10,000 system than you can with a 100,000.
Some frameworks do alleviate this, but only so much. I would say this can make a 1,000 developer move up to a 10,000. But not much more. The jump to a 100,000 line system requires more than just a framework; you have to organize your code within the framework.
Anyways, all that to say thanks for the awesome article. It’s a new perspective for me compared to looking at how readable and maintainable the code was by a new developer to the project. Very cool.
Seeing it late, but great post.
I shared it with some people to whom I am teaching (Python) programming. The points in the post apply irrespective of language, of course.
Also agree totally with the recommendation for the book Code Complete.
One of the best for this area.
– Vasudev