Megan Hazen (meganursula) wrote 2008-09-11 01:22 pm
Entry tags: i'm back to needing a work icon
Question for object oriented gurus:
I am currently reviewing some code implementing a current standard of an algorithm i frequently use. (I want to examine some modifications to the algorithm, but, i need a good baseline to compare to.) In it we see something like:
struct velocity
{
    int size;
    double v[D_max];
};
There are a lot of these structs - position, quantum, etc.
Thing is, in my code, i generally declare
int num_dim = n; // this is what they are using size for up above
double position[num_dim];
double velocity[num_dim];
(quantum, for the record, appears to be taking the place of what i usually declare as a constant Eps, and is used to get around numerical issues when looking for zero.)
etc. I do not have additional structs. Thing is, i find all this structifying to be sort of pointless and irritating. Pointless because i do not know what the structs are adding to the code. Irritating because i think they add a level of obfuscation, rendering the code not only longer, but also much less readable.
My question - what, if anything, am i missing in this situation? I get, generally, what object oriented-ness does for you. But i haven't used it very much in the past 6 or so years. (Matlab's excuse for object oriented isn't worth bothering with.) Right now i find myself faced with a few examples of modern code that are object oriented up the ass, and it just seems like it all has been taken too far. If i give myself three months will i become a believer? Will i stop feeling like there should be some sort of natural progression through code and adapt to having objects interacting at will?
no subject
Stroustrup, the guy who invented C++, foresaw objects being used primarily for very large chunks of code and data, not so much for little things like points.
You might make a position/velocity array, as a whole, be an object. That depends on whether there's much of anything you would normally do to the whole thing together. Big vectors are often object-ized in other languages to give you easy syntax for things like dot products, cross products, et cetera. I wound up doing a lot with object-ized vectors while writing a lot of numerical integration and differentiation code, for instance. You save passing in two arrays and a size for every function call, instead passing in only an object pointer -- yay for less typing.
Then again, if you don't do much of that then you probably don't care. So yeah, it depends on the algorithm, and on whether you'll be doing more with the same sort of objects (position/velocity arrays), and so on. But your basic intuition (that a single velocity or position shouldn't be an object) is dead on for the sort of situations it sounds like you're discussing.
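To make the "object-ized vector" point concrete, here is a minimal sketch in C++, reusing the `size`/`D_max` layout from the struct in the post (the value of `D_max` and the `dot` member are my own illustration, not from the original code):

```cpp
#include <cstddef>
#include <stdexcept>

// Assumed compile-time maximum dimension, as in the post's struct.
constexpr std::size_t D_max = 8;

struct Vec {
    std::size_t size;
    double v[D_max];

    // One member call replaces passing two arrays plus a size.
    double dot(const Vec& other) const {
        if (size != other.size)
            throw std::invalid_argument("dimension mismatch");
        double sum = 0.0;
        for (std::size_t i = 0; i < size; ++i)
            sum += v[i] * other.v[i];
        return sum;
    }
};
```

A call site then shrinks to `a.dot(b)` instead of `dot(a.v, b.v, a.size)`, which is the "less typing" payoff described above.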
no subject
Things I like to use structures for:
- encapsulation: keep related data together, which makes it easier to pass it around or change the implementation. Sometimes you end up replicating data, like in your example. That rarely matters.
- type-checking. You're declaring a velocity, and that has a bunch of functions related to it that deal with velocities. If your function demands an int and a double*, you might screw up and pass in the wrong int or the wrong double*; if it demands a velocity*, you're not going to make that mistake.
- in C++, you can (and should) declare operator[] on array-like classes, so that you could then use
velocity v; v[2] = whatever;
and it would check the bounds of your array. You also get access to template libraries like STL and Boost, which have all sorts of extremely useful data structures like arrays that resize automatically, linked lists, balanced binary trees, hash tables, fixed-size arrays that check array bounds, etc.
no subject
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
Then, yank out the AlmostEqual2sComplement function, rename it, and use that in all your comparisons.
no subject
By "wrong" I mean that, usually, just saying two values less than 1e-6 from each other are equal is nonsense. Sometimes it makes sense, if you know what the units really are. But that's rare in my experience; usually we just make up 1e-6 and move it down to 1e-8 when that causes problems, and maybe that will cause problems too, because now two numbers in the billions will test not-equal when they should be equal, etc.
no subject
I think that your answer seems to back this up, no?
I suppose part of the reason i'm curious to hear people's feedback is that i currently have three versions of what should be approximately the same algorithm - mine (in Matlab, about 20 lines worth), the web's (in C, about 21 pages worth, printed out!), and my co-worker's (in Python, intermediate in length). I find the other two completely unreadable. But when you start complaining about EVERYONE ELSE, it seems likely that the problem is really YOURSELF.
One thing i find tricky is that the code never seems to follow a single path that you can read straight through (well, duh, i guess, it isn't procedural); instead you are jumping around to far-flung objects trying to figure out what snippets do. Is there a good way around that, or is it just a price that you pay?
no subject
Thanks for the code!
no subject
(Anonymous) 2008-09-11 09:29 pm
Re: your last paragraph... To some extent, that's the price you pay. It bugs me too. Mostly, it's a different way of thinking. Some of OO's critics refer to it as The Kingdom of Nouns, which is probably a good way of thinking here -- procedural programming, as the name implies, is all about verbs, actions, *doing*. OO is all about objects, nouns, and data.
There's a good programming quote which goes something like, "show me your algorithms and I will check your tables to see how they work. Show me your tables and your algorithms will be unnecessary." OO is a sort of formalization of that idea. It's the idea that data is king, and functions are little adjuncts to it. And if you flip that, you can see how procedural programming is the idea that actions are king, and data are little adjuncts to it. In a procedural program, you often have to search in various places to get a coherent picture of the overall data. In the OO mirror-world, you often have to search in various places to get a picture of the control flow, but the data is nicely situated to give you a coherent overall picture.
Functional and Aspect-Oriented programming, probably the next two big contenders after procedural and OO, organize on completely different principles, making it *both* difficult to get an overall picture of the data *and* difficult to get an overall picture of control flow :-)
no subject
There is actually support in python (at least with the right package) for stuff like that, AND my co-worker vectorized everything, which takes fewer lines than my own loop system, and his code is still longer.
Reviewing the code, it seems like one reason that the C code is still long is that it is built to be a bit more flexible. And, frankly, having every curly brace on its own line adds pages. I don't think that explains everything.
no subject
Also, I don't think you mean higher-order function the way I usually mean it, but I can't quite figure out what you mean. Solving Ax=b is written "A \ b" (as in, "divide" b by A, but with a backslash because it's left division, not really division). Maybe matrix multiply would have been a better example, but I forget what the matlab operator is for that.
no subject
So, the C code doesn't actually do anything to ensure that you don't overwrite an array. So they're not writing extra code to make a vector type. That's not the reason.
And my point was that, yes, there is a lot of functionality built into matlab that's not built into C, but this code doesn't use any of it. The thing that it could be doing is replacing loops with matrix operations, but my matlab code doesn't do that. The python code does, and it's still longer.
no subject
Since there are no cultural associations to organize around data or around code-flow (neither basically OO nor basically procedural), most people do neither. Thus, my comment about it being hard to trace both, which is what happens when you don't work to make it easy to do one or the other. It's entirely possible that OCaml, an unholy hybrid of functional (not pure-functional) and OO, manages to have those OO "Kingdom of Nouns" cultural expectations and organize its programs around data like regular OO programs. I've avoided OCaml, mainly because I'm neither a big OO fan nor at all an ML fan, and what few things I liked about ML would be basically destroyed by an OO type system and what it would do to type inference.
Yes, pure-functional (no modifications or side-effects allowed) does solve some problems with ordering, and especially with concurrency. When I said "functional" I meant more "allowing functions as first class objects and encouraging passing such objects, and large data structures, through function calls" rather than "prevents all side-effects." While the two are usually culturally bundled, they're technically orthogonal issues -- you could have a language as limited as C and still prevent side-effects, it's just that nobody's crazy enough to do that, because avoiding side-effects is much harder than not, so you generally need the language to work harder to accommodate you while you're working out how to do things without all those intuitive imperative tricks like "print this" or "assign this value to a variable" that are now verboten.
However, if you think of "functional" as meaning "has functions as first-class objects", it's a whole different set of assumptions and problems. Ruby, Python, SML/NJ are all languages that allow first-class function objects without any prohibition on side-effects (functional, but not pure-functional). So they don't get the advantages you mention, but they still tend to organize code in wacky ways. When you're using a lot of map/reduce code (or its equivalent in other languages), you wind up wanting to create big sequences and pass them through a set of big sequence operations, which is what map and reduce are. It doesn't much matter how you organize your static chunks of data, and barely matters how you organize your data structures at that point -- the heart and soul of the program is going to be that very small amount of nearly-impenetrable code that subtly handles your map/reduce stuff, along with some lambdas which will be defined nearby....
Which is fine, and works well for many people. Hell, my Ruby code looks exactly like that. And I try to organize it as procedural rather than OO, because if I'm going to make it that hard to follow control flow then I might as well organize to make it a *little* easier to follow control flow.
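The map/reduce shape described above can be sketched in C++ as well as in Ruby or Python: build a sequence, map a lambda over it, then reduce the result. (The function name and the sum-of-squares task are illustrative choices, not anything from the thread.)

```cpp
#include <vector>
#include <algorithm>
#include <numeric>

// Map then reduce: square every element, then fold the squares
// into a single sum. The lambdas defined inline are the "nearby
// lambdas" the comment above describes.
double sum_of_squares(const std::vector<double>& xs) {
    std::vector<double> squared(xs.size());
    std::transform(xs.begin(), xs.end(), squared.begin(),
                   [](double x) { return x * x; });           // map
    return std::accumulate(squared.begin(), squared.end(), 0.0); // reduce
}
```

The point stands in any language: the iteration machinery disappears into `transform`/`accumulate`, and the program's real logic lives in the small lambdas passed to them.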
no subject
I've seen code that was just a lambda soup. It helped make me despise Ruby, although it wasn't the language's fault (plenty enough other things were). Most code I run into sees a function that takes a closure as basically a loop, which is perfectly readable.
no subject
Assuming PL == "programming language", you overgeneralize. That is one thing functional means. Much like "OO" can mean "using objects for polymorphism", and usually does, but may not. It can also mean "all types descended from a single parent type," but often does not.
Most code I run into sees a function that takes a closure as basically a loop, which is perfectly readable
Yup, that's actually how Ruby does most iteration.
And yeah, lambda soup is a pain in the ass. Ruby is interesting because it has little enough in common with most of its predecessor languages that its fans are really still figuring out how to write it, stylistically. So "Ruby style" is all over the map, and still evolving fairly rapidly for a language of its age.
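The "function that takes a closure as basically a loop" pattern reads the same outside Ruby; here is the equivalent shape in C++, where the algorithm owns the iteration and the caller supplies the loop body as a lambda (the function name and task here are illustrative):

```cpp
#include <vector>
#include <algorithm>

// std::for_each plays the role of Ruby's each: it owns the
// iteration, and the lambda is the loop body.
int count_positive(const std::vector<int>& xs) {
    int n = 0;
    std::for_each(xs.begin(), xs.end(), [&n](int x) {
        if (x > 0) ++n;   // "loop body" captured in the closure
    });
    return n;
}
```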
no subject
My main beef with Ruby was that it didn't seem to offer much over python, except for tricky design issues that perl and python faced and overcame 5-10 years earlier. What it has come up with since I last got burned is a mature Ruby on Rails, which is the shiznit when it comes to writing web sites, or so I'm told. My main beef with both python and ruby was that they do almost no static checking, which means if you misspell a function name on a line of code that is only invoked after 30 minutes of computation, you want to kill something when those 30 minutes are up and you just lost the data.
As for PL, I meant the theoretical programming languages research community. Functional means no side-effects; first-class functions means lambdas; polymorphism usually means parametric polymorphism, not the god-awful OO inheritance stuff; first- and higher-order types are in vogue; any language in common industrial usage is hardly worth dignifying with the term "language." None of that is really in dispute in these halls (among the PL types -- others are very happy about how Ruby "doesn't get in the way" [of writing buggy code]). Once you drink the kool-aid, it's pretty hard to go back. Languages in industrial usage lag the research community by 20-30 years of course, though Simon Peyton-Jones at MSR Cambridge is pushing some pretty modern features into C#.
We're by now pretty far afield of the original topic aren't we...
no subject
Most of what I like about Ruby specifically is metaprogramming, which can be thought of as either poor-man's LISP macros or structured-eval-plus-runtime-type-definition. Rails happened in Ruby because Ruby has metaprogramming, which Rails uses quite extensively.
I like Python just fine, but its indents-are-syntactic quirk makes it very hard to embed as a templating language. That's not a huge drawback overall, but again, Rails needs a very robust templating language since a lot of what it does involves generating HTML from templates. And honestly, I just like Ruby syntax better for most stuff.
no subject
One thing I should probably have thought of, but wasn't thinking of since I don't usually write this kind of code: if you're basically trawling through dense matrices and vectors, then, yeah, making lots of structures is pretty silly (except for error handling, but your competition isn't doing that, which makes me weep). It matters a lot more when you have complex data structures all linked together by pointers -- which is what I usually deal with. And once I've crossed that bridge, somewhere I probably needed a vector, and so I want your dense linear algebra function to be able to take that vector type instead of forcing me to copy it into an array.
no subject
No, i don't think so. I think you're overestimating the complexity of the actual algorithm.
I suppose all these numbers are fuzzy, though, so before i make people think i'm being too precise, maybe i should go through the exercise and actually count the code lines (as opposed to comment lines).