decorate-sort-undecorate in Haskell

Wednesday, 24th June, 2009

Real World Haskell
Chapter 3: Defining types, streamlining functions
Section: End of chapter exercises

Exercise 6: Create a function that sorts a list of lists based on the length of each sublist. (You may want to look at the sortBy function from the Data.List module.)

  1. Answering the question
  2. Decorate-sort-undecorate
  3. Type signatures
  4. Further work

Answering the question

The immediate exercise threw up enough problems to make it worthwhile. The functions sort and sortBy are in Data.List. sortBy takes a comparison function (and sort is just sortBy compare). So, here was my direct answer to the exercise:

import Data.List

longer :: [a] -> [a] -> Ordering 
longer a b = compare (length a) (length b)

sortByLength :: [[a]] -> [[a]]
sortByLength a = sortBy longer a

re type signatures: I knew that longer would have to return the same type as compare:

*Main> :type compare
compare :: (Ord a) => a -> a -> Ordering
Decorate-sort-undecorate

That isn’t how I would write this kind of sort in Python (my main language). For a sort with any kind of custom comparison function, I would use the decorate-sort-undecorate (dsu) algorithm. Here’s a verbose implementation:

>>> def dsu(func, xs):
...     dec_xs = [(func(x), x) for x in xs]
...     dec_xs.sort()
...     undec_xs = [x[1] for x in dec_xs]
...     return undec_xs
...
>>> a = [[9], [1, 1, 1, 1, 1], [2, 2, 2]]
>>> dsu(len, a)
[[9], [2, 2, 2], [1, 1, 1, 1, 1]]
>>> dsu(sum, a)
[[1, 1, 1, 1, 1], [2, 2, 2], [9]]

Passing the comparison function to sort() directly will result in it being called every time two elements from the list are compared. With dsu, the comparison function is called once for each item on the list, then sort() can just use less than.

Here’s my first naive and verbose dsu sort in Haskell:

dsu :: (Ord a, Ord b) => (b -> a) -> [b] -> [b]
dsu decFunc a =  undecorate (sort (decorate decFunc a))

decorate :: (t -> t1) -> [t] -> [(t1, t)]
decorate decFunc [] = []
decorate decFunc (x:xs) = ( ((decFunc x), x) : decorate decFunc xs )

undecorate :: [(a,b)] -> [b]
undecorate [] = []
undecorate ( (_, y) : xs) = ( y : undecorate xs )

Trying it out with length and sum:

*Main> dsu length a
[[9],[2,2,2],[1,1,1,1,1]]
*Main> dsu sum a
[[1,1,1,1,1],[2,2,2],[9]]

A comment by gerg on the RWH website gave the following implementation of sort by length:

sortByLength :: (Ord a) => [[a]] -> [[a]]
sortByLength xss = map snd (sort (zip (map length xss) xss))

This is much terser and more apparently Haskellian than my sortByLength, and it already almost a general dsu. With one small change, and a type signature, we have:

dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]
dsu decFunc a = map snd (sort (zip (map decFunc a) a))
Type signatures

I must confess that I didn’t work out all those type signatures all by myself. I used Haskell’s type inference to work it out for me. Write the function without a type signature, then ask ghci what type it is:

Prelude> import Data.List
Prelude Data.List> let dsu decFunc a = map snd (sort (zip (map decFunc a) a))
Prelude Data.List> :type dsu
dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]

Remember there’s no difference between a, b and t. They are not types of types, just more or less random letters used by different parts of ghci’s type inference mechanism.

Note that functions seem to be of type (a -> b).

Further work

I asked about decorate-sort-undecorate on Haskell-Beginners and received very helpful, thorough and friendly reponses. Some of their implementations of dsu used Haskell syntax I haven’t come across yet. I’d like to look into this new syntax further and write it up, but that can be another story for another day.

About these ads

22 Responses to “decorate-sort-undecorate in Haskell”


  1. You might not know that Python now has a built-in function called sorted(), which effectively replaces the DSU idiom:

    sorted_list = sorted(unsorted_list, key=lambda v: …)

  2. llaisdy Says:

    Hello Graham, thanks for your comment.

    AFAIK the thing about sorted() is not that is replaces dsu, but that it returns a new sorted list (leaving the old one as is). sort() sorts the list in place.

    The thing that replaces (or, really, internalises) dsu is the key argument, which is available for sort() too.

    To my shame, key has been around since Python 2.4, so thanks for bringing it to my attention!

    So, a dsu-style sort by length in Python (since, ahem, Python 2.4), would be sort(key=len):

    >>> a = [[2, 2, 2], [9], [1, 1, 1, 1, 1]]
    >>> a.sort(key=len)
    >>> a
    [[9], [2, 2, 2], [1, 1, 1, 1, 1]]

    See the Python wiki’s Sorting Mini-HOWTO, section Sorting by keys

  3. Jake McArthur Says:

    dsu = fmap (map snd . sort) . (zip <=< map)

  4. llaisdy Says:

    Jake: thanks for your comment.

    “<=<" is another new bit of syntax. I'm going to look at ., $ and &&& (and now <=<) in my next post.

  5. Yair Says:

    sortByLength = sortBy (comparing length)

    Data.Ord comparing :: Ord a => (b -> a) -> b -> b -> Ordering
    Data.List sortBy :: (a -> a -> Ordering) -> [a] -> [a]

  6. Don Stewart Says:

    By the power of monads!

    — | Right-to-left Kleisli composition of monads. ‘(>=>)’, with the arguments flipped
    (<= (b -> m c) -> (a -> m b) -> (a -> m c)
    (<==>)

    — | Left-to-right Kleisli composition of monads.
    (>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
    f >=> g = \x -> f x >>= g

  7. llaisdy Says:

    Dear All

    Thanks for your comments (and nice to have a co-author of RWH visit!).

    I should have written about comparing. I’ll make sure to do so in my next post. I might have to draw the line before I get to monads, just so I can get back to the book, and come to monads in their proper place.

    Ivan

    • Applicative Says:

      Maybe it’s reasonable, but I wonder if it’s a good idea to speak of signs like “(.)”, “($)”, “(>>=)”, “(>=>)” and so on as ‘syntax’. They are just names of functions like “sin” or “(+)”.
      I guess the point is obvious, and often enough made — but maybe not often enough? For example, if you don’t like any of these seeming syntax-bits you can replace them with a definion:
      f `composedWith` g = f . g
      or
      f `after` g = f . g
      You can’t get tired of the ‘do’ in do notation and declare for ‘pleaseDo’, or replace the ‘=’of definitions with something else. There’s no thing or function to rename, they really are syntax. You can get rid of Haskell’s dots with a definition, but you can’t get rid of one of Matz’s dots, because his dots don’t mean anything.

      Maybe it’s irritating to repeat it, but the thing itself, composition, is defined so:
      f . g = \x -> f (g x)
      or in some equivalent way, as one also defines
      f $ x = f x

      It is the same with Kleisli composition, in whichever direction, as dons noted,it’s a defined function: f >=> g = \x -> f x >>= g. If you preferred you could write, say: f `fishrightward` g = \x -> f x >>= g. Of course whatever you call the thing, you’ll define it in terms of ‘bind’ (>>=). This is something that you yourself have to define explicitly when you declare a functor List or Maybe or MyAmazingTypeClass to be a monad. But wherever you use it, it’s just another NAME for a definite function, no different from “sin” or “(+)”. Thus in the so-called Maybe monad the function in question is named in the instance declaration:
      instance Monad Maybe where
      Nothing >>= Just f = Nothing
      Just x >>= Nothing = Nothing
      Just x >>= Just f = Just f x
      or something like that. Those lines isolate a mapping between any pair of Maybe values and another Maybe value, and then put a name to it. This is not syntax.
      It’s like “show” or “(==)” which have to be defined when you declare a type to be in the Show or Eq type class. “show” certainly doesn’t seem like syntax!

      When you get rid of a few sugary things like “do” notation and the fun and games with infix functions, and the inevitable things like the form of import statements and data declarations and definitions (with “=”) and grouping with parentheses, then the only real syntactical sign in Haskell is the white space between terms, which signifies the application of the function named by the first term to the function or object named by the second. So when we leave aside the decorative bits of sugar the founders gave us (which are frequently brilliant!) we see that in a way Haskell doesn’t have any syntax, or that it’s invisible — it’s white space and the “Western” decision to treat the name on the left as naming the function to be applied to the thing name to its right … plus grouping by parentheses, which is force on us by the linear form of type. Basically, when you type something or other, it’s not because of some witty rule Matz cooked up, dot this dot that curly bracket this square bracket that. What you type is the name of the thing you’re talking about. This is part of the reason why Haskell is so beautiful.

      People sometimes say that as an exercise, when you come to the monad business, you should avoid the ‘do’ sugar till you figure out what’s going on with (>>=), (>=>) and company. I wonder if it wouldn’t also be a good exercize to eliminate all non alphabetical signs, replacing “.” with “composedWith”, “>>=” with “boundTo” and so on. …It would be a little exhausting! Maybe I should try it.

  8. David Says:

    dsu f = sortBy (compare `on` f)

    No monads necessary :)

  9. Jake McArthur Says:

    The thing about the sortBy (comparing length) version is that it still recomputes the length for each comparison.

    Although…

    {-# RULES
    “sortBy/comparing” forall f xs . sortBy (compare f) xs = map snd . sort . zip (map f xs) $ xs
    #-}

  10. David Says:

    Shoot, Yair’s use of comparing is nicer than my use of on:

    dsu f = sortBy (comparing f)

  11. llaisdy Says:

    @ Applicative I agree with everything you say. For “syntax” I really meant “thingy”. I’ll respond to your comment in more detail when I write on these thingies.

    @ The rest of you It looks like I’ll be writing an intermediate piece on sortBy (comparing ... and sortBy (compare `on` ... .


  12. >> Applicative I agree with everything you say. For “syntax” I really meant “thingy”.

    Beautifully said. :-)

    >> AFAIK the thing about sorted() is not that is replaces dsu, but that it returns a new sorted list (leaving the old one as is). sort() sorts the list in place.

    That’s true, but Haskell’s ‘sort’ and ‘sortBy’ are closer to Python’s sorted(), and Haskell has no analogue of Python sort() since Haskell lists are immutable.

    I’m glad to have turned you on to ‘key’! I also used DSU in my Python code for years before getting into the habit of using a key-function. Idioms are hard habits to break. :-)

  13. llaisdy Says:

    I’ve clearly got too complacent with my Python. With luck this Haskell study I’m doing (and the help I’m getting) will spruce up my approach to programming in general.

  14. Doug McIlroy Says:

    The problem specification does not impose a requirement that the list elements be comparable, yet dsu does. Hence dsu can’t compare everything that sortByLength can. And when it can, dsu needlessly compares lists whenever they have the same length. Zip, which maps elements to (,) pairs, should be replaced by zipWith f, where f produces pairs that are compared only on first elements.

  15. llaisdy Says:

    Dear Doug

    Thanks for your comment. I’ll look into zipWith.

    The rest of your comment I’m afriad I don’t understand. Surely list elements have to be comparable for sortByLength to compare their lengths?

    > Hence dsu can’t compare everything that sortByLength can.

    Could you give an example?

    • lumi Says:

      If you look at the type for ‘dsu length’ and ‘sortByLength':

      dsu length :: (Ord a) => [[a]] -> [[a]]
      sortByLength :: [[a]] -> [[a]]

      sortByLength is completely polymorphic on ‘a’, so you can compare any type of list of lists.
      The dsu version further requires that the elements be comparable (Ord a). For example, you couldn’t use it to sort lists of functions by length.

  16. llaisdy Says:

    Doug, thanks for pointing this out; lumi, thanks for explaining.

    How interesting! So,

    dsuSortByLength a = map snd (sort (zip (map length a) a))

    has to have the Ord context:

    dsuSortByLength :: (Ord a) => [[a]] -> [[a]]

    But the earlier version:

    sortByLength a = sortBy (comparing length) a

    has no such requirement:

    sortByLength :: [[a]] -> [[a]]

    Very interesting. I’ll have to look into why that is.

  17. llaisdy Says:

    Doug and lumi:

    I’ve corrected dsuSortByLength:

    dsuSortByLength :: [[a]] -> [[a]]
    dsuSortByLength xs = map snd (sortBy (comparing fst) (zip (map length xs) xs))

    Discussion here.

    Thanks again

    Ivan

  18. Sebastian Says:

    I want to sort this list
    [(Posicion,[Int])]
    where type Posicion = (Int,Int)
    by the lenght of [Int]
    How can I do this??
    Thanks : )

  19. llaisdy Says:

    Sebastian, Thanks for your comment.

    Have you tried the techniques above (or in some of my later posts on this subject)? How did it go?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: