Month / April, 2015

Make the Framework

Previously I discussed the background for why a developer would create a framework, to share their solution with the world, and some challenges. Now, I just want to show how to setup a complete framework project in Xcode 6.

  1. Create an Xcode workspace. This is where you will place your framework project and a demo project to test the framework.
    workspace
  2. Create a new project in the workspace, select a “Cocoa Touch Framework” project type. This project is the actual project where you will develop the framework, so name it whatever you want your framework named. The name isn’t strictly necessary to be what you want, but it makes life easier. For this demo, I’ll refer to it as “Framework.project”
    framework_project
  3. Create another new project in the workspace. This project is the test project where you will exercise the framework. This I will refer to as “Exercise.project”.
    exercise_project
  4. In “Framework.project” click the project file, and add a new target. This target needs to be an “Aggregate” type, which is in the “iOS/Other” options. Name this Aggregate target something like “Framework”.
    framework_aggregate
  5. In “Framework.project”, perform whatever development you want to start the framework.
  6. In “Framework.project” click on the project file and navigate to the “Build Phases” section. Expand the “Headers” build phase, and ensure that any headers you want exposed to a user are in the “Public” section. These will be copied into the final framework folder structure and visible to users.
    framework_headers
  7. In “Framework.project” click on the project file and select the “Framework” target, navigate to the “Build Phases” section. Click the “+” at the top left of the phases view and add a new “Run Script” phase. Copy the following in to the script section. This script will force the “Framework” target to build the “Framework.project”’s Cocoa Touch Framework target twice, once for the simulator and once for the device. Then it will combine the slices into one fat binary. Please note, that the workspace name needs to be changed to whatever your workspace is named. Also, the basis for the script can be found on StackOverflow.
    framework_aggregate_runscript
#!/bin/sh

UNIVERSAL_OUTPUTFOLDER=${BUILD_DIR}/${CONFIGURATION}-universal

# make sure the output directory exists
mkdir -p "${UNIVERSAL_OUTPUTFOLDER}"

# Step 1. Build Device and Simulator versions
xcodebuild -workspace “../FrameworkWorkspace.xcworkspace" -scheme "${PROJECT_NAME}" ONLY_ACTIVE_ARCH=NO -configuration ${CONFIGURATION} -sdk iphoneos  BUILD_DIR="${BUILD_DIR}" BUILD_ROOT="${BUILD_ROOT}" clean build
xcodebuild -workspace “../FrameworkWorkspace.xcworkspace" -scheme "${PROJECT_NAME}" -configuration ${CONFIGURATION} -sdk iphonesimulator ONLY_ACTIVE_ARCH=NO BUILD_DIR="${BUILD_DIR}" BUILD_ROOT="${BUILD_ROOT}" clean build

# Step 2. Copy the framework structure (from iphoneos build) to the universal folder
cp -R "${BUILD_DIR}/${CONFIGURATION}-iphoneos/${PROJECT_NAME}.framework" "${UNIVERSAL_OUTPUTFOLDER}/"

# Step 3. Create universal binary file using lipo and place the combined executable in the copied framework directory
lipo -create -output "${UNIVERSAL_OUTPUTFOLDER}/${PROJECT_NAME}.framework/${PROJECT_NAME}" "${BUILD_DIR}/${CONFIGURATION}-iphonesimulator/${PROJECT_NAME}.framework/${PROJECT_NAME}" "${BUILD_DIR}/${CONFIGURATION}-iphoneos/${PROJECT_NAME}.framework/${PROJECT_NAME}"

# Step 4. Convenience step to copy the framework to the project's directory
cp -R "${UNIVERSAL_OUTPUTFOLDER}/${PROJECT_NAME}.framework" "${PROJECT_DIR}"

# Step 5. Convenience step to open the project's directory in Finder
open "${PROJECT_DIR}"
  1. Select the “Exercise.project”’s app scheme from the list of schemes. Then open the Edit Scheme window. Select the “Build” option and add the “Framework.project”’s Cocoa Touch Framework target. Move it above the “Exercise” app in the list.
    exercise_scheme
  2. In the “Exercise.project” click on the project file and navigate to the “General” section. At the bottom is the “Linked Frameworks and Libraries” list. Click the “+” and add the “Framework.project”’s Cocoa Touch Framework from the list.
    exercise_addframework
  3. In the “Exercise.project” add a reference to your new framework using the <…> format. So it would look something like #import <Framework/Framework.h>. From then on you can use the framework as any other.
    exercise_importframework

A Framework for the New World

As many of you know, I am a full time iOS developer. It is a truly engaging and fun profession. However, it is also at times extremely frustrating. That frustration comes from either silly platform limitations, but more often is the feeling that I’m solving a problem that surely has already been solved. Generally this is true and I can find a pre-existing solution. Yet, there are times where I can’t and I want to provide that solution for others. Read on to find out how that can be done.

Frameworks

So how do developers share their solutions? There are pretty much two options — give the code away and let the other developer include that in their own project. This is common and in a way how open source works. The other option is to provide the functionality as a precompiled executable that the user can link against. This precompiled executable is generally called a library. Some are linked to at compile time and others during execution of the final program.

For iOS we could do both, source or library. However, iOS is limited to only compile time linked (also called statically linked) libraries. Except, not really. The native functionality was more of a dynamically linked style, just Apple did not allow third party dynamically linked libraries. Overall, this isn’t a big deal.

The downside to a pure library though is that it is a stand alone executable but you still also need the interface information for what the library exposes. Therefore, a developer must also have the, in this case, header files to make use of the library, which means more files to manually handle.

The solution to this that Apple uses is called a framework. A framework is pretty much just a library and all the necessary headers wrapped up in a specific folder format. The specific format is not important here, just the concept that a framework makes it exceptionally easy for a developer to use another developers product.

The Challenge

With a framework, it is easy to integrate another developer’s work, which makes everyone else’s job easier. But, that isn’t all there is to it. Thankfully though, the challenge is all on the provider’s side. Those challenges consist with managing dependencies and getting all the different necessary versions of the framework’s library put together.

Dependencies

Remember how I started by saying I didn’t want to have to solve problems that others had already solved? Turns out that is a pretty common feeling. And there are some problems that pretty much everyone needs to solve, like connecting to and communicating over the internet. As such, often one framework will have a dependency on another. This leads to the first primary challenge — how does a developer indicate a dependency and how is that handled?

Generally indicating a dependency is done by a README.md file. These files are pretty much just instructions on how to make use of a framework, information on design decisions, and maybe some information on the author(s). At that point, it is assumed that anyone using the framework will be aware enough to read the file. These instructions can be simply “Copy the framework into your project” and that’s it. Or it may be more complicated such that the user must not only include the current framework but other frameworks/libraries. In that case there will probably be instructions also about where those other pieces must be located in the folder structure so it can be found by the original framework.

In the end though, this is not terrible. It is kind of part of the job. All it requires is decent documentation, which a developer should be generating any how.

Versions/Slices

The other challenge is including a version of the framework that works everywhere. Technically this isn’t necessary, but a user is going to want this. If you did not provide a version that worked on both the simulator and a real device, no one would use the framework. This is because it would make development so much harder. It is always slower to debug on device, so it is great to do most development on the simulator. But if you only provide a version that runs on the simulator then your framework will not work on a device. Each of these types is called a slice, so I’ll use that going forward.

So you gotta essentially provide two versions of the framework. Okay, not the end of the world. You just need to compile the project twice and then use a specific tool to combine the two executables. Thankfully Apple provides just the tool for the job lipo. Yeah, really. And guess what it is called if you take two different slices and combine them? The result is called a fat library. Really. If you were to take one slice out? That’s a thin library.

You can’t make this stuff up.

How to Just Do It

Originally this post was going to include how to set up a stand alone framework that would work on both the simulator and device. But that was before I wrote all that introduction. Instead, I’m going to break that out into its own post for easier reference. So, tune in next time for a quick post on how to create the framework.

Just a Quick Thought

Recently I made a big change in my work process – I started keeping track of my hours in a very conscious and diligent manner. I used to just use my commit history as my time tracker, and sometimes that was pretty good. On busy days I would be committing code every little while, and any long gap could be explained by either a complicated item or meetings. Where meetings could be something business related or a game of darts, both of which are critical to my success.

Now though, I actually write down the start and end time of when I perform any task, rounded to the nearest quarter hour. This maps to how our company tracks time. I started using Stickies, and that worked pretty well because I just needed a light text editor where I could keep essentially three pieces of information per line — start time, end time, and the task description. That worked for a couple months. Then a couple weekends ago I went on a trip to see friends and family in the KC area and forgot my laptop. I wasn’t going to be back to Chicago before Monday night and so was going to actually not be able to enter my time for the previous week. After notifying my manager and our resource manager, I started thinking of ways to solve this.

What I came up with was to do the exact same thing but in a different program, one that had some kind of cloud back up. I could have used one of my blogging tools, as they all seem to have a way to back up to the cloud, but that would have been a bit heavy handed. Then I remembered a long time ago I downloaded a program called Notational Velocity that I had heard about on a Verge post. It is a super minimalist interface but has a slick integration with the Simplenotes website. This fit right in with my current push to do more things via the command line and checked the need for a cloud solution. Now, I keep each week’s time as a separate note and I have access to that anywhere I have an internet connection.

Concept Description: Applicative Functor Typeclass

Concept Description: Applicative Functor Typeclass

Previously, I mentioned that I would cover Applicative Functors in a future blog post, well that time has come. In that blog post, I ended up discussing more of how an Applicative Functor worked than I intended, but it was necessary. So please read that first, then come back here for some more information.

Definition

Before going any further, let’s look at what the actual definition of an Applicative Functor is from the Haskell source:

A functor with application, providing operations to
– embed pure expressions (pure), and
– sequence computations and combine their results (<*>).

So, there are two things an applicative functor must do — be able to wrap an expression up in some context, and then combine some stuff that is in contexts together to make other stuff. Well, we discussed that a little with some examples last time, so I’ll just briefly recap that information.

Starting with the sequencing statement, that means that you can have a function wrapped up in a Maybe, like Just (+) and then combine that with two numerical Maybe values and get a result that is inside a Maybe.

Prelude Control.Applicative> Just (+) <*> Just 1 <*> Just 2
Just 3

We also saw before that you can consider a List as a kind of context, and that you can have functions wrapped up in a list and apply those to either lists of values or just singular values.

Prelude Control.Applicative> [(+), (-)] <*> [1] <*> [4]
[5,-3]
-- [(+), (-)] <*> [1] => [(1+), (1-)]
-- [(1+), (1-)] <*> [4] => [(1+4), (1-4)] => [5, -3]
Prelude Control.Applicative> [(+), (-)] <*> [1,2,3] <*> [4]
[5,6,7,-3,-2,-1]
-- [(+), (-)] <*> [1,2,3] => [(1+), (2+), (3+), (1-), (2-), (3-)]
-- [(1+), (2+), (3+), (1-), (2-), (3-)] <*> [4] => [(1+4), (2+4), (3+4), (1-4), (2-4), (3-4)] => [5, 6, 7, -3, -2, -1]

All of this is pretty cool. However, notice how everything had to start out in the same type of context? As in, the functions and values had to both be in Maybe or a List? Well, that is kind of a problem because your value may not come that way and you may not know exactly what the context in question is. Therefore, the Applicative Functor typeclass requires the pure function also. The pure function is used to wrap some value up into a specific context. The best way to see that is with some examples:

Prelude Control.Applicative> pure 4 :: Maybe Int
Just 4
Prelude Control.Applicative> pure (+) <*> Just 5 <*> Just 6
Just 11
Prelude Control.Applicative> pure (+) <*> pure 5 <*> Just 6
Just 11
Prelude Control.Applicative> pure (+) <*> [1,2,3] <*> pure 4
[5,6,7]

Minimal Complete Definition

Just as for Functors, there is a minimal complete definition for Applicative Functors. This minimal definition is exactly as we have above — pure and (<*>).

pure

This is a pretty easy function to understand after looking at it:

pure :: a -> f a

It literally takes some value and wraps it up in the Applicative Functor’s context. If we look at a few examples, it probably will make the most sense.

instance Applicative [] where
    pure x    = [x]

As you can see, the implementation for List just takes the value and puts it into a single element List. Pretty easy. Can you guess how Maybe’s pure is implemented?

instance Applicative Maybe where
    pure x = Just x

That’s easy as pie! Again, pure takes some value and puts it into the bare minimum context that the type can represent while maintaining the original content of the value.

(<*>)

Now then, the sequential operator (<*>) is a bit more complicated as evidenced by its type signature.

(<*>) :: f (a -> b) -> f a -> f b

What is happening here? Well, you might notice this type signature looks very similar to the map ((a -> b) -> a -> b) and fmap ((a -> b) -> f a -> f b) type signatures. That is intentional and it essentially works like those functions also — it takes a function and applies it to some value to get a result. However, in this case, kind of like with fmap, it takes into consideration the context that the value is wrapped in, and now also takes into consideration the context that the function is wrapped in. Therefore, the implementation for (<*>) will have to unwrap not only the value but also the function. How about, we try to write out the Maybe implementation to see if we understand what is happening.

The first thing is we know the function and the value both come in a context of Maybe which means either could be a Nothing. That tells us that we probably need to check both for being a Nothing which means we will end up with two patterns to match on — the function is Nothing or the value is Nothing, so let’s start there.

instance Applicative Maybe where
    Nothing <*> _ = Nothing
    _ <*> Nothing = Nothing

Okay, so that makes sense — if you have a Nothing in either position, the result should be Nothing. Okay, cool. How about when there is something in both positions?

instance Applicative Maybe where
    Nothing <*> _ = Nothing
    _ <*> Nothing = Nothing
    Just f <*> Just val = pure (f val)

Well, if there is something in there, we just need to apply the function to the value and wrap that all up in the correct context. And with that, we are done … or are we? Is there a way to simplify this and write it using stuff that is already defined? Let’s look at the declaration of the Applicative Functor typeclass:

class Functor f => Applicative f where

Hmmm, it looks like the typeclass is actually just Applicative but that it has a type constraint of Functor on it. Okay, so that explains why it is called Applicative Functor — if the type implements Applicative it must also be a Functor. Which means that the type also implements all the behaviors of a Functor, which includes fmap. And because of that we know that we can use fmap to handle applying a normal function to a value in a context … which is exactly what two out of the three patterns we had for (<*>) are doing. So let’s see how we can rewrite that.

instance Applicative Maybe where
    Nothing <*> _ = Nothing
    Just f <*> val = fmap f val

Here we take advantage of knowing that fmap will give us back a value in the correct context to begin with, and that it will apply a normal function to a value in a context. For Maybe, we know that fmapping a function over a Just will get another Just result, while doing the same to a Nothing will produce a Nothing. Exactly what we want! Awesome!

Another great implementation to look at is that of List. I won’t go over it in detail here, but it isn’t too bad to understand. It uses list comprehensions, which I believe we have discussed. If not, please refer to this chapter of Learn You a Haskell to get an idea of how those work.

Applicative Laws

You didn’t think you would get out of here without having to obey something did you? If so, time to learn a hard truth about life — there are only three things for certain in life: death, taxes, and that there are more rules. Thankfully, these rules are pretty easy.

Identity

The first rule/law that an Applicative implementation must follow, is that of identity. That means if you apply id to a value v, you must get v as the result. This would be written like so:

pure id <*> v = v

This is pretty easy to understand. You would not want to write an applicative implementation that messed with wrapped up value. If you were to do so, there would be all kinds of subtle bugs introduced and the code assuming a true applicative would not work correctly.

Composition

This law states that the order of operation of two applicatives should not matter. The official documentation writes the law as follows.

pure (.) <*> u <*> v <*> w = u <*> (v <*> w)

However, I think that is a bit hard to read and reason through early on. As such, I think concrete examples are a little easier to wrap my head around. Below is a simple example expressing this law.

Prelude Control.Applicative> pure (.) <*> pure (5+) <*> pure (3*) <*> Just 6
Just 23
Prelude Control.Applicative> pure (5+) <*> (pure (3*) <*> Just 6)
Just 23

What the first line of the code above does is the following steps:

pure (.) <*> pure (5+) <*> pure (3*) <*> Just 6 =>
pure ((5+) .) <*> pure (3*) <*> Just 6 =>
pure ((5+) . (3*)) <*> Just 6 =>
Just ((5+) . (3 * 6)) = Just (5 + 18) = Just 23

While the third line converts to:

pure (5+) <*> (pure (3*) <*> Just 6) =>
pure (5+) <*> (Just (3*6)) =>
pure (5+) <*> Just 18 =>
Just (5 + 18) = Just 23

Pretty neat how those do end up equal, huh?

Homomorphism

The third law means that if you wrap a function and a value using pure, the result is the same as applying the function to the value and wrapping that result in pure.

pure f <*> pure x = pure (f x)

Straightforward, yeah?

Interchange

The last law is one I don’t really understand. The point of the law is that either (<*>) can be written with the function on the left and a normal wrapped value on the right, or the ($) operator can be inserted before the value and placed on the left with the function on the right. As I have never run into the situation where this is useful, I can’t really come up with a meaningful example, but below is a simple example that shows it holds true for Maybe.

Prelude Control.Applicative> Just (3+) <*> pure 6
Just 9
Prelude Control.Applicative> pure ($ 6) <*> Just (3+)
Just 9

Wrap Up

And with that, we’ve “covered” Applicative Functors. As a recap, we discussed how Applicative the typeclass requires that the type already be a Functor and so gets all the benefits of Functor. Additionally, Applicative types represent the ability to wrap a function up in some context and apply values within the same kind of context to that function. To explain this we walked through the implementation for Maybe and looked at the implementation for List. While these are not the only implementations that Haskell comes with out of the box, they are the most commonly seen by beginners and are instructive when attempting to write your own implementation for custom data types.

If you have any questions about the topics covered here, please let me know in the comments and I will gladly do what I can to answer those!

Overlap Graphs

Welcome back to your friendly guided stroll through the Project Rosalind problems. Previously we’ve solved any number of problems that revolved around primarily string manipulation. As in, we’ve taken a string and parsed it to mean a specific DNA strand, we’ve counted elements in the string, and we’ve even transformed one string into another. All cool stuff. But this time, we are going to be discussing a whole new kind of beast — graphs! I’m no computer science major, so I’m likely to get some of the terminology wrong, but I’d be happy to be corrected. Otherwise, though, I feel pretty solid about the stuff I’m presenting and so look forward to you joining me after the break.

Problem Statement

This problem revolved around finding DNA strings that share some properties. The specific properties we are interested in are if one string has a prefix, or start, as another string has suffix, or end. The Rosalind problem statement explains these concepts in the context of graphs. Graphs in the sense of computer science are generally referring to what others would think of as a map. A graph will have a set of nodes, like points on a map, that are connected by what are called edges, like the roads on a map.

From that simple concept of a graph, it is possible to develop quite a wide range of properties and problem sets. One famous problem that is graph related is called the Traveling Salesman. I won’t go in to it here, but it is worth investigating. Any how, a graph can be generalized into one of two types, directed or undirected. An undirected graph is one where the edges between nodes have no distinct flow. This can be thought of as a normal road, it has traffic moving in both directions. A directed graph is one where the edges do have a distinct flow. This can be thought of as a one way road. Technically it is possible for a directed graph to have edges that are bidirectional, but that is a detail we will not concern ourselves with today.

The reason this problem is called “Overlap Graphs” is because we are going to use the prefix and suffix for the DNA strings as the nodes on a graph. Then we will consider the edges to be from any suffix that matches a prefix or prefixes. Because we will only be going from suffix to prefix(es) this will be a directed graph — the relationship is one direction.

Solution Steps

Just as completing a marathon starts with the taking first step, so to will our solution to this problem. Unlike previous posts, I am not going to just jump into developing the solution and showing the code. Instead I wanted to present this problem in a different manner. This time we will first start with enumerating the steps necessary to solving the problem, then filling those in with code.

  1. We know with this problem we will be given a list of DNA strings in FASTA format. So the first thing we need to do is convert that input to a useful data structure, which we have done before, so that’s good.
  2. With those DNA strings we need to extract all of the prefixes and suffixes of a given length and store them in an easy to access manner. In this case, we should probably keep the prefixes together and keep the suffixes together.
  3. Having the nodes of the graph (the prefixes and suffixes), we can now build the edges. So we need to take each suffix and find which DNA strings have a matching prefix.
  4. Finally, with each suffix matched with zero or more prefixes, we need to build our output representation. Just a heads up, this is going to use another “higher level” concept in Haskell that I will explain in a later blog post, so go with it for now.

Convert the Input

This will not be the first time we have taken some input and converted it to FASTA format. However, recently I changed the code in my Bioinformatics/FASTA.hs file to be a little easier to use. The change I made was I went from using a type declaration for the FASTAFormatLine, which is essentially just an alias for some other data type to make type signatures a little more semantic, to instead declaring a new data type in record syntax.

What is record syntax? Well that is a syntax for declaring a data type that makes it easy to access the members of the data type via automagically provided functions. The general syntax looks something like:

data CustomType = CustomType { propertyName :: PropertyType, otherPropertyName :: OtherPropertyType }

Seems like more letters than necessary right? Well the benefit is now we can access the properties of CustomType with the functions propertyName and otherPropertyName that we get for free! That looks like the following:

Prelude> data Person = Person {name :: String, age :: Int}
Prelude> let p1 = Person {name="Someone", age=32}
Prelude> name p1
"Someone"
Prelude> age p1
32

Knowing that, it should be easy enough to read and understand the new data type used for FASTAFormatLine.

Get the Prefixes and Suffixes

Now we start the real work.

At least, this first step isn’t too bad. It consists of using the Data.Map module again to build a map that associates a prefix or suffix to the name of the DNA string we were originally given. To do that, I made use of the fromListWith function. fromListWith has two parameters, the first is a function that determines how to combine values if a given key already exists in the map, the second is the list of keys and values to convert to the map.

In this instance we want to build a map that goes from keys of [DNANucleotide] to values of [String]. The key type [DNANucleotide] will be the prefix or suffix of a specific length, and the value type [String] is so we can hold a list of DNA string names that have that given prefix or suffix. In this particular problem, we are only interested in prefixes and suffixes of length 3.

One peculiarity to take from this is that I had to convert from our nice little FASTAFormatLine data type to a 2-tuple with the [DNANucleotide] as the first element and the name of the DNA string as the second element. Also, there is a custom function called lastN’ that I use to get the suffixes, I’ll leave it up to the reader to work out how it works. I found it via StackOverflow and it took me a while to get it, but it is awesome.

Oh also, I introduced a type alias called DNAMap so I wouldn’t have to type M.Map [D.DNANucleotide] [String] everywhere.

type DNAMap = M.Map [D.DNANucleotide] [String]

lastN' n xs = L.foldl' (const . drop 1) xs (drop n xs)

buildMaps :: [F.FASTAFormatLine D.DNANucleotide] -> (DNAMap, DNAMap)
buildMaps fls =
  let pres = M.fromListWith (++) $ map (\fl -> (take 3 $ F.content fl, [F.name fl])) fls
      suffs = M.fromListWith (++) $ map (\fl -> (lastN' 3 $ F.content fl, [F.name fl])) fls
  in (pres, suffs)

Matchmaker Find me a Match

Now that we have all of our prefixes and suffixes, it is time to find which suffixes have matching prefixes. That step isn’t too bad really. All we do is go through each of the key-value pairs in the prefixes map, and look up that same key in the suffixes map. From there we create a list of 2-tuples. The first element of these tuples is the value from the prefix key-value pair, which is the name of all the DNA strings with that prefix. The second element is the value from the suffix look up, which is the name of all the DNA strings with that suffix.

Next, because we are doing a Data.Map look up, we know that there may be no values returned for a given lookup, so we need to filter out any Nothing results that we have generated.

Then we need to filter out any DNA string that is in both the prefix and suffix values of a given tuple. The reason for this is we are told to not include any DNA string that has a prefix that matches its suffix. This is to prevent a “directed loop”.

Lastly, we need to filter out any tuples where there is an empty value in either the first or second element.

To do all this I built two functions. The first builds the edges, finding the DNA strings who’s suffix matches a given prefix. The second does the filtering and data cleansing. Please forgive the relatively poor naming conventions …

buildMapJoins :: (DNAMap, DNAMap) -> [([String], Maybe [String])]
buildMapJoins (preM, sufM) =
  M.foldlWithKey (\acc k v -> (v, M.lookup k sufM):acc) [] preM

buildFilteredMapJoins :: Eq a => [([a], Maybe [a])] -> [([a], [a])]
buildFilteredMapJoins mjs =
  filter (\(x, _) -> x /= []) . map (\(xs, Just ys) -> (filter (\x -> not $ L.elem x ys) xs, ys)) $ filter (\(_, ss) -> MB.isJust ss) mjs

Generate the Output

Now that we finally have a list of 2-tuples that represent the DNA strings who’s prefix matches DNA strings with a suffix of the same value, we can make our output. The problem asks for a specific format of the output — Name1 Name2. That isn’t too bad, but it is a bit of a challenge.

Let’s look at the format of the data we have: [([String], [String])]. That is, we have a list of tuples with the first element being a list of strings, and the second is another list of strings. In the end we essentially want to have a list of strings where each element in the first list of strings is matched with each element in the second list. Maybe I should give an example:

-- what we have
[([“a11”, “a12”], [“b11”, “b12”]), ([“a21”, “a22”], [“b21”, “b22”])]
-- what we want to end with
[“a11 b11”, “a11 b12”, “a12 b11”, “a12 b12”, “a21 b21”, “a21 b22”, “a22 b21”, “a22 b22”]

So what we want is essentially all the permutations of combining the first element of a tuple with the second element of the same tuple. We could write some complicated (or maybe it wouldn’t be that complicated) function to do that for us, or we could try and find a more succinct and clever way. Turns out that Haskell gives us that more succinct and clever solution. What is that footloose and fancy free solution? Applicative functors of course!

Remember when I said I was going to introduce a concept that I’ll have to cover in a separate blog post? Well this is it. But at a high level, you can consider an applicative functor as a typeclass that says a function can be wrapped in some context, like a List or Maybe, which can then have a value that is also in a context applied to that function. Ugh, not really a great way to say that, and it is a bit reductionist, but still, about the best I can do right now.

Before we can do an example, we need to introduce a new operator that comes with the Applicative Functor typeclass — (<*>). This operator takes the value and applies it to the function, both of which are in the same kind of context. It simplifies the code we write because it does the wrapping and unwrapping of the context for us instead of having to write our own functions to do it, so we end up just expressing what we want to happen as opposed to how to do it. Also, notice how it looks like our Functor operator (<$>)? Not a coincidence as we will see.

So, an example. Let’s say we two Maybes that may each contain a String and we want to join them together if they both do have a value or get a Nothing if either is without a value. We could write a function to pattern match on the two Maybes and do the work, which would look something like:

combineMaybes :: Maybe String -> Maybe String -> Maybe String
combineMaybes Nothing _ = Nothing
combineMaybes _ Nothing = Nothing
combineMaybes (Just s1) (Just s2) = Just (s1 ++ s2)

However, that is a bit verbose don’t you think? Also, it is not very generic. We can only use it on Maybe values and those Maybe values would need to contain values that can be concatenated using the (++) operator. There must be a more generic way. Thus, Applicative Functor.

We can consider the Maybe to be a context that is wrapping up the String values. The context says that either there is a String or there is nothing. Applicative Functor will let us take a function that takes multiple plain values and successively apply values that are wrapped in a context to that function until we get a final result.

So to convert our original example we need to identify what the function is that takes values and which values are in a context. The function is the concatenation operation, (++), which only takes plain values and not values in context. The context values are the Maybes. Remember how with normal Functors we could take a value in some context and apply it to a function that only took one parameter? Well, that is how we will start and then we will use the new (<*>) to take another value in context and apply it to the newly curried function (partially applied), and keep doing that until we have the result we are looking for. That result will be in a context just like the result from the (<$>) was.

combineMaybes :: Maybe String -> Maybe String -> Maybe String
combineMaybes m1 m2 = (++) <$> m1 <*> m2

Dang, that is easier! And we know that Haskell’s native implementation of Maybe is taking care of the Just and Nothing conditions for us with the (<$>) and (<*>) operators, so that is overhead we don’t have to consider.

But, how does that help us when we have a List of Strings we want to find the permutations of? Well, it turns out that using (<$>) and (<*>) on a function and a List actually produces a new List made up of functions. What? Let’s see that in action.

Prelude Control.Applicative> let addLists list1 = (+) <$> list1
Prelude Control.Applicative> :t addLists
addLists :: (Num a, Functor f) => f a -> f (a -> a)
Prelude Control.Applicative> let addLists1 = addLists [1,2,3]
Prelude Control.Applicative> :t addLists1
addLists1 :: Num a => [a -> a]
Prelude Control.Applicative> length addLists1
3

Here you can see that the addLists function takes a List of numbers and builds a new List of the same length made up of functions that take a number and return a new number. Turns out, that if we could print the value of addLists1 it would look like:

[(1+), (2+), (3+)]

So it is a list of partially applied functions. What happens if we apply one value to this new List and how do we do that? Well we do that using the (<*>) operator!

Prelude Control.Applicative> addLists1 <*> [5]
[6,7,8]

Whoa, that ended up applying 5 to every element of the addLists1 list and getting the result of the addition in each case. Awesome! But what if we tried to apply a List of numbers instead of just the one?

Prelude Control.Applicative> addLists1 <*> [5,10,15]
[6,11,16,7,12,17,8,13,18]

Dang! Turns out that (<*>) will apply each value in the second list to each function in the first. That is really powerful. Also, notice that it is applying each of the values in the second list to the first value in the first list before moving on to the second value in the first list. This isn’t terribly important to us right now, but it is interesting.

So how does this help us? Well remember that we have essentially a List of 2-tuples where each element of the tuple is a List of strings. And we want to combine those Lists together to make a new List of strings. So we can use the Applicative Functor’s behavior that we just discussed to do that.

buildCombinations :: [([String], [String])] -> [String]
buildCombinations xs = concat $ map (\(ps, ss) -> (++) <$> ( (++) <$> ss <*> [" "] ) <*> ps) xs

So here, we use the Functor (<$>) to build a List of functions the length of all the suffixes for the current prefix where each function is the suffix and the concatenation operator. Then we use Applicative Functor’s (<*>) to apply the string ” “ to each one of those functions so that we get each suffix now with a space at the end. Next we do the same thing to concatenate each of the suffixes with its matching prefix in such a way that we get all combinations. Finally we use concat so that we go from a List of List of Strings to just a List of Strings.

The End

Holy cow, that was a bit of a mouth full wasn’t it? Well, we made it though, below is the final code sample. Hopefully it will all make some sense now. But remember, if Applicative Functors don’t make sense yet, I will do what I can to cover that topic a little more completely in a future blog post.

Concept Description: Functor Typeclass

In a recent blog post, I introduced the Functor type class. At that time I gave a high level explanation of what the type class defines and said I would cover it in greater detail later. Now is that time. I hope that after reading this, it will make sense and you will feel comfortable thinking of a type as being a Functor.

Basics: Typeclass

To kick this off, we should start with discussing what a typeclass is. At the most basic, a typeclass is a way to describe the behavior of specific data types. What I mean by that, is that any data type that is part of a given typeclass will have behave in specific ways and implement a given set of functions.

There are a large number of “basic” typeclasses defined in Haskell. Things like Eq which encapsulates the behavior of a data type being equatable, Ord which encapsulates the behavior of a data type being put in order (<, >, <=, etc), and there are others. Interestingly, we have been using a few of these in the Project Rosalind solutions haven’t we? Thankfully Haskell gave us the deriving keyword that helps give us free implementations of some simple typeclasses.

However, there are plenty of “basic” typeclasses we cannot get for free using deriving. One of these is the Functor typeclass. So, how do we get a data type to conform/implement Functor or any other typeclass that doesn’t come for free? In order to achieve that, we need to look at a given typeclass definition.

The definition of a typeclass will include a list of functions (and possibly rules) that need to be implemented for a data type in order for that data type to be considered conforming. This list may be relatively long and complicated, but not all of the individual functions are actually “required”. In fact, there will almost certainly be a slightly shorter list is defined as the minimal complete definition. The functions in that sub-list are actually then able to be used within the definition of all the other functions for the typeclass.

How to Implement

Now that you know of a typeclass that you want to make a data type conform to, you need to know how to do that. We have discussed the minimum complete definition list, so we know we need to implement those functions for a data type. But, where and how do we do that?

First, we use a new (to us at least) keyword, instance. instance tells Haskell that following lines will be used to define how a given data type conforms to a specific typeclass. Then there is the name of the typeclass followed by the name of the data type, and finally that line is ended with a where. After that, you are allowed to define the implementations for each of functions in the minimum complete definition list. As an example, consider the situation where you have a data type like our DNANucleotide:

data DNANucleotide = A | C | G | T

We know that going forward (and looking at our previous solutions) we will want to be able to determine if two instances of DNANucleotide are equal. So let’s work on implementing Eq for DNANucleotide. Let’s look at the definition of Eq then so we can know how to make DNANucleotide conform.

class Eq a where
	(==) :: a -> a -> Bool
	(/=) :: a -> a -> Bool
	x == y = not (x /= y)
	x /= y = not (x == y)

Notice how there are two functions that we need to implement? They are (==) and (/=), equal and not equal. But also see how the last two lines appear to be function definitions for (==) and (/=) in the infix form? Well, by having those definitions we actually end up only needing to implement one of the two functions, which means that the minimum complete definition for Eq is either (==) or (/=). Nice! So let’s do that. But which should we do? Let’s do (==) because that will be the simplest to comprehend.

instance Eq DNANucleotid where
	A == A = True
	C == C = True
	G == G = True
	T == T = True
	_ == _  = False

I’ll leave it up to the reader to reason about the last line in that definition. 🙂

Typeclass: Functor

Dang, that was a lot of stuff just on the “basic” concept of typeclasses, yet it was all necessary to get to the concept of Functor. Well, it was necessary in the context of needing to know that Functor is a typeclass and numerous data types are instance of Functor. Now though, you should at least understand what it means to say that a data type is an instance of some typeclass — it conforms to the typeclass.

So what is the typeclass Functor? First, let’s look at its definition and then talk about it some more.

class  Functor f  where
    fmap :: (a -> b) -> f a -> f b

In addition, let’s look at the documentation in the source regarding the Functor typeclass:

The 'Functor' class is used for types that can be mapped over.
Instances of 'Functor' should satisfy the following laws:

> fmap id  ==  id
> fmap (f . g)  ==  fmap f . fmap g

The instances of 'Functor' for lists, 'Data.Maybe.Maybe' and 'System.IO.IO'
satisfy these laws.

Starting at the top, we see that Functor only defines and requires the implementation of one function, fmap. Notice anything about the signature for fmap? Does it look familiar? It should, it is nearly identical to the signature for the standard map function. Pretty much the only difference is that there is this f in the fmap signature that isn’t in the standard map. What is that f? Good question!

The f that we see there is actually a type variable. What? Yeah, kind of abstract sounding, and it is a bit abstract. That f actually is a variable that will be replaced by a concrete type once that type is made an instance of Functor. As an example, Maybe is an instance of Functor and would have an fmap signature like:

fmap :: (a -> b) -> Maybe a -> Maybe b

See how the Maybe took the place of f? Not too terrible. It is the same overall concept that a function will have variable parameters that are replaced at execution time with real values. The same goes for definitions (essentially) for typeclass type variables.

Next we look at the documentation and see that Functor is for data types that can be “mapped” over. What does that mean? Does that mean we can draw graphical spatial representations of the data type? Of course not! It means that the data type can be transformed from one value to another in some consistent manner. Or more intuitively, a Functor instance makes some sense to have a map like function applied to it. And that is exactly what fmap is for — transforming one value to another within the context of a given data type.

Awesomely, we have already been doing this every time we use map on a list! A list is an instance of Functor where in fact fmap is implemented using map.

instance Functor [] where
    fmap = map

Within the context of a list it is easy enough to think of how it is a Functor. It makes sense that you could have a list of values and a function that takes in a parameter of the type of the list’s values and then returns another. Then with that function we could apply it to the list and get a whole new list that consists only of the transformed values. Maybe even has this property. It is essentially a container of some time, a Just will have some value and we could create a function that transforms that data. Then because it is a Functor we will get a Just with that new value.

Prelude> fmap (+ 1) (Just 2)
Just 3

How though is that accomplished? Well there is an instance of Functor for Maybe and it is implemented as so:

instance  Functor Maybe  where
    fmap _ Nothing       = Nothing
    fmap f (Just a)      = Just (f a)

That wasn’t too bad.

Lastly, consider that there apparently are “laws” a Functor has to satisfy. Those laws are pretty straight forward but non-trivial. Though you can come up with an implementation of any data type as a Functor, your implementation doesn’t actually mean that the data type satisfies these laws, and so therefore would not actually be a Functor. These laws are relatively easy to understand now that we understand fmap.

The Laws of the Functors

The first law is fmap id == id. What does this mean? It means that any time a Functor is fmapped with the function id, it must be equal to the original value. This may be easier to see from an example:

Prelude> fmap id [1,2,3,4]
[1,2,3,4]
Prelude> id [1,2,3,4]
[1,2,3,4]

See how that did not modify the original list? Not bad.

The second is a little more complicated. This one says that if you fmap the composed functions f and g over a Functor, that result must be the same as if you fmapped g and then fed that result to fmap with the function f. Again, an example may make more sense.

Prelude> fmap ((+2) . (*3)) [1,2,3]
[5,8,11]
Prelude> fmap (+2) $ fmap (*3) [1,2,3]
[5,8,11]

Is DNANucleotide a Functor

These, list and Maybe, have been pretty easy to grasp examples. However, there definitely could be much more complicated data types that could be a Functor. For now though, we won’t worry about those complicated examples. Instead we will consider if DNANucleotide can be a Functor.

Thoughts? I’ll give you a few moments to come up with your own answer.

Got an answer?

Let’s start with considering the meaning of Functor — can DNANucleotide be fmapped over? As in, does the DNANucleotide represent a value that has some containing context that we can transform the internal value? No, it does not. DNANucleotide essentially represents one of a few options for the nucleotides in a DNA string. But none of those nucleotides contain an internal value, so there is real concept of fmapping over a given nucleotide.

Kind of a let down huh? Yeah, I know. But but but! We did cover a lot of ground and discussed a wonderful topic. Now you know what it means when a data type is said to be a Functor and you can recognize when a data type you create should be a Functor. The reason it is important to make your data types instance of a given typeclass when possible is that there are standard library functions that depend on some standard typeclass and if your type does not conform, you will not get to use those functions for free.

Just remember, when we discussed Functors in the previous blog post we were referencing Maybe and lists, so now you know how and why they are Functors and why what we did worked! Or at least I hope you do. Please let me know in the comments or on Twitter if you have any questions. You can find me at @amonshiz.