App of the Month 7: Journal 1?

I did a much worse job last month with journaling my progress on the app of the month. Turns out that I’m just out of the habit of writing, which I need to fix. But this isn’t that blog. This is the blog where I talk about what I did and lead into what I’m going to work on.

What Did You Do?

This past month I kind of picked a random project to kickstart my series. At the start of the month I had really no clue what to do. I wanted something simple that I could work on part time while I got acclimated to my job here in NYC and use as a springboard for next month. Yet, most of the way through the first week I had nothing and I was floundering for an idea.

Then it happened.

My wife and I had traveled to Washington DC to view a taping of the amazing podcast The Flop House, and also to visit friends who live there. To be clear, we spent more time with the friends than at the podcast taping, and these friends have a pair of medium-sized stuffed bison, so the trip was most definitely about that. But we did go to this taping. And it was hilarious. And it gave me an idea for my app: there is a wikia for the podcast but no apps, so I was going to create an app to let users search for reviewed movies from their iPhone. It was the perfect idea, so I went with it. I mean, what could go wrong?

Leading Last Sentence Leads

Turns out a bit can go wrong. First, life can happen. We moved into our new apartment at the beginning of August and had a number of fun complications around that. So that took time. Second, my real job was time consuming in a way it hadn’t been in a while. I’m not upset or annoyed by that; it was just bad timing. Lastly, apparently wikia doesn’t have a JSON API for getting to their data. Joy.

The biggest issue was that last one because it meant I had to scrape the HTML for each page I wanted to be available in the app. For those who don’t know, scraping is where a program receives the same file a browser would present and pulls pieces of content out of it to transform into another format. One scraping program some people may already be using: any ad blocker that claims to block ads in the Facebook news feed. Scraping is a common solution to many web-based problems because not every website provides an API. Anyhow, I ended up having to do that, which meant deciding whether I wanted to write my own HTML parser, my own scraper, and my own data transformer.

HTML Parsing

For those iOS developers out there, you will know that H/XTML parsing is a massive pain and requires a lot of work to get right with NSXMLParser and its cursed delegate. Ugh. Been there, done that. I instead opted to use a Swift library named Fuzi. This library is pretty convenient and has the API I was looking for: give it some data and get back an easily queryable object with a tree-like structure that can be handled recursively. Perfect.

HTML Scraping

The second step of scraping the HTML from the parsed website wasn’t terribly hard. But it was frustrating in ways because it required manually figuring out what the layout of the data was and how to properly identify the interesting content in each section. It wasn’t difficult, just tedious. I spent quite a bit of time staring at the Chrome Developer Tools inspecting the attributes of various elements.

Data Transformer

The last step of taking the scraped HTML elements and extracting exactly what I needed was the most interesting because the data from the raw wikia pages is pretty rough. My ideal would be to transform the HTML into a custom tree data type that could be easily traversed using simple functional patterns. Turns out that the wikia pages wanted nothing to do with that. When you look at the entry for a given movie, like Fantastic Four, it appears that there are clear sections and a clear data hierarchy. However, that is all a lie, like the cake. Instead of something sane, the HTML is flat with plain header elements to separate content:

<h1>Episode</h1>
	<table>
	...
	</table>
<h2>Tag</h2>
	<ul>
		<li>...</li>
	</ul>
...

Really annoying. I spent way too much time working on taking this flat format and successfully converting it into a proper data type. I wanted to transform something like

let data = [h1, tr, tr, h2, li, li, h1, p, ...]

into

data = [(h1 [tr, tr, (h2 [li, li])]), (h1 [p]), ...]

What I mean by that is I wanted to have a data type like the following to represent each section of data (and its content):

enum DataTree {
    case Empty
    // Children are themselves DataTree values, which is what makes the type recursive.
    indirect case Node(content: String, children: [DataTree])
}
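For illustration, here is a minimal sketch of how a flat list could be folded into that kind of tree. It is not the solution from that afternoon: the tag names are plain strings standing in for the parsed elements, and `headingLevel` and `buildTrees` are hypothetical helpers made up for this example. The idea is that a heading owns everything after it until a heading of the same or a higher rank shows up.

```swift
// Same shape as the enum described in the post; Equatable is added only so
// the results are easy to compare.
enum DataTree: Equatable {
    case Empty
    indirect case Node(content: String, children: [DataTree])
}

/// Returns 1...6 for "h1"..."h6" and nil for any other tag (hypothetical helper).
func headingLevel(_ tag: String) -> Int? {
    guard tag.count == 2, tag.hasPrefix("h"),
          let level = Int(tag.dropFirst()),
          (1...6).contains(level) else { return nil }
    return level
}

/// Consumes tags starting at `index` and returns everything that belongs
/// under a heading of rank `parentLevel`.
func buildTrees(_ tags: [String], from index: inout Int, under parentLevel: Int) -> [DataTree] {
    var result: [DataTree] = []
    while index < tags.count {
        let tag = tags[index]
        if let level = headingLevel(tag) {
            // A heading of the same or a higher rank closes the current section.
            if level <= parentLevel { break }
            index += 1
            let children = buildTrees(tags, from: &index, under: level)
            result.append(.Node(content: tag, children: children))
        } else {
            // Anything that is not a heading becomes a leaf.
            index += 1
            result.append(.Node(content: tag, children: []))
        }
    }
    return result
}

/// Convenience entry point: rank 0 sits above every real heading.
func buildTrees(_ tags: [String]) -> [DataTree] {
    var index = 0
    return buildTrees(tags, from: &index, under: 0)
}
```

Feeding it `["h1", "tr", "tr", "h2", "li", "li", "h1", "p"]` gives two top-level `h1` nodes, the first holding the two `tr` leaves plus an `h2` node that holds the two `li` leaves.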

This would be doable, but frankly I half-heartedly spent an afternoon working on it and wasn’t happy with my solution, so I scrapped the enum and went with just a single nested array:

data = [[h1, tr, tr], [h2, li, li], [h1, p], ...]

This ended up being good enough because I knew I wanted to use a collection view for the episode information view and anything more complicated than a single section with multiple rows is not worth it in a collection view.
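A minimal sketch of that flatter grouping, assuming tag names as plain strings in place of the real parsed elements (`isHeading` and `groupByHeading` are hypothetical names, not code from the app): start a new inner array whenever a heading shows up and append everything else to the current one.

```swift
/// True for "h1"..."h6" (hypothetical helper).
func isHeading(_ tag: String) -> Bool {
    return ["h1", "h2", "h3", "h4", "h5", "h6"].contains(tag)
}

/// Splits a flat list of tags into groups, opening a new group at every heading.
func groupByHeading(_ tags: [String]) -> [[String]] {
    var groups: [[String]] = []
    for tag in tags {
        if isHeading(tag) || groups.isEmpty {
            // A heading (or the very first element) starts a new group.
            groups.append([tag])
        } else {
            // Everything else belongs to the most recent group.
            groups[groups.count - 1].append(tag)
        }
    }
    return groups
}
```

The heading rank is ignored on purpose here, which is exactly what makes this simpler than the tree: an `h2` starts its own group instead of nesting under the preceding `h1`.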

Most Fun Part

The part of this project I enjoyed the most was getting the scrolling fluid in a text-heavy app. In fact, this whole app is text, so if the scrolling in the table view and collection view were not fluid it was going to be terrible. I ended up pretty much ripping off the idea behind Ryan Nystrom’s Hacker News Reader. The biggest difference between his implementation and mine is that he uses a bunch of fancy caches where I just store value objects in the view controller presenting the information. Any time the user pops that view controller, all that information is lost and has to be reconstructed later. Oh well. Also, he uses the new TextKit components to do sizing and layout, whereas I just use CTFramesetterSuggestFrameSizeWithConstraints to get the size of the rendered text. In the one rare case where that wasn’t sufficient, I used the CTLineGetImageBounds function because the line was just emojis anyway. This proved good enough for me: the rendering is fast, and everything is a value type with little memory overhead because I let the app deallocate all of the CTFramesetter and CTLineRef objects once they are done and only store the sizes. Nothing fancy, but it was fun to work through.
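Roughly, the sizing step could look like the sketch below. This is an illustration rather than the app’s actual code: `suggestedSize(for:constrainedToWidth:)` is a hypothetical helper name, and it assumes an Apple platform where Core Text is available. Only the resulting size is returned, so the framesetter is released as soon as the function exits.

```swift
import Foundation
import CoreGraphics
import CoreText

/// Measures how much space `text` needs when wrapped to `width`.
/// Hypothetical helper: keeps only the CGSize, not the Core Text objects.
func suggestedSize(for text: NSAttributedString, constrainedToWidth width: CGFloat) -> CGSize {
    let framesetter = CTFramesetterCreateWithAttributedString(text as CFAttributedString)

    // Fixed width, effectively unlimited height.
    let constraints = CGSize(width: width, height: .greatestFiniteMagnitude)

    let size = CTFramesetterSuggestFrameSizeWithConstraints(
        framesetter,
        CFRangeMake(0, 0), // a zero-length range means "the whole string"
        nil,               // no extra frame attributes
        constraints,
        nil                // the fitted range is not needed here
    )

    // Round up so the text is never clipped by a fractional point.
    return CGSize(width: size.width.rounded(.up), height: size.height.rounded(.up))
}
```

The returned size is a plain value, so it can be stored alongside the text in whatever value object backs a cell and reused for layout without touching Core Text again.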

Where is it Now?

Well, the app was submitted late last night and is now in that sweet sleep known as Waiting For Review. If it gets approved I will definitely post and let everyone know where to find it!

What Will You Do Next?

I have an idea and I will post about that tomorrow. Promise … maybe.
