If you want to skip the theory and get your hands straight on the code, take a look at my sample parser script on GitHub.
Make sure the kindlegen binary is available in your path, so you can call it from anywhere. KindleGen creates books in the binary MOBI format (actually, Amazon's AZW format is just MOBI with DRM).
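You may want your script to fail early, with a clear message, when KindleGen isn't installed. A minimal PATH check could look like the sketch below. The executable_in_path? helper is hypothetical: it is not part of KindleGen or of Ruby's standard library. It also only looks for the exact file name, so on Windows you would have to try names like kindlegen.exe as well.

```ruby
# Hypothetical helper (not part of KindleGen): returns true if an executable
# file named `cmd` exists in one of the directories listed in PATH.
# On Windows you would also need to try extensions such as ".exe".
def executable_in_path?(cmd)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, cmd)
    File.file?(candidate) && File.executable?(candidate)
  end
end

warn "kindlegen was not found in your PATH" unless executable_in_path?("kindlegen")
```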
Although we can generate an ebook from a plain HTML file, if we want to include navigation, a cover, a table of contents, etc., we need to create a bunch of HTML files (one per chapter) and an OPF file. An OPF file is just an XML file that contains the book's metadata (author, title, publisher, etc.) and its content structure.
In KindleGen’s zip file you can find an example of an OPF file, ready to be processed with this tool.
We could just download an HTML file, strip all the tags except a few (paragraphs, basic text formatting, etc.) and use the result as our e-book content. In practice, though, you will only want some parts of the website. That's why you need to analyze (parse) the content and grab only the parts you are interested in.
In this example, we'll be parsing FanFiction.net, a site that hosts fan-created stories based on existing books, videogames, etc. On this website, stories are divided into chapters, and each chapter is served from a different URL. For example, chapter #2 of the story with ID #6718049 can be accessed at http://www.fanfiction.net/s/6718049/2/.
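Because chapter URLs follow a fixed pattern, building them is plain string interpolation. This is a minimal sketch. The chapter_url name is hypothetical, and the pattern is the one described in this post, which the site may since have changed.

```ruby
# Hypothetical helper: builds a chapter URL following the
# /s/<story id>/<chapter number>/ pattern described above.
def chapter_url(story_id, chapter)
  "http://www.fanfiction.net/s/#{story_id}/#{chapter}/"
end

puts chapter_url(6718049, 2) # prints http://www.fanfiction.net/s/6718049/2/
```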
If we look at the source code of that page, we can see that the main content is inside a DIV tag with the ID storytext.

Downloading the chapter and grabbing the story content is really easy:
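require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri (plain Kernel#open no longer fetches URLs in Ruby 3)
# the encoding is Nokogiri::HTML's third argument; the second one is the document URL
doc = Nokogiri::HTML(URI.open("http://www.fanfiction.net/s/6718049/2/"), nil, "UTF-8")
content = doc.xpath('//div[@id="storytext"]').first.inner_html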
The important method here is xpath, which selects nodes in the document using an XPath expression. This is really handy, and usually the easiest way to reach the tags we want. If you have never used XPath, there's a tutorial at W3Schools.
This time, we don't need to worry about stripping unwanted HTML tags, since FanFiction.net's CMS only allows basic formatting tags. If this weren't the case, we could use the inner_text method of Nokogiri's Node class, which returns just the text with all the markup removed.
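The inner_text approach removes every tag, paragraphs included. If you would rather keep a handful of formatting tags, as suggested at the beginning of this post, one option is a tag whitelist. Here is a naive, regex-based sketch. ALLOWED_TAGS and strip_unwanted_tags are hypothetical names. A regular expression is no substitute for a real HTML sanitizer: it doesn't handle comments, and it keeps the text inside script tags. Treat it as a starting point only.

```ruby
# Tags we want to keep; everything else is dropped (this list is an assumption).
ALLOWED_TAGS = %w[p br em strong i b u hr].freeze

# Naive tag whitelist: removes every opening/closing tag whose name is not in
# ALLOWED_TAGS and keeps the text between tags. Only suitable for simple markup.
def strip_unwanted_tags(html)
  html.gsub(%r{</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>}) do |tag|
    ALLOWED_TAGS.include?(Regexp.last_match(1).downcase) ? tag : ""
  end
end

puts strip_unwanted_tags('<div class="x"><p>Hi <span>there</span></p></div>')
# prints <p>Hi there</p>
```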
We are also interested in the chapter’s title, which is available in a dropdown list at the top of the page.
Chapters in the dropdown have this format: first the chapter number, then a dot and the chapter title. For instance: 1. This is a chapter title. The trick here is that the option element holding this text has a value attribute equal to the chapter number (which we already know).

So we can search for that tag, take its inner text, and remove the "1. " prefix with a regular expression:
title = doc.xpath("//option[@value='#{index}']").first.inner_text.gsub(/^\d+\. /, "")
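The prefix-stripping step is easy to check in isolation, without downloading anything. This sketch wraps it in a hypothetical strip_chapter_number helper. It is a slight variation on the line above: sub with \A instead of gsub with ^. The two behave the same on single-line option text, and \d+ covers multi-digit chapter numbers.

```ruby
# Removes the leading "<number>. " prefix from a dropdown option's text.
# \A anchors at the start of the string; \d+ matches multi-digit numbers.
def strip_chapter_number(option_text)
  option_text.sub(/\A\d+\.\s+/, "")
end

puts strip_chapter_number("1. This is a chapter title") # prints This is a chapter title
puts strip_chapter_number("12. The End")                # prints The End
```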
Once we have downloaded and parsed all the chapters we want to include in the e-book, we need to create an HTML file for each of them. After that, we also need to create an OPF file with links to those chapters. The easiest way to do this is to use a template system such as ERB, which comes with Ruby's standard library.
The template for the chapters can be as simple as this one:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<h1>
<small>Chapter <%= index %></small><br />
<%= title %>
</h1>
<%= content %>
</body>
</html>
The code between <%= and %> is Ruby, and its value is printed into the output. Now we only have to create a file, supply the template with the variables that contain the chapter's data, and render the template into the file we've just created.
require 'erb'
# extend Hash class to transform key-value pairs into a binding to feed ERB templates
class Hash
  def to_binding
    res = Object.new
    # define a method whose parameters are named after the hash keys;
    # calling it returns a binding where each key is a local variable
    res.instance_eval("def binding_for(#{keys.join(',')}) binding end")
    res.binding_for(*values)
  end
end
# creates an HTML file using the template and data provided
def create_html_file(filename, template, data)
  File.open(filename, "w+:utf-8") do |file|
    file << ERB.new(File.read(template, encoding: "UTF-8")).result(data.to_binding)
  end
end
To finally create the HTML file for a chapter, we only need to put the template data in a Hash and then call our create_html_file method.
chapter_data = {:title => title, :index => 1, :content => content}
create_html_file("chapter1.html", "chapter.html.erb", chapter_data)
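On Ruby 2.5 or later, ERB#result_with_hash does the same job as the Hash#to_binding extension, without monkey-patching Hash. Here is a minimal sketch, assuming a modern Ruby and an inline template string instead of chapter.html.erb. With it, create_html_file would only need to call result_with_hash(data).

```ruby
require "erb"

# Inline version of a chapter template, so the example is self-contained.
template = <<~HTML
  <h1><small>Chapter <%= index %></small><br /><%= title %></h1>
  <%= content %>
HTML

# ERB#result_with_hash exposes each hash key as a local variable inside
# the template, which is what Hash#to_binding emulates.
html = ERB.new(template).result_with_hash(title: "A title", index: 1, content: "<p>Text</p>")
puts html
```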
The only thing left is to create a loop that iterates over all chapters, parses them and creates their corresponding HTML files.
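That loop could look roughly like this sketch. To keep it runnable offline, the download-and-parse step is passed in as a hypothetical `fetch` callable returning [title, content]. In the real script it would wrap the Nokogiri code shown earlier. CHAPTER_TEMPLATE and build_chapters are illustrative names, and the sketch relies on ERB#result_with_hash (Ruby 2.5+).

```ruby
require "erb"
require "tmpdir"

CHAPTER_TEMPLATE = <<~HTML
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
  <h1><small>Chapter <%= index %></small><br /><%= title %></h1>
  <%= content %>
  </body>
  </html>
HTML

# Writes one HTML file per chapter into out_dir and returns the file names
# in reading order. `fetch` is a hypothetical callable that receives a
# chapter number and returns [title, content].
def build_chapters(chapter_count, out_dir, fetch)
  erb = ERB.new(CHAPTER_TEMPLATE)
  (1..chapter_count).map do |index|
    title, content = fetch.call(index)
    filename = "chapter#{index}.html"
    html = erb.result_with_hash(index: index, title: title, content: content)
    File.write(File.join(out_dir, filename), html)
    filename
  end
end

# Usage with a stub instead of real downloads:
out_dir = Dir.mktmpdir
fake_fetch = ->(i) { ["Title #{i}", "<p>Body of chapter #{i}</p>"] }
files = build_chapters(3, out_dir, fake_fetch)
puts files.inspect # prints ["chapter1.html", "chapter2.html", "chapter3.html"]
```

The returned list of file names is exactly what the OPF template needs as its chapters variable.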
With all the chapters' HTML files created, it's time to generate the OPF file. Inside Amazon's KindleGen distribution you can find a fairly complete OPF example file, but here's an ERB template with the bare minimum:
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <!-- Metadata: -->
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title><%= title %></dc:title>
    <dc:creator><%= author %></dc:creator>
  </metadata>
  <!-- manifest (book content) -->
  <manifest>
    <!-- list of resources -->
    <% chapters.each_with_index do |chapter, index| %>
    <item id="item<%= index %>" media-type="text/x-oeb1-document" href="<%= chapter %>"></item>
    <% end %>
  </manifest>
  <!-- our content, ordered (the spine is a sibling of the manifest, not a child) -->
  <spine>
    <!-- add <itemref idref="item-toc"/> here once a TOC item is declared in the manifest -->
    <% chapters.each_index do |index| %>
    <itemref idref="item<%= index %>"/>
    <% end %>
  </spine>
  <!-- Guide key points -->
  <guide>
    <reference type="text" title="Beginning" href="<%= chapters.first %>"></reference>
  </guide>
</package>
The most important section is the manifest. Inside it, we need to declare all the resources (in our case, the HTML files), and then reference those resources from the spine, a separate element of the package. Since the spine alone decides the reading order, it doesn't matter in which order we declare our resources in the manifest.
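To see the difference between declaration order and reading order, here is a hedged sketch that renders only a manifest and a spine with ERB. It assumes Ruby 2.6+ for the trim_mode keyword. The resources are deliberately declared alphabetically, which puts chapter10.html before chapter2.html, while the spine keeps the intended reading order. The FRAGMENT name and the item ids are illustrative.

```ruby
require "erb"

# Minimal manifest + spine fragment. trim_mode "-" lets "<%-" and "-%>"
# swallow the indentation and newline around Ruby-only lines.
FRAGMENT = ERB.new(<<~XML, trim_mode: "-")
  <manifest>
  <%- items.each do |id, href| -%>
    <item id="<%= id %>" media-type="text/x-oeb1-document" href="<%= href %>"/>
  <%- end -%>
  </manifest>
  <spine>
  <%- reading_order.each do |id| -%>
    <itemref idref="<%= id %>"/>
  <%- end -%>
  </spine>
XML

chapters = (1..10).map { |n| "chapter#{n}.html" } # reading order
ids = chapters.each_index.map { |i| "item#{i}" }
# Declare resources alphabetically, so chapter10.html lands before chapter2.html...
items = ids.zip(chapters).sort_by { |_id, href| href }
# ...while the spine still lists them in reading order.
xml = FRAGMENT.result_with_hash(items: items, reading_order: ids)
puts xml
```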
If we want a table of contents, we only need to create a regular HTML file with links to the chapters (we can do this with an ERB template too), include this file in the manifest as if it were another resource, and reference it from the spine.
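A table of contents template can follow the same pattern as the chapter template. In this sketch, the TOC constant and the [href, title] pairs are assumptions. ERB::Util.h escapes characters such as & in chapter titles. You would then write toc_html to a file, declare it in the manifest with an id such as item-toc, and list it first in the spine.

```ruby
require "erb"

# Hypothetical TOC template: one link per chapter. ERB::Util.h escapes
# characters such as & and < that may appear in chapter titles.
TOC = ERB.new(<<~HTML, trim_mode: "-")
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  </head>
  <body>
  <h1>Table of Contents</h1>
  <%- chapters.each do |href, title| -%>
  <p><a href="<%= href %>"><%= ERB::Util.h(title) %></a></p>
  <%- end -%>
  </body>
  </html>
HTML

chapters = [["chapter1.html", "Beginnings"], ["chapter2.html", "Cats & Dogs"]]
toc_html = TOC.result_with_hash(chapters: chapters)
puts toc_html
```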
KindleGen is fairly easy to use: we only need to pass it the name of the OPF file and we'll get a MOBI file. There are other flags, but for now we'll only use -unicode, since the website we parsed serves Unicode (UTF-8) text and KindleGen's default encoding is Latin-1.
Assuming KindleGen is installed somewhere in our path, we can call it from our Ruby script using backticks:
puts `kindlegen book.opf -unicode`
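Backticks go through the shell and silently ignore a failed run. A slightly sturdier sketch uses Open3 from the standard library instead, passing the arguments as an array and checking the exit status. The run_kindlegen name and its bin keyword are hypothetical. The sketch reports the exit status without assuming what each non-zero code means, so check KindleGen's own output when it fails.

```ruby
require "open3"

# Runs KindleGen without a shell: arguments go in as an array, so file
# names with spaces need no quoting. Returns the combined stdout/stderr
# and whether the process exited with status 0. The `bin` keyword is a
# hypothetical addition that lets you point at a specific binary.
def run_kindlegen(opf_path, bin: "kindlegen")
  output, status = Open3.capture2e(bin, opf_path, "-unicode")
  [output, status.success?]
end

begin
  output, ok = run_kindlegen("book.opf")
  puts output
  warn "KindleGen exited with a non-zero status; check its output" unless ok
rescue Errno::ENOENT
  warn "kindlegen is not in your PATH"
end
```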
And that's all, folks! If you are curious or get stuck, take a look at my GitHub repository, which has a full working example including table of contents generation.
This was a very basic example of how to generate an e-book from a website. In the real world, chances are you will stumble upon a login screen, or you may need to navigate a site to automate the process even further. In that case, take a look at Mechanize, a Ruby gem that automates interaction with a website as if it were a browser.
This quote is from an interview with Wonder Rusell at Runic Games:
We’ve always believed that Torchlight is a casual RPG because any player can get started and enjoy the game, from beginners to veterans. We wanted it to be hugely accessible.
How did they try to achieve that?
After choosing your character and a very short intro screen, you start the game in a town and get your first quest almost immediately. You are killing monsters within minutes! RPGs usually have a slow beginning, but Torchlight skips complex character stories and throws you into the dungeons right away to kill some foes.
On the character screen, you get to choose the game's difficulty level. Normal is recommended for players who are new to action RPGs, while Hard is advised for veteran players (who have played Diablo or Sacred before). Telling newbies that their current skills are 'normal' avoids hurting their egos, which is a friendly gesture.
But what really makes a difference is that the game is full of tutorial tips (which can be disabled). These tips tell you how to buy and sell stuff, how the stash works, etc. And they are not thrown at you all at once: they appear when you need help (for instance, when you find your first magic item, you are told how to identify it using a scroll).
In addition to that, the town is really small, so all non-player characters are close and you don’t have to waste time wandering around: you can trade and get new quests really quickly.
If you play Torchlight for the first time for 30 to 60 minutes, you get a really solid game experience: what you have seen is what you will get. And this experience is very rich.
In this first game session, you will experience all the game mechanics.
The fact that you get to do all that stuff in a very short time (and this holds for the rest of the game) lets you play Torchlight in relatively short sessions, which is more appealing to non-hardcore gamers.
While games like Diablo or Sacred (the genre's top references) have dark and serious aesthetics, Torchlight aims for a more cheerful look. Torchlight's art is very Warcraft-esque, with saturated colors and cartoon characters. The game font is big and looks like a comic-book font.
Hardcore RPG gamers tend to despise this graphic style (remember the Diablo III art direction controversy?), but Torchlight's aesthetics are potentially more likable to a bigger market segment (i.e., non-hardcore gamers).
Not only that, but the high color contrast helps the gameplay: you can tell enemies, your character and special items apart from the background more easily.
It's probably Torchlight's only unique feature, but without a doubt the best one: the pet. Your character has a pet that follows him throughout the whole adventure. Cute, isn't it? But cuteness is not the point: the pet is actually very useful.
The pet attacks enemies. It can also wear necklaces and rings, which make it stronger. Furthermore, you can feed the pet several types of fish to transform it into a different creature with different powers, from a spider to a fire elemental… There are a lot of transformations for the pet! The pet can also cast spells and summon minions (skeletons, zombies, etc.) that fight at your side.
That alone could fully justify including the pet in the game, but there's one more thing: the pet can go back to town and sell your loot. Hell yeah. No more interrupting a dungeon exploration to go back to town and sell all the stuff. This is great for short game sessions, because you spend most of your time exploring and killing foes instead of trading (which isn't much fun).
Torchlight is an action RPG that tries to appeal to a more casual player profile than the standard RPG player, who is usually a hardcore gamer. While I wouldn't label Torchlight a "casual RPG" (as its creators do), I think it does a very good job at being someone's first RPG: cool aesthetics, user-friendly, suited to short play sessions… You can definitely beat this game without selling your soul, and have a great time doing so.
The guys at Runic Games have made a great game. Congratulations!