Out of Wordpress.com and Into Evernote.com

I mentioned in a previous post that I was using a private Wordpress blog to keep my notes. Not anymore. I migrated to Evernote.

Thanks to Maria Joao Valente for sending me the invite to Evernote.

Evernote is a note organizer, similar to Journler, which I used a while back.

Check out the About Evernote and their screencast. My highlights:
* Web client
* Desktop client
* Works with Mobile Devices
* Painless, automatic synchronization (think Gmail + IMAP but better)
* Notes can be found by searching and filtering for text within images
* Clip (via bookmarklet) or email entire webpages into your account
* Can import HTML files (you’ll see why this was important for me)

See also: Wired Review and TUAW Review.

Migrating between applications is never an easy task. In this case I needed to migrate from a Wordpress blog to Evernote. I could have manually clicked “Clip to Evernote” for each post on that blog, written a simple AppleScript to do it, found a way to do it in JavaScript, or taken advantage of the “clip” feature in some other way. But of course I chose the hardest way possible: I wrote a Python script to convert the Wordpress XML Export File into multiple HTML notes and then dragged those files into Evernote. At least it was fun, if a colossal waste of time…

Anyway, here’s the Python script in case you ever want to convert a Wordpress blog (or, more accurately, a Wordpress XML Export File) to HTML files.

wpdepress.py

[sourcecode language='python']

# Copyright (c) 2008 Luis Rei
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

# Notes:
# - currently does not handle images, attachments or comments
# - was only tested on Mac OS X (10.5)
# - not "carefully" developed, e.g. poor exception handling, little testing, ...
# - see also http://wordpress.com/blog/2006/06/12/xml-import-export/

import string, os, sys, getopt
from xml.dom import minidom

__author__ = 'Luis Rei (luis.rei@gmail.com)'
__homepage__ = 'http://luisrei.com'
__version__ = '1.0'
__date__ = '2008/03/23'

def convert(infile, outdir, authorDirs, categoryDirs):
    """Convert Wordpress Export File to multiple html files.

    Keyword arguments:
    infile -- the location of the Wordpress Export File
    outdir -- the directory where the files will be created
    authorDirs -- if true, create different directories for each author
    categoryDirs -- if true, create directories for each category

    """

    # First we parse the XML file into a list of posts.
    # Each post is a dictionary

    dom = minidom.parse(infile)

    blog = [] # list that will contain all posts

    for node in dom.getElementsByTagName('item'):
        post = dict()

        post["title"] = node.getElementsByTagName('title')[0].firstChild.data
        post["date"] = node.getElementsByTagName('pubDate')[0].firstChild.data
        post["author"] = node.getElementsByTagName(
            'dc:creator')[0].firstChild.data
        post["id"] = node.getElementsByTagName('wp:post_id')[0].firstChild.data

        if node.getElementsByTagName('content:encoded')[0].firstChild != None:
            post["text"] = node.getElementsByTagName(
                'content:encoded')[0].firstChild.data
        else:
            post["text"] = ""

        # wp:attachment_url could be used to download attachments

        # Get the categories
        tempCategories = []
        for subnode in node.getElementsByTagName('category'):
            tempCategories.append(subnode.getAttribute('nicename'))
        categories = [x for x in tempCategories if x != '']
        post["categories"] = categories

        # Add post to the list of all posts
        blog.append(post)

    # Then we create the directories and HTML files from the list of posts.

    # The "base" directory
    outdir += "/wordpress/"
    if os.path.exists(outdir) == False:
        os.makedirs(outdir)
    os.chdir(outdir)

    for post in blog:
        # The "category" directories
        path = ""
        if authorDirs == True:
            path += post["author"].encode('utf-8') + "/"

        # This creates a path for the file in the format
        # category1/category2/category3/file. Note that the category list was
        # sorted.

        if categoryDirs == True:
            if (post["categories"] != None):
                path += string.join(post["categories"], "/")

        if os.path.exists(path) == False and path != "":
            os.makedirs(path)

        # And finally the file itself
        path = outdir + path
        title = post["title"].encode('utf-8')
        filename = path + "/" + post["id"] + ' - ' + title \
            + '.html'

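        # Add a meta tag to specify charset (UTF-8) in the HTML file.
        # (The HTML tags in this part of the listing were stripped when the
        # post was rendered; the meta, header and footer strings below are a
        # plausible reconstruction, not necessarily the exact originals.)
        meta = """<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />"""

        f = open(filename, 'w')
        f.write(meta+"\n")

        # Add "HTML header"
        start = "<html>\n<head>\n</head>\n<body>\n"
        f.write(start)

        # Convert the unicode object to a string that can be written to a file
        # with the proper encoding (UTF-8)
        text = post["text"].encode('utf-8')

        # Replace simple newlines with <br /> + newline so that the HTML file
        # represents the original post more accurately
        text = text.replace("\n", "<br />\n")

        f.write(text)

        # Finalize HTML
        end = "\n</body>\n</html>"
        f.write(end)

        f.close()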

def usage(pname):
    """Displays usage information

    keyword arguments:
    pname -- program name (e.g. obtained as argv[0])

    """

    print """python %s [-hac] [-o outdir] infile
Converts a Wordpress Export File to multiple html files.

Options:
-h,--help\tDisplays this information.
-a,--authors\tCreate different directories for each author.
-c,--categories\tCreate directory structure from post categories.
-o,--outdir\tSpecify a directory for the output.

Example:
python %s -c -o ~/TEMP ~/wordpress.2008-03-20.xml
""" % (pname, pname)

def main(argv):
    outdir = ""
    authors = False
    categories = False

    try:
        # Only -o/--outdir takes a value; -h, -a and -c are plain flags.
        opts, args = getopt.getopt(
            argv[1:], "hao:c", ["help", "authors", "outdir=", "categories"])
    except getopt.GetoptError, err:
        print str(err)
        usage(argv[0])
        sys.exit(2)

    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage(argv[0])
            sys.exit()
        elif opt in ("-a", "--authors"):
            authors = True
        elif opt in ("-c", "--categories"):
            categories = True
        elif opt in ("-o", "--outdir"):
            outdir = arg

    infile = "".join(args)

    if infile == "":
        print "Error: Missing Argument: missing wordpress export file."
        usage(argv[0])
        sys.exit(3)

    if outdir == "":
        # Use the current directory
        outdir = os.getcwd()

    convert(infile, outdir, authors, categories)

if __name__ == "__main__":
    main(sys.argv)
[/sourcecode]
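
The script above is written for Python 2 (print statements, `except ..., err`). For anyone on Python 3, here is a minimal sketch of the same two steps: parse the export into a list of post dictionaries, then render a post as HTML. It is not the original script. The names `text_of`, `parse_posts` and `to_html`, the tiny `SAMPLE` export and the exact HTML wrapper are all made up for illustration, and like the original it ignores images, attachments and comments.

```python
import html
from xml.dom import minidom

# Hypothetical, heavily trimmed stand-in for a Wordpress XML Export File.
SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
<item>
<title>First note</title>
<pubDate>Sun, 23 Mar 2008 10:00:00 +0000</pubDate>
<dc:creator>lrei</dc:creator>
<category domain="category" nicename="python">Python</category>
<category>Python</category>
<content:encoded><![CDATA[line one
line two]]></content:encoded>
<wp:post_id>7</wp:post_id>
</item>
</channel>
</rss>"""


def text_of(node, tag):
    """Return the text of the first <tag> element, or "" if it is empty."""
    element = node.getElementsByTagName(tag)[0]
    return element.firstChild.data if element.firstChild is not None else ""


def parse_posts(xml_text):
    """Parse export XML (as a string) into a list of post dictionaries."""
    dom = minidom.parseString(xml_text)
    posts = []
    for item in dom.getElementsByTagName("item"):
        nicenames = [c.getAttribute("nicename")
                     for c in item.getElementsByTagName("category")]
        posts.append({
            "title": text_of(item, "title"),
            "date": text_of(item, "pubDate"),
            "author": text_of(item, "dc:creator"),
            "id": text_of(item, "wp:post_id"),
            "text": text_of(item, "content:encoded"),
            # Category elements without a nicename attribute are skipped.
            "categories": [name for name in nicenames if name],
        })
    return posts


def to_html(post):
    """Render one post as a small UTF-8 HTML page (assumed layout)."""
    # The post body is already HTML, so only the line breaks are converted.
    body = post["text"].replace("\n", "<br />\n")
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n'
        "<title>" + html.escape(post["title"]) + "</title>\n"
        "</head>\n<body>\n" + body + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(to_html(parse_posts(SAMPLE)[0]))
```

To write real files you would call `minidom.parse` on the export file instead of `parseString`, and open each output file with `open(filename, "w", encoding="utf-8")`.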

Posted in Personal, Programming, Python, Software.

13 Responses

  1. Carlos Rodrigues said

    This seems to be the application I’ve been searching for for a long time. I already signed up for the beta invite but am still waiting. Can you please send me an invite?

    Thanks
    Vampaz

  2. lrei said

    I would send you an invite but they haven’t given me any :( at least not yet.

  3. good job on that. first impressions on evernote?
    when you have any invites, hit me with that. ;)

  4. lrei said

    @Bruno Costa I really do like Evernote. I’ll try to get invites to everyone that asked me.

  5. Mont said

    This is exactly what I want to do . . . but I’m too much of a script n00b to know how to use a python script.

    Any pointers? Or links to an AppleScript?

  6. lrei said

    @Mont
    (very simple explanation)

    1) Create a directory in your home folder (ex: wp)
    2) Place both the python script (wp-depress.py) and the Wordpress export file (see the wordpress documentation on how to get it, I’ll simply call it wordpress.xml) in the directory you just created.
    3) Open the terminal application (located in Applications/Utilities/Terminal.app).
    4) Type

    cd wp

    where wp is the directory you created in step 1
    5) Type

    python wp-depress.py

    to see the help
    6) The simplest way is just to type

    python wp-depress.py wordpress.xml

  7. Mont said

    Thanks!
    I’ll try that.

  8. fornetti said

    I do not believe this

  9. lrei said

    @fornetti What don’t you believe? And why?

  10. LuisRei did you import your journler data into evernote? if so, how? and did all the other data come in with the entries?

  11. lrei said

    @joelt No, I imported my data from a wordpress.com blog into Evernote. I did use Journler before, but I had previously exported it to said blog via MarsEdit and a Journler-to-MarsEdit script that I found on the net. And no, only the text was exported; I think I had to upload the rest (a few attached PDFs and JPGs) by hand.

Continuing the Discussion

  1. Evernote : RAM pour tout linked to this post on March 24, 2008

    [...] in any case it convinced Luis Rei, who went as far as importing into Evernote the WordPress blog he was using for [...]

  2. Evernote Invites « LuisRei.com linked to this post on March 27, 2008

    [...] have 6 evernote (previously mentioned here) invites up for grabs. If you’re interested, drop a comment with your email. No Comments [...]