I mentioned in a previous post that I was using a private Wordpress blog to keep my notes. Not anymore. I migrated to Evernote.
Thanks to Maria Joao Valente for sending me the invite to Evernote.
Evernote is a note organizer, similar to Journler, which I used a while back.
Check out the About Evernote page and their screencast. My highlights:
* Web client
* Desktop client
* Works with Mobile Devices
* Painless, automatic synchronization (think Gmail + IMAP, but better)
* Notes can be found by searching and filtering for text within images
* Clip (via bookmarklet) or email entire webpages into your account
* Can import html files (you’ll see why this was important for me)
See also: Wired Review and TUAW Review.
Migrating between applications is never easy. In this case I needed to migrate from a Wordpress blog to Evernote. I could have manually clicked “Clip to Evernote” for each post on that blog, written a simple AppleScript to do it, found a way to do it in JavaScript, or taken advantage of the “clip” feature in some other way. But of course I chose the hardest way possible: I wrote a Python script to convert the Wordpress XML Export File to multiple HTML notes and then dragged those files into Evernote. At least it was fun, if a colossal waste of time…
Anyway, here’s the Python script in case you ever want to convert a Wordpress blog (or, more accurately, a Wordpress XML Export File) to HTML files.
wpdepress.py
[sourcecode language='python']
# Copyright (c) 2008 Luis Rei
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the “Software”), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# Notes:
# - currently does not handle images, attachments or comments
# - was only tested on MacOS X (10.5)
# - not “carefully” developed e.g. poor exception handling, little testing, …
# - see also http://wordpress.com/blog/2006/06/12/xml-import-export/
import getopt
import html
import os
import sys
from xml.dom import minidom

__author__ = 'Luis Rei (luis.rei@gmail.com)'
__homepage__ = 'http://luisrei.com'
__version__ = '1.0'
__date__ = '2008/03/23'


def get_text(node, tag):
    """Return the text of the first <tag> element below node, or ''."""
    elements = node.getElementsByTagName(tag)
    if not elements:
        return ''
    # content:encoded is usually a CDATA section, so join text and CDATA nodes
    return ''.join(child.data for child in elements[0].childNodes
                   if child.nodeType in (child.TEXT_NODE,
                                         child.CDATA_SECTION_NODE))


def convert(infile, outdir, authorDirs, categoryDirs):
    """Convert Wordpress Export File to multiple html files.

    Keyword arguments:
    infile -- the location of the Wordpress Export File
    outdir -- the directory where the files will be created
    authorDirs -- if true, create different directories for each author
    categoryDirs -- if true, create directories for each category
    """
    # First we parse the XML file into a list of posts.
    # Each post is a dictionary
    dom = minidom.parse(infile)
    blog = []  # list that will contain all posts
    for node in dom.getElementsByTagName('item'):
        post = dict()
        post['title'] = get_text(node, 'title')
        post['date'] = get_text(node, 'pubDate')
        post['author'] = get_text(node, 'dc:creator')
        post['id'] = get_text(node, 'wp:post_id')
        post['text'] = get_text(node, 'content:encoded')
        # wp:attachment_url could be used to download attachments
        # Get the categories, skipping those without a nicename, sorted
        categories = [subnode.getAttribute('nicename')
                      for subnode in node.getElementsByTagName('category')]
        post['categories'] = sorted(x for x in categories if x != '')
        # Add post to the list of all posts
        blog.append(post)

    # Then we create the directories and HTML files from the list of posts.
    # The "base" directory
    outdir = os.path.join(outdir, 'wordpress')
    os.makedirs(outdir, exist_ok=True)

    for post in blog:
        # The "author" and "category" directories
        path = outdir
        if authorDirs:
            path = os.path.join(path, post['author'])
        # This creates a path for the file in the format
        # category1/category2/category3/file. Note that the category list was
        # sorted.
        if categoryDirs and post['categories']:
            path = os.path.join(path, *post['categories'])
        os.makedirs(path, exist_ok=True)

        # And finally the file itself ("/" in a title would otherwise
        # create an extra subdirectory, so it is replaced)
        title = post['title'].replace('/', '-')
        filename = os.path.join(path, post['id'] + ' - ' + title + '.html')

        # Replace simple newlines with <br /> + newline so that the HTML file
        # represents the original post more accurately
        text = post['text'].replace('\n', '<br />\n')

        # Write the file as UTF-8 and declare the charset in a meta tag
        with open(filename, 'w', encoding='utf-8') as f:
            # Add "HTML header"
            f.write('<html>\n<head>\n')
            f.write('<meta http-equiv="Content-Type" '
                    'content="text/html; charset=UTF-8">\n')
            f.write('<title>' + html.escape(post['title']) + '</title>\n')
            f.write('</head>\n<body>\n')
            f.write(text)
            # Finalize HTML
            f.write('\n</body>\n</html>\n')


def usage(pname):
    """Displays usage information

    keyword arguments:
    pname -- program name (e.g. obtained as argv[0])
    """
    print("""python %s [-hac] [-o outdir] infile
Converts a Wordpress Export File to multiple html files.
Options:
-h,--help\tDisplays this information.
-a,--authors\tCreate different directories for each author.
-c,--categories\tCreate directory structure from post categories.
-o,--outdir\tSpecify a directory for the output.
Example:
python %s -c -o ~/TEMP ~/wordpress.2008-03-20.xml
""" % (pname, pname))


def main(argv):
    outdir = ''
    authors = False
    categories = False
    try:
        # -a and -c are flags; only -o/--outdir takes an argument
        opts, args = getopt.getopt(
            argv[1:], 'haco:', ['help', 'authors', 'categories', 'outdir='])
    except getopt.GetoptError as err:
        print(str(err))
        usage(argv[0])
        sys.exit(2)
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage(argv[0])
            sys.exit()
        elif opt in ('-a', '--authors'):
            authors = True
        elif opt in ('-c', '--categories'):
            categories = True
        elif opt in ('-o', '--outdir'):
            outdir = arg
    if not args:
        print('Error: Missing Argument: missing wordpress export file.')
        usage(argv[0])
        sys.exit(3)
    infile = args[0]
    if outdir == '':
        # Use the current directory
        outdir = os.getcwd()
    convert(infile, outdir, authors, categories)


if __name__ == '__main__':
    main(sys.argv)[/sourcecode]
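To see what the script pulls out of each `<item>`, here is a minimal Python 3 sketch that runs the same `minidom` lookups on a small, hand-written export item. The sample XML and the helper names `first_text` and `parse_items` are illustrative and not part of the script; real export files contain many more elements per item, and the namespace prefixes are declared here only so that the prefixed tag names (`dc:creator`, `wp:post_id`, `content:encoded`) parse.

[sourcecode language='python']
from xml.dom import minidom

# Hypothetical, trimmed-down export: real files carry many more elements.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
<item>
  <title>Hello world</title>
  <dc:creator>luis</dc:creator>
  <category domain="category" nicename="notes"><![CDATA[Notes]]></category>
  <category><![CDATA[Notes]]></category>
  <content:encoded><![CDATA[First line
Second line]]></content:encoded>
  <wp:post_id>42</wp:post_id>
</item>
</channel>
</rss>"""


def first_text(node, tag):
    """Return the joined text/CDATA of the first <tag> below node, or ''."""
    found = node.getElementsByTagName(tag)
    if not found:
        return ""
    return "".join(c.data for c in found[0].childNodes
                   if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE))


def parse_items(xml_text):
    """Build one dict per <item>, using the same tag names as the script."""
    dom = minidom.parseString(xml_text)
    posts = []
    for item in dom.getElementsByTagName("item"):
        posts.append({
            "title": first_text(item, "title"),
            "author": first_text(item, "dc:creator"),
            "id": first_text(item, "wp:post_id"),
            "text": first_text(item, "content:encoded"),
            # Only <category> elements with a non-empty nicename are kept
            "categories": [c.getAttribute("nicename")
                           for c in item.getElementsByTagName("category")
                           if c.getAttribute("nicename")],
        })
    return posts


if __name__ == "__main__":
    for post in parse_items(SAMPLE):
        print(post["id"], post["title"], post["categories"])
[/sourcecode]

Running it prints `42 Hello world ['notes']`: the second `<category>`, which has no `nicename` attribute, is dropped, just as in the script.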
13 Responses
It seems to be the application I’ve been searching for for a long time. I already signed up for the beta invite but I’m still waiting. Can you please send me one invite?
Thanks
Vampaz
I would send you an invite but they haven’t given me any, at least not yet.
Good job on that. First impressions of Evernote?
When you have any invites, hit me with one.
@Bruno Costa I really do like Evernote. I’ll try to get invites to everyone that asked me.
This is exactly what I want to do . . . but I’m too much of a script n00b to know how to use a python script.
Any pointers? Or links to an AppleScript?
@Mont
(very simple explanation)
1) Create a directory in your home folder (ex: wp)
2) Place both the python script (wp-depress.py) and the Wordpress export file (see the wordpress documentation on how to get it, I’ll simply call it wordpress.xml) in the directory you just created.
3) Open the terminal application (located in Applications/Utilities/Terminal.app).
4) Type
cd wp
where wp is the directory you created in step 1
5) Type
python wp-depress.py
to see the help
6) The simplest way is just to type
python wp-depress.py wordpress.xml
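For anyone unsure what `-a` and `-c` do to the output, here is a minimal sketch of how each file’s path is built. `output_path` is a hypothetical helper written for illustration, not part of the script: it follows the `<outdir>/wordpress/[author/][category1/category2/][id] - [title].html` layout, sorts the categories as the script’s comment describes, and (an assumption of this sketch) replaces "/" in titles. It only computes paths and writes nothing to disk.

[sourcecode language='python']
import os


def output_path(outdir, post, author_dirs=False, category_dirs=False):
    """Hypothetical helper: return the .html path a post would be written to.

    author_dirs mirrors -a, category_dirs mirrors -c; nothing is created.
    """
    parts = [outdir, "wordpress"]
    if author_dirs:
        parts.append(post["author"])
    if category_dirs:
        # One nested directory per category, in sorted order
        parts.extend(sorted(post["categories"]))
    # Assumption: "/" in a title is replaced so it cannot add a subdirectory
    title = post["title"].replace("/", "-")
    return os.path.join(*parts, "%s - %s.html" % (post["id"], title))


if __name__ == "__main__":
    post = {"id": "42", "title": "Hello world", "author": "luis",
            "categories": ["python", "notes"]}
    print(output_path("/tmp/out", post))
    print(output_path("/tmp/out", post, author_dirs=True, category_dirs=True))
[/sourcecode]

With both options, the sample post lands in `wordpress/luis/notes/python/` instead of directly in `wordpress/`.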
Thanks!
I’ll try that.
I do not believe this
@formetti what don’t you believe in? and why?
LuisRei, did you import your Journler data into Evernote? If so, how? And did all the other data come in with the entries?
@joelt No, I imported my data from a wordpress.com blog into Evernote. I did use Journler before, but I had previously exported it to that blog via MarsEdit and a Journler-to-MarsEdit script that I found on the net. And no, only the text was exported; I think I had to upload the rest (a few attached PDFs and JPGs) by hand.