I spent two hours the other night trying to hack out a shell script to import the archives into this thing. WordPress doesn’t have a simple way to just suck in a bunch of text files; you need to assemble them into something that resembles an RSS feed, and then import that. This brought up two problems:
1) All of the posts had to be on a single line in the content:encoded element. This involved a bit of dicking around with awk and then sed before I finally gave up and realized I could do it faster with tr.
2) The pubDate element had to be in RFC-822 time format, and the only thing I had to work with was the filename, which was in YYYYMMDD format. It took most of the two hours to figure out that the god damned /bin/date program that ships with OS X is fundamentally broken, and ALL date commands in unixes are broken, because instead of curing cancer or stopping wars, about 80% of our world’s brainpower goes to stupid pursuits like “oh, I have philosophical issues with the 87 flags offered in BSD’s date program, so I’m going to write a completely incompatible one with 73 flags of its own, but still fail to address the two or three things people need to do with a time program.”
Case in point, this DOES NOT work in OS X:
date -j -f "%Y%m%d" "20090930" +"%+"
This DOES work:
date -j -f "%Y %m%d" "2009 0930" +"%+"
But my filenames are 20090930.html and not 2009 0930.html. That extra fucking space killed me.
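For what it's worth, the space can be wedged in without cut or a subshell. This is only a sketch: split_stamp is a made-up helper name, not something from the original script, and it assumes the stamp is always exactly eight characters.

```shell
# Hypothetical helper: turn "20090930" into "2009 0930".
# Uses only POSIX parameter expansion, so it should behave the same in sh and bash.
split_stamp() {
    stamp=$1
    # ${stamp%????} drops the last four characters (leaving YYYY);
    # ${stamp#????} drops the first four (leaving MMDD).
    printf '%s %s\n' "${stamp%????}" "${stamp#????}"
}

split_stamp 20090930   # prints: 2009 0930
```

On OS X the result could then be handed to the same date call shown above, e.g. date -j -f "%Y %m%d" "$(split_stamp 20090930)" +"%+".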
AND YES, I am sure I am just an idiot, and if I sat around all day writing shell scripts, I would KNOW that blah blah blah hidden flag blah blah blah run it through a perl script blah blah blah. But truth of the matter is, I write maybe a half-dozen lines of shell script every three months, and then promptly forget everything. I’m sure if I sat around all day slicing onions into cubes, I would be a god damned onion slicing master, but the truth of it is, I only need to cut up maybe one onion a week tops, and I’m not about to quit my day job just to sit around slicing up onions.
Here’s the script:
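# Emit a <pubDate> and a one-line <content:encoded> for each 1997 entry.
for f in ~/website-mirror/oldjournal/html/1997*.html; do
    OLDDATE=$(basename -s .html "$f")            # YYYYMMDD.html -> YYYYMMDD
    THEYEAR=$(echo "$OLDDATE" | cut -c1-4)
    THEREST=$(echo "$OLDDATE" | cut -c5-8)
    SHIT="$THEYEAR $THEREST"                     # the space OS X date insists on
    pubdate=$(date -j -f "%Y %m%d" "$SHIT" +"%+")
    echo -n "<pubDate>"
    echo -n "$pubdate"
    echo "</pubDate>"                            # close the tag the original left open
    echo "<content:encoded>$(tr '\n' ' ' < "$f")</content:encoded>"
done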
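Since every date program disagrees about flags, one way around the whole mess is to not call date at all. The sketch below builds an RFC-822-style string straight from the YYYYMMDD stamp using plain shell arithmetic. Everything in it is an assumption rather than part of the original script: rfc822_from_stamp is a made-up name, the time is pinned to midnight and the zone to +0000 because the filename carries neither, and the weekday comes from Sakamoto's day-of-week method.

```shell
# Hypothetical helper: "20090930" -> "Wed, 30 Sep 2009 00:00:00 +0000".
# Assumes an 8-digit YYYYMMDD stamp; midnight and +0000 are placeholders.
rfc822_from_stamp() {
    stamp=$1
    y=${stamp%????}              # YYYY
    rest=${stamp#????}           # MMDD
    m=${rest%??}; d=${rest#??}
    m=${m#0}; d=${d#0}           # strip a leading zero so $(( )) doesn't read "09" as octal

    # Month abbreviation paired with its Sakamoto offset; shift to the m-th entry.
    set -- Jan:0 Feb:3 Mar:2 Apr:5 May:0 Jun:3 Jul:5 Aug:1 Sep:4 Oct:6 Nov:2 Dec:4
    shift $(( m - 1 ))
    mon=${1%:*}; off=${1#*:}

    # January and February count as part of the previous year for leap handling.
    yy=$y
    [ "$m" -lt 3 ] && yy=$(( y - 1 ))
    dow=$(( (yy + yy/4 - yy/100 + yy/400 + off + d) % 7 ))   # 0 = Sunday

    set -- Sun Mon Tue Wed Thu Fri Sat
    shift "$dow"
    printf '%s, %02d %s %s 00:00:00 +0000\n' "$1" "$d" "$mon" "$y"
}

rfc822_from_stamp 20090930   # prints: Wed, 30 Sep 2009 00:00:00 +0000
```

It does no validation at all, so a stamp like 20091340 will produce garbage rather than an error.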