From DasBlog To SubText
My site was down because of DasBlog today. I have no idea why it started to misbehave again, and I don't really care at this point, that was the final reason that I needed to move to SubText. Here is how I did it:
- Put App_Offline.htm page in the blog directory, since otherwise DasBlog would kill the server
- Back up the existing blog.
- Prepare a SubText skin matching my current blog look & feel.
- I then took the DasBlog to BlogML importer here, which is still not ready, and started to play with the code. This is definately something that would require a developer to do, by the way.
I had to give up on the trackbacks because they issues an IsInRole() call and I am not connected to the office at the moment. I took the short way our and sprinkled Thread.CurrentPrincipal = new GenericPrincipal(..) all over the place (seem to reset itself for some reason), and it worked. - I then tried imprted the resulting file to SubText. I have a large blog, and this meant that I had to increase the timeout values. I then discovered that my posts are not legal in SubText, it was warning me about issues with <script> tags in the posts and then failed with some whitespace error that took some time to check in depth. Appernatly I had literal "<<" somewhere, which SgmlReader interpreted as whitespace, and that caused an error when it tried to save that.
- The posts are appearing in reverse order (the oldest are on the top), that is a side affect of the way BlogML worked, by output it to the export file in reverse order, and SubText accepted it in order. The fix:
UPDATE subtext_content
SET DateSyndicated = DateAdded
- The next issues, categories, I am pretty strict about categories, and it looked like the move messed them up, I got mutliply categories with the same name, probably from multiply attempts to import the same set of data, I cleaned it up with:
That still didn't solve the problem, I had an issue with categories that with duplicated names differencing in whitespace, so I found out about them with this query:DELETE FROM subtext_linkcategories
WHERE categoryid NOT IN ( SELECT categoryid
FROM subtext_links )
Note that I only had two duplicates categories overall, so it was easy to find the values. Then, I fixed the duplication with this query:SELECT LTRIM(RTRIM(title)),
MIN(categoryid),
MAX(categoryid),
COUNT(*)
FROM subtext_linkcategories
GROUP BY LTRIM(RTRIM(title))
HAVING COUNT(*) > 1
All that was left was cleaning up the empty categories, using the second query above.UPDATE subtext_links
SET categoryid = ( CASE categoryid
WHEN 452 THEN 460
WHEN 464 THEN 527
WHEN 465 THEN 504
WHEN 477 THEN 508
ELSE categoryid
END )
If you haven't figured it out yet, I really like the ability to run queries on my blog :-) - The next target was the comments, I first run this:
So I could claim ownership of over 600 comments on my blog. Then it was a matter of handling the stats correctly:update subtext_feedback
set isBlogAuthor = 1
where author = 'Ayende Rahien'
One thing that bugs me with the feedback is that there are multiply places that are using "order by id" to sort by date, in my case, this is not the case, it is actually the reverse of what really should happen.UPDATE subtext_config
SET commentcount = ( SELECT COUNT(*)
FROM subtext_feedback
WHERE feedbacktype = 1
),
PingTrackCount = ( SELECT COUNT(*)
FROM subtext_feedback
WHERE feedbacktype = 2
)
- Now, the issues of preserving the old URLs. That one is very important to me, since I have a lot of links directed to my site that I don't want to break. I first had to run a check to see how many duplicate entries name I had, and realized that I had merely 12 duplicate entries, this is something that I can live with. I should explain that the importing process didn't give me a way to correlate between the DasBlog entry and the SubText entry. What I did have was the entry title and date, which should be enough, since I didn't post entries with duplicated names on the same day. I created the following table:
And I run the following little boo script:CREATE TABLE subtext_MappingFromDasBlog
(
Id int identity(1, 1)
primary key
not null,
subtextId int references dbo.subtext_content ( id )
not null,
date datetime not null,
dasBlogId uniqueidentifier not null,
dasblogUrl nvarchar(255) not null
)
Just let me iterate that the above is not production code, it is one off code.import System
import System.Xml
import System.IO
import System.Data
import System.Data.SqlClient
con = SqlConnection("Data Source=localhost;Initial Catalog=SubtextData;Trusted_connection=yes;")
con.Open()
xdoc = XmlDocument()
xdoc.Load("C:\\export.xml")
nsMgr = XmlNamespaceManager(xdoc.NameTable)
nsMgr.AddNamespace("blog", "http://www.blogml.com/2006/09/BlogML")
for node as XmlNode in xdoc.SelectNodes("/blog:blog/blog:posts/blog:post", nsMgr):
id = Guid(node.SelectSingleNode("@id").Value)
url = node.SelectSingleNode("@post-url").Value;
title = node.SelectSingleNode("blog:title/text()", nsMgr).Value;
dateCreated = date.Parse(node.SelectSingleNode("@date-created").Value)
print "Id ${id} url ${url} title ${title} date ${dateCreated}"
using cmd = con.CreateCommand():
cmd.CommandText = """SELECT Id FROM subtext_Content
WHERE Title = @title and year(DateAdded) = @year
and month(DateAdded) = @month and day(DateAdded) = @day""";
cmd.Parameters.AddWithValue("title", title);
cmd.Parameters.AddWithValue("year", dateCreated.Year)
cmd.Parameters.AddWithValue("month", dateCreated.Month)
cmd.Parameters.AddWithValue("day", dateCreated.Day)
subtextId as int = cmd.ExecuteScalar()
cmd.Parameters.Clear()
cmd.CommandText = """INSERT INTO subtext_MappingFromDasBlog
VALUES(@subtextId, @date, @dasBlogId, @dasblogUrl);"""
cmd.Parameters.AddWithValue("subTextId", subtextId)
cmd.Parameters.AddWithValue("dasBlogId", id)
# just the date, not the time
cmd.Parameters.AddWithValue("date", dateCreated.Date)
#trim the http://www.ayende.com/Blog/yyyy-mm-dd/cmd.Parameters.AddWithValue("dasblogUrl", url.Substring(38) )
cmd.ExecuteNonQuery()
con.Dispose()
Now, to get some use out of it...
SubText supports UrlRewriting, with HttpHandlers, so the first thing I needed to do is enable going to a post just by its id. I had to add this to the HandlerConfiguration/httpHandlers section:
Now urls such as /archive/343.aspx are valid.<HttpHandler pattern="(?:/archive/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
Now for the more complex stuff, I wanted to be able to support the following formats.- CommentView,guid,{guid}.aspx
- PermaLink,guid,{guid}.aspx
- NormalizedPostTitle.aspx
- /yyyy/mm/dd/NormalizedPostTitle.aspx
It took a while, but I manage to integrate that into SubText cleanly (an additional handler and changing the web.config), so the old urls should still work. The issue was mainly
Important: If you notice a borkne link please let me know about it.
- Syndication - RSS / Atom - I get a lot more readers on the site than on RSS, but it is obviously very important thing to keep. Nothing is as annoying as a broken link. In this case, dasBlog is using SyndicationService.asmx
- Binaries - I merely kept them in the same place (I am talking about images, attachments, etc.
The reason that I wrote all of that is that I would actaully remember what I did next time that I would have to upgrade / modify the blog.
The last thing that you need to know is how to configure the url mapping:
<HandlerConfiguration defaultPageLocation="DTP.aspx" type="Subtext.Framework.UrlManager.HandlerConfiguration, Subtext.Framework">
<HttpHandlers>
<HttpHandler pattern="login.aspx" handlerType="Page" pageLocation="Login.aspx"/>
<HttpHandler pattern="(?:/admin\/handlers\/BlogMLExport\.ashx)" type="Subtext.BlogML.BlogMLHttpHandler, Subtext.BlogML" handlerType="Direct"/>
<HttpHandler pattern="/Admin/" handlerType="Directory" directoryLocation="Admin"/>
<HttpHandler pattern="/Providers/" handlerType="Directory" directoryLocation="Providers"/>
<HttpHandler pattern="/rss\.aspx$" type="Subtext.Framework.Syndication.RssHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/SyndicationService\.asmx/GetRss$" type="Subtext.Framework.Syndication.RssHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/atom\.aspx$" type="Subtext.Framework.Syndication.AtomHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/SyndicationService\.asmx/GetAtom$" type="Subtext.Framework.Syndication.AtomHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/comments/commentRss/\d+\.aspx$" type="Subtext.Framework.Syndication.RssCommentHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/aggbug/\d+\.aspx$" type="Subtext.Framework.Tracking.AggBugHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="/customcss\.aspx$" type="Subtext.Web.UI.Handlers.BlogSecondaryCssHandler, Subtext.Web" handlerType="Direct"/>
<HttpHandler pattern="(?:/category\/(\d|\w|\s)+\.aspx/rss/?)$" type="Subtext.Framework.Syndication.RssCategoryHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:/ArchivePostPage.aspx)$" controls="ArchivePostPage.ascx"/>
<HttpHandler pattern="(?:/LinkPage.aspx)$" controls="LinkPage.ascx"/>
<HttpHandler pattern="(?:/ArticleCategories.aspx)$" controls="ArticleCategories.ascx"/>
<HttpHandler pattern="(?:/archives\.aspx)$" controls="SingleColumn.ascx"/>
<HttpHandler pattern="(?:/archive/\d{4}/\d{2}/\d{2}/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
<HttpHandler pattern="(?:/archive/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
<HttpHandler pattern="(?:/archive/\d{4}/\d{2}/\d{2}/[-_,+\.\w]+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
<HttpHandler pattern="(?:/archive/\d{4}/\d{1,2}/\d{1,2}\.aspx)$" controls="ArchiveDay.ascx"/>
<HttpHandler pattern="(?:/archive/\d{4}/\d{1,2}\.aspx)$" controls="ArchiveMonth.ascx"/>
<HttpHandler pattern="(?:/articles/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
<HttpHandler pattern="(?:/articles/[-_,+\.\w]+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
<HttpHandler pattern="(?:/\d{4}/\d{2}/\d{2}/[\d\w]+\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog" handlerType="Factory" />
<HttpHandler pattern="(?:commentview,guid,[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog" handlerType="Factory" />
<HttpHandler pattern="(?:commentview\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog" handlerType="Factory" />
<HttpHandler pattern="(?:/contact\.aspx)$" controls="Contact.ascx"/>
<HttpHandler pattern="(?:/posts/|/story/|/archive/)" type="Subtext.Web.UI.Handlers.RedirectHandler, Subtext.Web" handlerType="Direct"/>
<HttpHandler pattern="(?:/gallery\/\d+\.aspx)$" controls="GalleryThumbNailViewer.ascx"/>
<HttpHandler pattern="(?:/gallery\/image\/\d+\.aspx)$" controls="ViewPicture.ascx"/>
<HttpHandler pattern="(?:/(?:category|stories)/(\w|\s)+\.aspx)$" controls="CategoryEntryList.ascx"/>
<HttpHandler pattern="(?:/comments\/\d+\.aspx)$" type="Subtext.Framework.Syndication.CommentHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:/services\/trackbacks/\d+\.aspx)$" type="Subtext.Framework.Tracking.TrackBackHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:/services\/pingback\.aspx)$" type="Subtext.Framework.Tracking.PingBackService, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:/services\/metablogapi\.aspx)$" type="Subtext.Framework.XmlRpc.MetaWeblog, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:/default\.aspx)$" controls="homepage.ascx"/>
<HttpHandler pattern="(?:/[\d\w]+\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog" handlerType="Factory" />
<!-- this needs to be last -->
<HttpHandler pattern="(?:((\/\/default\.aspx)?|(\/\/?))?)$" controls="homepage.ascx"/>
</HttpHandlers>
</HandlerConfiguration>
And you can find the code for the actual URL mapping here.
It is simple, it works, and that is all I care about.
If you haven't guess, this migration was NOT a walk in the park. I hope that I got all the kinks out, but let me know if there is anything that I missed.

Comments
Comment preview