
Converting Oddmuse Wiki to Edgewall Trac

Published: October 02, 2008 (about 9 years ago)
Updated: over 2 years ago

Our company began long ago with wikis, and way back when we chose the Oddmuse wiki. These days, we're heavy users of the Trac wiki because of its integrated ticket system. So what to do with all those old wikis that folks have stopped using and reading? The Oddmuse wikis still hold valuable data, but they have since become an administrative overhead to keep around, so I decided to convert them all to Edgewall's Trac. To get started, I needed to know how to get the data into Trac. The following links gave me an API I could utilize to create Trac pages and content:

Great! Except for one small problem: I don't really know Python all that well, even though I did do a bit of Zope development, oh, five years ago. These days, my language of choice is Ruby, so I needed to figure out how to parse the Oddmuse syntax and turn it into Trac syntax. The strategy: parse with Ruby, load the cleaned page files with Python. The script below takes a filename on the command line, converts it, and saves the result back to a file with a different extension (I chose *.om for "oddmuse"). I made the command-line argument optional because, as I worked on the script, I kept discovering cases I didn't handle; pointing the fallback at a specific file and echoing the output to the terminal let me debug and fix as I went:

#!/opt/local/bin/ruby
# $FS  = "\xb3";      # The FS character is the RECORD SEPARATOR control char in ASCII
# $FS0 = "\xb3";      # The old FS character is a superscript "3" in Latin-1
# $FS1 = $FS . '1';   # The FS values are used to separate fields
# $FS2 = $FS . '2';   # in stored hashtables and other data structures.
# $FS3 = $FS . '3';   # The FS character is not allowed in user data.

FS = 179.chr

SITE_NAME = "corporate_wiki"
ATTACHMENT_URL_BASE = "https://intranet.example.com/#{SITE_NAME}/attachment/wiki/ConvertedAttachments/"

require 'find'
require 'ftools'

def safe_page_name(pagename)
  result = pagename.gsub("'", "_")
  result.slice(/\w+/)
end

def get_section(page, section)
  page.each_with_index {|e,i| return page[i+1] if e == section}
  return ''
end

def asterick_prefixed(text)
  t = text.match(/(\*+)([^\b]*)/)
  return text if t.nil?

  tokens = t.to_a
  return ('  ' * tokens[1].size) + '* ' + tokens[2]
end

def colon_prefixed(text)
  return "  #{colon_prefixed(text.slice(1,text.size))}" if text[0] == ":"[0]
  return text
end

def pound_prefixed(text)
  return "  1. #{text.slice(1,text.size)}" if text[0] == "#"[0]
  return text
end

def fix_headers(text)
  t = text.match(/(=+)([^=]+)(=+)/)
  return text if t.nil?

  tokens = t.to_a
  return "#{tokens[1]} #{tokens[2].strip} #{tokens[1]}"
end

def replace_a_url(url, text)
  text.gsub!("http://#{url}", ATTACHMENT_URL_BASE)
  text.gsub!("https://#{url}", ATTACHMENT_URL_BASE)
  text.gsub!("http:/#{url}", ATTACHMENT_URL_BASE)
  text.gsub!("https:/#{url}", ATTACHMENT_URL_BASE)
  return text
end

def replace_wiki_urls(text)
  matches = text.match(/\[http(.)*\]/)
  result = text
  if matches
    result = replace_a_url("wiki.example.com/#{SITE_NAME}/wikifiles/", result)
    result = replace_a_url("pdfreviewfiles/", result)
    result = replace_a_url("cgi-bin/", result)
  end
  result
end

def convert_formats(text)
  result = text.gsub(/\[\[(\w+)\]\]/, '[wiki:\1]')
  result = asterick_prefixed(result)
  result = colon_prefixed(result)
  result = pound_prefixed(result)
  result = replace_wiki_urls(result)
  result = fix_headers(result)
  result
end

def oddmuse_to_trac(text)
  result = ''
  in_pre = false
  text.split("\n").each do |line|

    # Detect indented lines as these are rendered as PRE html blocks
    trimmed = line.lstrip
    is_prefixed = ((trimmed.size < line.size) and (trimmed[0] != '*'[0]))
    result << "{{{\n" if (is_prefixed and !in_pre)


    result << "}}}\n" if (!is_prefixed and in_pre)
    in_pre = is_prefixed

    # Don't process special characters when in PRE block
    if in_pre
      result << line << "\n"
    else
      result << convert_formats(line) << "\n"
    end
  end
  result += "}}}\n" if in_pre
  result
end

path = ARGV[0] || "page/U/UsingThePhones.db"
raw_page = File.new(path).read
page2 = raw_page.split(FS + '2')
page3 = raw_page.split(FS + '3')

text = get_section(page3, 'text')

pagename = File.basename(path).split('.').first
author = get_section(page2, 'username')
address = get_section(page2, 'ip')
body = oddmuse_to_trac(text)

puts "writing: #{pagename}.om"
outfile = File.new(path.gsub('.db','.om'),'w')
outfile.puts 'pagename:' + pagename
outfile.puts 'author:' + author
outfile.puts 'address:' + address
outfile.puts body
outfile.close

puts body if ARGV[0].nil?
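Before unleashing the converter on every page, it helped to eyeball the rules on a made-up sample. The sketch below is standalone: it carries trimmed-down copies of the header and bullet rules from the script above (with a slightly simplified regex), run over invented Oddmuse markup:

```ruby
# Trimmed-down copies of two per-line rules from the converter,
# applied to invented sample lines as a quick sanity check.

def fix_headers(text)
  t = text.match(/(=+)([^=]+)(=+)/)
  return text if t.nil?
  "#{t[1]} #{t[2].strip} #{t[1]}"
end

def asterisk_prefixed(text)
  t = text.match(/(\*+)(.*)/)
  return text if t.nil?
  ('  ' * t[1].size) + '* ' + t[2].strip
end

sample = [
  '==Phone Setup==',
  '*dial 9 for an outside line',
  '**then the number',
]

sample.each do |line|
  puts asterisk_prefixed(fix_headers(line))
end
```

Oddmuse's `==Header==` becomes Trac's `== Header ==` (spaces inside the markers), and the `*`-prefixed bullets come out as indented `* ` list items, one indent level per extra asterisk.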

Below comes the Python side of the equation. Once the heavy lifting was done in Ruby, I needed to generate all the pages. As you can see, there's some parsing happening, but it's very simple parsing, and I was able to apply what little Python knowledge I still retained pretty effectively:

import sys
from trac.env import Environment
from trac.wiki.model import WikiPage

print sys.argv[1]
file = open(sys.argv[1], 'r')
pagename = file.readline().split(':')[1].split('\n')[0]
author_name = file.readline().split(':')[1].split('\n')[0]
address = file.readline().split(':')[1].split('\n')[0]

text = file.read()

env = Environment('/srv/www/trac/corporate_wiki')

# Read an existing or new WikiPage:
page = WikiPage(env, pagename)
if page.text != text:
  page.text = text
  page.save(author=author_name, comment='imported from oddmuse wiki', remote_addr=address)
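For reference, the intermediate *.om files consumed here are just the three header lines the Ruby script writes, followed by the converted body. A sample (the page name, author, and body content are invented for illustration):

```
pagename:UsingThePhones
author:michael
address:192.168.16.100
== Phone Setup ==
  * dial 9 for an outside line
```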

As you can see, this script simply takes the command-line argument, opens the file for reading, pulls the header variables out, and sends the remainder of the file to the page content. During the conversion, I noticed some pages were failing to convert at all, like "Michael'sTodoList". The culprit: the tick in the page name, and thus in the filename. The scripts themselves weren't the problem; it was the xargs parameter passing that was the issue (more on that below). So I ran the script below to rename all the troublesome filenames, and then added a similar tick-to-underscore substitution in the main Ruby script above.

require 'find'
require 'ftools'

total_size = 0

Find.find('./page') do |path|
  if FileTest.directory?(path)
    if File.basename(path)[0] == ?.
      Find.prune       # Don't look any further into this directory.
    else
      next
    end
  else
    if path.match(/[']/)
      safe_path = path.gsub("'", "_")
      puts path + ' to ' + safe_path
      File.move(path, safe_path)
    end
  end
end
A simple up-front run of the script was all that was needed to pick up the missing pages:

ruby fix_unsafe_files.rb

Finally, to bring it all together, I used find to locate all the *.db Oddmuse files, which I had moved into the various wiki subfolders where these scripts sat, passing each filename to the script via the xargs utility:

find . -name '*.db' -print0 | xargs -0 -L 1 ruby rdpage.rb

I then essentially repeated the above, but for the generated *.om files, passing them to the Python mkpage.py script:

find . -name '*.om' -print0 | xargs -0 -L 1 python mkpage.py

One last thing before we're done: each wiki had attachments, so I had to figure out how to get those attachments referenced and linked in the Trac wiki. A little investigation revealed that each attachment gets a row in Trac's attachment table. My task was complicated by the fact that, over the years, users of the Oddmuse environment had scattered their attachments everywhere and used varying syntax to refer to them; hence the "replace_a_url" method in the main Ruby parsing script above, which cleans all those funky URLs up. The script below simply processes each file in the directories where we had attachments stored and attaches it to one "ConvertedAttachments" page. Probably not the ideal solution, but again, it got the job done quickly.

#  type |          id          |  filename  |  size  |    time    | description | author |      ipnr
# ------+----------------------+------------+--------+------------+-------------+--------+----------------
#  wiki | ConvertedAttachments | gotapi.png | 200272 | 1218381947 |             | michael | 192.168.16.100

path = ARGV[0] || 'generate_attachments.rb'
filename = File.basename(path)

puts "insert into attachment values ('wiki', 'ConvertedAttachments', '#{filename}', #{FileTest.size(path)}, null, null, 'oddmuse', '192.168.1.246');"

The output of the above needs to be piped to a SQL file and then executed against your Trac database; in my case, PostgreSQL. Since I had many directories to snatch up, I ran it for each folder as appropriate and simply kept appending until I had all the attachment directories represented.

ls | xargs -L 1 ruby generate_attachments.rb >> attachments.sql
psql trac_db -f attachments.sql

All in all, not too bad of a conversion for a few hours' work. It's about 95% correct. Some of the page names didn't translate to Trac, and could probably do with some regexp search-and-replace on the page names themselves. The biggest gain is that we have preserved the information in the old systems by porting it to a system we all know and use on a daily basis, so WikiGardening is more likely to keep the content fresh, and it's one less system for our end-users to figure out how to use. I can now retire the server running Oddmuse, stop doing backups for it, and stop keeping up documentation and training new system administrators on how to manage Oddmuse AND Trac.

