I needed to convert a huge batch of MediaWiki files to HTML (I had a 2010-03 copy of the now-dead LimeWire wiki lying around). With a tip from RoanKattouw in #firstname.lastname@example.org, I wrote a simple Python script that converts arbitrary files from MediaWiki syntax to HTML.
This script is not written for speed (do you know how slow a web request is, compared to even horribly inefficient code? …). The only optimization is for programming convenience, and the advantage of that is that it’s just 47 lines of code :)
It also isn’t perfect: it breaks on some pages (and tells you when that happens).
    #!/usr/bin/env python3
    """Simply turn all input files to html.
    No errorchecking, so keep backups.
    It uses the mediawiki webapi, so you need to be online.

    Copyright: 2010 © Arne Babenhauserheide
    License: You can use this under the GPLv3 or later,
    if you add the appropriate license files
    → http://gnu.org/licenses/gpl.html
    """

    from urllib.request import urlopen
    from urllib.parse import quote
    from urllib.error import HTTPError, URLError
    from time import sleep
    from random import random
    from json import loads
    from sys import argv

    mediawiki_files = argv[1:]


    def wikitext_to_html(text):
        """parse text in mediawiki markup to html."""
        # format=json replaces the original format=yaml,
        # which current MediaWiki versions no longer offer.
        url = ("http://en.wikipedia.org/w/api.php?action=parse&format=json&text="
               + quote(text, safe=""))
        with urlopen(url) as f:
            y = f.read()
        return loads(y)["parse"]["text"]["*"]


    for mf in mediawiki_files:
        with open(mf) as f:
            text = f.read()
        HTML_HEADER = "<html><head><title>" + mf + "</title></head><body>"
        HTML_FOOTER = "</body></html>"
        try:
            text = wikitext_to_html(text)
            # the input file is overwritten with the html
            with open(mf, "w") as f:
                f.write(HTML_HEADER)
                f.write(text)
                f.write(HTML_FOOTER)
        except HTTPError:
            print("Error converting file", mf)
        except URLError:
            print("Server doesn’t like us :(", mf)
            sleep(10*random())
        # add a random wait, so the api server doesn’t kick us
        sleep(3*random())
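Since the script overwrites every input file in place (hence the "keep backups" in its docstring), one possible variation is to write the HTML next to the original instead. The following is only a sketch and not part of the original script: the helper names wrap_html and write_html_copy are made up for illustration, the ".html" suffix is an assumption, and the conversion function is passed in as a parameter so the helpers can be tried without being online.

```python
from html import escape
from pathlib import Path


def wrap_html(title, body):
    """Wrap an HTML fragment in the same minimal page skeleton the script uses."""
    # escape() keeps characters like "<" in a file name from breaking the title tag
    return ("<html><head><title>" + escape(title) + "</title></head><body>"
            + body + "</body></html>")


def write_html_copy(path, convert):
    """Convert the file at path and write the result to path + '.html'.

    convert is any function that maps wikitext to an HTML fragment,
    for example wikitext_to_html from the script above.
    The original file is left untouched.
    """
    source = Path(path)
    target = source.with_name(source.name + ".html")
    text = source.read_text(encoding="utf-8")
    target.write_text(wrap_html(source.name, convert(text)), encoding="utf-8")
    return target
```

In the loop of the script, the block that reopens mf for writing could then be replaced by a call like write_html_copy(mf, wikitext_to_html), with the same try/except around it.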