Mar 04 2012
I run a wiki for CURATEcamp, using MediaWiki. I don’t run it well, so it got full of spam. I learned how to add a little math script to each page edit, and that slowed down the spam for a while, but it’s easy to hack and the spam started flowing again. Now I have 700+ pages of spam and more coming in every day, which leaves me with 3 problems to solve:
- Stop the addition of new users without confirmation
- Stop new spam
- Clean up all the spam pages
Next, I found the MediaWiki manual page “Preventing access” and followed the instructions to add these lines to the LocalSettings.php file:
# Disable anonymous editing
$wgGroupPermissions['*']['edit'] = false;
That stopped the random adding of new spam.
Next, I started looking for easy cleanup tools, and didn’t really find any. I could list all of the pages on the wiki, but I’d have to visit each one and delete it, which is a real pain for 700+ pages. I also had about 20 pages that I wanted to keep. I found a DeleteBatch extension that would allow me to put the spam page names into a text box (or text file) and delete them all at once.
Now I needed to generate a list of spam page names, so I went to the Special Page that lists All pages, and cut and pasted those into an Excel spreadsheet. It was a bit of a pain because the list was in three columns, and split into three pages, but I just dragged and dropped the list around in Excel until I had it all as one column. Most of the spam pages are user pages, and the titles of the pages end in a number. So I set up a second column that chopped the last 2 characters from the page title.
Then I added a third column, a conditional that repeated the page title if it ended in a number. I bet I could have made it simpler with some function that converts a cell made up of a word and a number, like “ClardyGarces959” into just “959” but I couldn’t remember how to do that.
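For anyone who would rather script this step than wrestle with spreadsheet formulas, here is a minimal Python sketch of the "word plus number into just the number" idea. It is an illustration, not the method used above: `trailing_number` is a hypothetical helper name, and the sample titles other than ClardyGarces959 are made up.

```python
import re

# Matches a run of ASCII digits at the very end of a title.
TRAILING_DIGITS = re.compile(r"[0-9]+$")


def trailing_number(title):
    """Return the digits at the end of a page title, or "" if there are none."""
    match = TRAILING_DIGITS.search(title.strip())
    return match.group(0) if match else ""


# Hypothetical sample titles; only the first one appears in the post.
titles = ["ClardyGarces959", "Main Page", "CURATEcamp 2012"]
for title in titles:
    print(title, "->", trailing_number(title) or "(no number)")
```

A legitimate title such as the made-up "CURATEcamp 2012" also ends in digits, so a check like this only flags candidates; it does not decide what is spam.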
Next, I sorted by this column, which grouped all of the page titles that ended in a number. I visually inspected the list, and I’m glad I did because some of my legitimate pages also ended in numbers. I deleted those from the list, then pasted the list of known spam page titles into DeleteBatch.
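The sort, inspect, and paste step could be approximated in a few lines of Python. This is a sketch under stated assumptions: `build_delete_list`, the `keep` set, and the sample titles are all hypothetical, and it assumes DeleteBatch accepts one page title per line. The keep list still has to come from reading the candidates by eye.

```python
def build_delete_list(titles, keep):
    """Return the sorted titles that end in a digit and are not in the keep set."""
    keep = set(keep)
    candidates = [
        title
        for title in titles
        if title and title[-1] in "0123456789" and title not in keep
    ]
    return sorted(candidates)


# Hypothetical data: every title on the wiki, and the ones spotted as legitimate.
all_titles = ["ClardyGarces959", "Main Page", "CURATEcamp 2012", "SomeSpammer12"]
keep = {"CURATEcamp 2012"}

delete_list = build_delete_list(all_titles, keep)

# One title per line, ready to paste into the batch delete text box.
batch_text = "\n".join(delete_list)
print(batch_text)
```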
This left me with a handful of spam pages that I had to pick through individually, but way fewer than before.
Hope this helps someone else with the same problem!
Make sure to look for pages in spaces other than Main. I found a bunch more User: pages full of spam, and used the same methods as above to quickly get rid of them.
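If the page list has been exported as plain titles, a rough way to see which namespaces need attention is to group the titles by their prefix. The sketch below is illustrative only: the `NAMESPACES` set is an assumed subset of prefixes (a real wiki may define others), and `split_namespace`, `group_by_namespace`, and the sample titles are hypothetical.

```python
from collections import defaultdict

# Assumed subset of namespace prefixes; adjust to match the wiki in question.
NAMESPACES = {"User", "User talk", "Talk", "File", "Template", "Category"}


def split_namespace(title):
    """Split "User:Foo" into ("User", "Foo"); unprefixed titles go to "Main"."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix in NAMESPACES:
        return prefix, rest
    # A colon after an unknown prefix is treated as part of an ordinary title.
    return "Main", title


def group_by_namespace(titles):
    """Map each namespace name to the list of page names found in it."""
    groups = defaultdict(list)
    for title in titles:
        namespace, name = split_namespace(title)
        groups[namespace].append(name)
    return dict(groups)


sample = ["User:ClardyGarces959", "Main Page", "Notes: day one"]
print(group_by_namespace(sample))
```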