Sunday, September 26, 2010

A failed Mturk translation test

Mturk (Amazon Mechanical Turk) is Amazon's platform where you can post jobs for people to do. These tend to be tasks like translation or image categorisation that humans are good at but computers are not.

According to Panos Ipeirotis' research, mturk workers seem to be highly educated: the majority have degrees and about 10% have postgraduate degrees. Much mturk work is boring and probably does not use the skills these well-educated workers have.

A few days ago Google announced sponsorship for projects it thinks are carrying out good work. One winner was

"The Khan Academy will receive $2 million toward funding its work on the 'make educational content available online for free' theme. The academy does just that, with a library of over 1,800 videos with lessons on math, science, finance, and history."

Bill Gates is a big fan of the Khan Academy. "This guy is amazing," he wrote. "It is awesome how much he has done with very little in the way of resources."

I am also a huge fan of the Khan Academy. I think that these videos, and other online education videos such as MIT's online courses, have amazing potential to transform the education of millions of people.

How far could Google's $2 million grant stretch? Obviously it is up to Mr Khan to decide how to use his resources, but I thought it would be interesting to see if the mturk could be used to translate one of his mathematics videos. The languages with over 100 million speakers are Mandarin, Spanish, English, Hindi, Bengali, Cantonese, Arabic, Portuguese, Russian and Japanese. Ten languages for roughly 2,000 videos would be 20,000 video translations. If each video cost $100 to translate, that would use up the $2 million Google donated. If this cost can be reduced, more languages could be added.
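Here is a quick sketch of that arithmetic in Python. It assumes a flat cost per translated video and rounds the library up to 2,000 videos, as above; the function names are made up for illustration.

```python
def translation_budget(num_videos, num_languages, cost_per_video):
    """Return (number of translations, total cost in dollars)."""
    translations = num_videos * num_languages
    return translations, translations * cost_per_video


def languages_affordable(budget, num_videos, cost_per_video):
    """Return how many whole languages a budget covers at a flat per-video cost."""
    return budget // (num_videos * cost_per_video)


if __name__ == "__main__":
    # 2,000 videos, 10 languages, $100 per translated video
    translations, total = translation_budget(2000, 10, 100)
    print(translations, total)

    # The same $2 million budget at the $5 per video I tried to pay
    print(languages_affordable(2_000_000, 2000, 5))
```

At $100 per video the grant covers exactly the ten languages listed. At $5 per video the same budget would, on paper, cover 200 languages, which is why reducing the per-video cost matters so much.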

A mathematics video is not something the average person can translate. However, we know a large number of turkers have degrees, and many are from India, where they would likely have an understanding of English and Hindi or Bengali.

I tried an experiment to see if I could get one of Khan's videos translated into Hindi using the mturk at a cost of 5 dollars. Unfortunately I failed. The first person who accepted the task transcribed what Khan said into English text, which is useful but not what I was looking for. The second person posted another video on solving linear inequalities, in English, rather than a translation of Khan's video.

This small experiment tells me that on the mturk you need to be very clear about how you ask for a task to be completed. It also suggests that once you find someone who understands and completes the task, it might be worthwhile to encourage them to translate other videos rather than rely on the vagaries of who happens to accept your mturk task.

This experiment ignored the problem of copyright. Khan owns his videos and it is unfair for someone to come along and copy him. I was not trying to steal any glory from Mr Khan with this experiment, just to see if the mturk could be used to translate his videos.

Other people have successfully used mturk to reduce the cost of translations. 'How I reduced translation costs of 200 articles from $9000 to $46' is an interesting article on one successful usage. This tells me that the problem was more likely with my unclear instructions than with the mturk platform. You can even monitor translations taking place in the mturk here, so I still think this method would be cost effective. However, my simple experiment failed.


Panos Ipeirotis said...

This failure seems to be caused by lack of clarity in the instructions and/or the ability of low quality workers to participate.

I would recommend increasing the minimum acceptance rate to 98% and asking for at least 1000 completed HITs in the past.

It may also make sense to transcribe first and translate in another HIT.

red dave said...

Thanks for your comment, Panos. I agree; I think the right way is to break the task down into subtasks:
1. Transcribe the video into English text.
2. Translate this text into Hindi.

Coffee Lemon said...

It's spelled "rather than", with an "a".

red dave said...

Thanks Coffee, I am awful for making that mistake.