The return of the vikibot

gwadaelle · April 10, 2020, 11:46pm

Hi everybody!

Yesterday, I received the following message from amik:

Hi there!

We noticed you are an active French subber here on Viki so we would really appreciate and trust your expert insight on rating French translations. We want to ask if you would be interested in participating in a test that the Rakuten Institute of Technology is working on for their Machine Translation tool.

You will be given translations done by either a human, Google, or RIT Translate and then asked to rate the adequacy and fluency of each subtitle on a 5-point scale. This test will take approximately 15-20 minutes, and is meant to show us how accurate each translation tool is compared to each other. This is an opportunity to see how good (or bad) this machine translation service is for potential future use. If we can use this tool to help subtitle less popular shows that often get forgotten, it will allow the community to focus on perfecting the channels they love. Viewers would get the best of both worlds: more content they can watch!

If you are interested in participating in this test, please reply back by April 13 and we will send you further instructions for how to participate! If you have questions or would like clarification, I am also happy to answer them!

Take care,
Viki Community Team

I’d like to know whether the people who were in touch with Viki about the vikibot have heard about this request. I’m very surprised to have received this message. Why me? How did they choose the French translators? How do they know whether they are good at French? Did translators of other languages receive this message?

I’m waiting for your answers and opinions. I’m not ready to help Viki in this way. And for free?
Gwadaelle

mirjam_465 · April 11, 2020, 5:11am

I find it worrying that they even consider using AI to translate. Human translators are by definition better than any bot. However, Vikibot will improve (though never become totally perfect) from what it learns from US. So if we cooperate with this, we’re basically feeding a machine that sooner or later is going to compete with us. And that would not only be a shame for those who enjoy subtitling here, but also for those who watch the shows. And for Viki, because it would decrease their value for anyone who depends on the subtitles. And for languages in general, since the bad translations would sooner or later find their way into people’s minds.
And old shows can be translated by humans too. I certainly don’t have a problem with it and for beginners the old shows might be their only way in.

Btw, will Vikibot take over the segmenting as well at some point?

irmar · April 11, 2020, 6:54am

They did tell us that they would keep the bot only for abandoned channels nobody wants to work on.
There’s nothing new here.
What I’m wondering about is how they choose the people who will evaluate. In this case they happened to choose you, but if they choose randomly, or judging by the number of contributions, they might also choose people who are not good at their language.

mirjam_465 · April 11, 2020, 7:36am

How do they even know no one wants to work on them? There might be some hidden pearls that no one knows about. And there might be some shows that someone would like to work on, but can’t because the CM or Mod is not reachable anymore.
It would be good if Viki at least somehow made the allegedly abandoned shows known before turning to such drastic measures.

cgwm808 · April 11, 2020, 8:57am

Rather than saying in general what is wrong with the bot, this is an opportunity to be specific about the limitations of a machine translator.
It would be very interesting to see whether the AI bot “knows” when to use tu and when to use vous, especially going from English, which has no such distinction, to a Romance language. The English translated from Korean doesn’t distinguish intimate from polite, but when the human subber then translates the English into French, the intimate form may be required, and that is added value from a human subber.
Years ago I was asked by Viki staff to review segments created by a bot, to time myself when editing the bot’s segments as compared to segmenting from raw by myself, and then to write a critique of the problems I perceived with the AI-cut segments. It was an interesting experience about five years ago and guess what: they still have not instituted an AI segmenter. You would think that with the digitization of the sound it would be an easy task, but apparently it isn’t. I think making a good AI translator might be even more difficult than making a good AI segmenter.

mirjam_465 · April 11, 2020, 9:05am

Wouldn’t it make more sense if the bot translated directly from Korean/Chinese/Japanese to French?

anna79_9 · April 11, 2020, 11:09am

What I notice is that Viki staff members decided to run this project by contacting random people from the French community. It would have shown some professional conscience to ask themselves: why not make a public request, after everything that has happened lately? The community was bound to find out one day anyway. Too bad it’s once again something we learn by ourselves, and not through the authority that is supposed to promote group cohesion.

And I think Irmar is right. There is nothing to feel proud about in having been chosen for such a survey, since they certainly chose randomly, without much knowledge of the contributor’s qualifications. In any case, I know one thing: they did not send this message to the oldest and best-known French contributors. I am sure that I will not receive this request, and I do not even wonder why… What is a pity is that, by doing that, they are penalized too.

In any case, the relevance of their survey is greatly reduced, for the simple reason that they have no real way to judge a participant’s skills. One could argue that sheer numbers will give their investigation a certain “average”, if they can find enough people. But they seem to hand-pick their participants, which COMPLETELY skews a study. These are not professional investigators, we can see that.

That said, I’m absolutely not worried. Although technology is advancing rapidly, one thing I am sure of is that we are very far from having efficient, 100% reliable translation software, for the simple reason that communication rests on two things: the literal message plus what surrounds the message. There are also other very simple reasons specific to the French language: feminine/masculine agreement, sentence form… In addition, the first language would have to be perfectly translated in order not to confuse the computer. But, you know it guys, each language has its own untranslatable singularity.

Well, why do humans speak? Language was created for the sole purpose of communicating with each other, thereby creating society. On the other hand, we can see that in our current societies, capitalism is slowly killing us. When I say it kills us, I mean it literally. We can clearly see it in what is currently going on with the pandemic. This is what happens when we give more importance to money than to human subjectivity: humans die.

And the worst thing here is that we bring this on ourselves. Humans destroy what matters most to them and sets them apart from all animals: subjectivity, consciousness.

I watched a report a few days ago on developing countries with incredible cultures but little money: these people were incredibly happy amid poverty. They had no fridge, no store, but they had a pond for fish. They literally lived together, and not each on their own. When I think that after 4 years in my place of residence, I still haven’t met my neighbors once… But let’s get back to the subject. I really wondered whether it wouldn’t be better to live in that kind of place rather than in the pseudo-technology-money-greed society that we have created here. Money clearly does not make people happy. Thank you to the sincere and happy people I was able to meet through these reports. True happiness is not what we have here. I have cherished a false happiness for a long time now. I think it is time for me to travel a little more, after the end of this health crisis of course!

Conclusion: the movie Wall-E shows our future. Fortunately, it will not be the future of the whole of humanity.

PS: yes, I like to open up topics. Because, ultimately, we realize that these subjects are broader. As with an illness, one looks at the symptoms in order to uncover the real illness. Unfortunately, today, thanks to the DSM-Stupid 1-million version, we treat one symptom after another without taking the whole person into consideration. This causes many therapeutic problems, especially in terms of medication: one medication per symptom, and the medications can cancel each other out. The biggest problem: we don’t treat the disease. And we no longer treat a person, we treat a symptom. Bye bye human being.

irmar · April 11, 2020, 11:54am

It was sent to an Italian subber (and young moderator) as well, last week.

And yes, Mirjam, it would be far better if the bot translated directly from Korean (or Chinese). But, from what I’ve seen of Google Translate, Korean-English or Korean to any European language is far worse than English-French or English-Italian. Yes, if it’s done through English you lose the distinction of the honorifics. But if it has to translate directly from Korean, the different word order in the sentence drives the robot crazy - and I don’t blame it!

sonmachinima · April 11, 2020, 3:51pm

How is the word order in Korean? Could you give an example please? (like writing an English sentence with the order Korean would use)

irmar · April 11, 2020, 3:56pm

The Korean word order is subject - object - verb. The verb always comes at the end of the sentence.

What would you do if the bot translated as follows?

One day a mythical man named Go Nan Gil lived
Because of the numerous obstacles you’ve created, it’s hard to get over them.
But with a family that you love, everyday, sharing precious times, making precious memories…A nest of love. In the world most happy is it not?
Everyone, our junior of the accounting division, obtained from TQ Cosmetics, top secret documents brought!
Director, Director, so that you can travel on the path of righteousness, to tell you good things, that my duty isn’t?
TQ Cosmetics’ losses, profits from other companies for it are making up.
Among these bamboo, mountain and water, portrait, birds and animals, or flowers and greenery you must draw two of them and within the time limit. submit it [also note the use of it even when referring to several items]
In order to sell the fabric, if with as much effort as Han Seok Yool has put into studying it I have to study, I together with Han Seok Yool will just sell it
Like I at the Namyeong Station accident, things that I’ll regret later on I’m not going to do.
Teacher, to be completely honest, why you picked me, and why you’re trying to make me independent, and helping me and supporting me…, to you to ask is it okay?

More problems for automatic translation:
there is no “number” for a verb (Plural or singular)
there is no way to know whether the subject of the sentence is male or female
And of course: at least four degrees of honorifics:
casual, polite, formal, superformal

To help understand what each word is in the sentence, they have suffixes/markers, attached at the end of the word. A suffix for subject, object, being somewhere, going somewhere, a suffix for time, a suffix for “too”, for “with”, for “to” (ex. I give the ball to Maria), a suffix for the transportation means … you name it!
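
For anyone who likes to see this in code, here is a toy sketch of the reordering a bot would have to do, using the "I give the ball to Maria" example. It assumes every word already comes with a clean role label, which is exactly the part that is hard in real Korean; the function name and the role codes (S, O, IO, V) are made up for the illustration.

```python
# Toy illustration only: the role labels are supplied by hand here,
# whereas a real translator would have to work them out itself.

# Position of each role in an English-style sentence:
# subject, verb, direct object, indirect object.
ENGLISH_ORDER = {"S": 0, "V": 1, "O": 2, "IO": 3}


def sov_to_svo(tagged_words):
    """Reorder (word, role) pairs from Korean-style order to English-style order."""
    # sorted() is stable, so words that share a role keep their original order.
    ordered = sorted(tagged_words, key=lambda pair: ENGLISH_ORDER[pair[1]])
    return [word for word, _role in ordered]


# Korean-style order: subject, indirect object, object, verb at the very end.
korean_style = [("I", "S"), ("to Maria", "IO"), ("the ball", "O"), ("give", "V")]

print(" ".join(word for word, _role in korean_style))  # I to Maria the ball give
print(" ".join(sov_to_svo(korean_style)))              # I give the ball to Maria
```

With labels the reordering is trivial. The garbled sentences above are what you get when the labels are missing or wrong.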

mirjam_465 · April 11, 2020, 8:25pm

Good points.

On the other hand, a bot can learn and develop over time. It could get to a state where it’s not completely horrible. If I look at Google Translate now, I do see some improvement. But using it for Korean-English usually gives a far better result than Korean-Dutch. Because, obviously, English is widely used, and therefore the bot has had far more chances to “improve itself”.

When we translate a drama, we usually know who is who, who is male/female, etc. Because of the context. But the bot has access to that context as well. And it can scan through the whole series far quicker than we can. So it wouldn’t just translate random sentences. And the more the bot is used, the better it’s going to work.

All of that said, I still don’t believe it would ever be better than us.

cgwm808 · April 11, 2020, 10:53pm

The difficulty for an AI bot translating Korean to English or any other language is not only that the structure of the sentences may be very different, with Korean being Subject Object Predicate (both verbs and adjectives) and English being Subject Verb Object, but also that those particles added to the ends of words which @Irmar discusses are often completely dropped. So for example one adds the particle 이/가 (depending on whether the ending character is a consonant or a vowel) to the subject (noun) of the sentence, but the particle might be dropped and the subject might not be mentioned at all. Similarly, one can add a particle to a noun to make the noun an indirect object, but the particle might be dropped completely. Sometimes a Korean sentence has three nouns in sequence and it is up to the hearer to decide which part of speech each noun is, as there are very few times when the addition of a particle is mandatory. The possessive is indicated by the particle 의 but it’s not mandatory. Sometimes one infers the possessive from the word order, as the possessor noun always precedes the noun possessed. Or there are compound verbs in which two verbs are in sequence and it is up to the subber to decide which action is/was performed first. So it is much easier to program an AI translator from English to a Romance language than to program Korean to English or a Romance language.
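
As a small illustration of the 이/가 rule mentioned above, here is a minimal sketch of how a program can pick the particle. It relies on the Unicode layout of precomposed Hangul syllables (they start at U+AC00, and each block of 28 code points cycles through the possible final consonants, with index 0 meaning "no final consonant"). The function names are my own, and this only covers the easy direction, attaching the particle; it does nothing about the real problem of particles that speakers simply drop.

```python
HANGUL_BASE = 0xAC00           # first precomposed Hangul syllable, "가"
HANGUL_LAST_INDEX = 11171      # 11,172 syllables in total, U+AC00..U+D7A3
FINALS_PER_BLOCK = 28          # 27 final consonants plus "no final consonant"


def has_final_consonant(syllable: str) -> bool:
    """Return True if a precomposed Hangul syllable ends in a consonant."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index <= HANGUL_LAST_INDEX:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    # Index 0 within each block of 28 means the syllable ends in a vowel.
    return index % FINALS_PER_BLOCK != 0


def subject_particle(noun: str) -> str:
    """Pick 이 after a final consonant and 가 after a vowel."""
    return "이" if has_final_consonant(noun[-1]) else "가"


for noun in ["사람", "친구"]:   # "person" ends in a consonant, "friend" in a vowel
    print(noun + subject_particle(noun))   # 사람이, then 친구가
```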

bozoli · April 12, 2020, 6:58am

This explains why I occasionally see a sentence in English saying something like “We need to find her and go”. They most likely mean “We need to go and find her”.

Perhaps you are right, they might be starting from the English language. But if their goal is to create a translator from any language into any language, then at some point they’re going to have to drop the English.

piranna · April 12, 2020, 8:29am

The return of superman!

This is what I thought when I read the title of your topic
I miss those guys!

How to gain this Viki Pass for new subbers?
Until now, there was one topic for completing unfinished dramas in French.
If the bot works on incomplete dramas, I don’t know how new subtitlers will earn the Viki Pass.
Just last week, one of the subbers I was working with and I were trying to find something to complete, not too long: immediately, Viki Pass wall. We couldn’t work together because of that.
We had to wait until she earned her Viki Pass on another drama.
Creating a favorable situation?

If new subbers don’t have this Viki Pass, does that create a favorable situation for finally using the vikibot?
Since old teams couldn’t provide subtitles and new subbers can’t help them, how are they going to complete the dramas without fresh blood?

I read some articles about translation and AI: some paid translators at companies edit after the AI. Of course, they said that technical content is not yet translated correctly by AI.
I agree with Bozoli, I tend to think our role is going to mutate: people who were translating would no longer really translate but edit.
I don’t know the accuracy percentage of their bot:
Imagine the AI has an 80% accuracy; the remaining 20% is for humans.
Logically, if an AI can do 80% of the work correctly, why would you wait some hours for subtitles when the AI can do it 80% accurately and faster, in a few minutes?
The remaining work is editing the whole thing and retranslating the 20% the bot couldn’t do.

People are working on AI to improve it.
Alexa, Siri, face recognition, bots answering the main questions on public websites, Google working on it, and a lot of new projects using AI on crowdfunding websites (one project I saw was a language-learning device, like an Alexa teacher using AI).
How much they can improve it, I don’t know, but the progress is already extraordinary.
AI doesn’t have a human’s limits: it isn’t busy or tired, and it doesn’t need to sleep or eat.
And you can feed it with data and data indefinitely.

There is also AI that recognizes faces. If we send pixels to an AI and it can recognize faces, we could set as parameters that for this face it uses formal language, and for that face informal language.
For example, you give the AI a photo or a video of a criminal so it can recognize him.
And voice recognition: on Netflix, there was this documentary about the murder of Gabriel. They are working with public organizations to protect children with AI by analyzing the voices and words of people over the phone. They fed the AI with their call history and improved it so it could make a judgement or take a decision according to the words. Based on the AI, they can detect when a case requires protecting the child, or which measures could be taken. The scientist working on this project said that AI is logical and follows the paths we taught it, where some humans could fail in assessing how critical a situation is.

I think it’s a matter of combining different forms of AI.
And who knows how many new discoveries humankind will make in the AI field?

I don’t think it’s a matter of whether it is possible or not.
We will go through mutations because of AI and we will be impacted; it’s more about how much, and how we will adapt.

But it’s clear we don’t have a pool of French editors on Viki. So either they hire editors and pay them to improve their bot, or it won’t be for today, and volunteers’ hands will help improve it. The survey goes in that direction.

The goal is improving the bot, more accuracy according to a context, a story. It might not be 80% accurate, but increasing from 60% to 70%, 75%, 80%…
Obviously, they need to improve it by giving it dramas with a context to translate and us to edit. So there will be dramas translated by the bot.
What is behind it, I don’t know, but we don’t have a pool of old dramas to work with once it is translated. How do they plan to continue to feed it?

irmar · April 12, 2020, 9:01am

They will feed it with the already completed translations to study and ponder on.
It won’t have any fresh work to do, but it can fine-tune its knowledge by this immense amount of data.
“Aha! So this is how you translate this tricky sentence! Hmmm… Good to know.”, the bot says to itself.
Of course it will also be fed all the crappy translations by the crappy illiterate volunteers who think they can translate when they don’t even know how to write their own language, and left there by the crappy lazy mods who hoarded projects and never bothered to edit them.

piranna · April 12, 2020, 9:21am

Yes, probably.
It’s the testing and improvement phase.

I think that in a research phase, they test it step by step.
Like emergency calls handled by bots: first feed the bot, then test it with calls.
Like testing a vaccine on animals first, then on real patients. The vaccine is ultimately not meant for animals, but for humans.
Like giving pictures to a child to recognize words, then asking the child to make sentences. The end goal is to make the child able to communicate.
Learning to…
AI has the capacity to learn. And we humans decide the end goal of our creations.
For improvement, just analyzing or reading is not enough (like reading a book about playing an instrument).
Any product has to be put into practice.
It’s the idea of experiment in science: in a new environment or situation, can it work with the data we fed it before?
For that, they need a blank canvas. It’s the ultimate test: working with a certain autonomy and accuracy, starting from scratch. The end goal of a translation bot is to translate; after that, it might have different applications or purposes. Face recognition tech, for example, is used by security companies but also for unlocking phones or for Snapchat, and it could also help the translation field.

The end goal of the product on Viki, I don’t really know for sure.
What I do know for sure is that feeding it for the sake of feeding it is not the end goal.

mirjam_465 · April 12, 2020, 9:48am

irmar · April 13, 2020, 4:37pm

I just found an interesting insight by @jadecloud88 about automatic translators:

I’d ‘translate’ from Korean to English, and if the meaning doesn’t make good sense, I’d ‘translate’ from Korean to Chinese. Almost 99% of the time, the meaning translated to Chinese and then to English clicked better than if it were done directly to English.

For ex. 공항가는길 via Naver Translate:
Kor-Eng 공항가는길 -> To the airport road
(not a wholly accurate translation although in a general sense, bearing Korean sentence structure in mind, it’s not wrong either)
Kor-Chi 공항가는길 -> 前往机场的路
(this is the perfect ‘On The Way To The Airport’ definition that the phrase means, after a second round translation from Chi-Eng)

And I had also heard someone say the same thing about doing it through Japanese.
Why? Probably because these two languages, Japanese and Chinese, were more popular than Korean before the K-wave, so there’s a much larger database.
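
To make the idea of going through a third language concrete, here is a minimal sketch of this "pivot" translation. The lookup tables are hypothetical stand-ins for a real translation service and contain only the single phrase from the quote above; the function names and language codes are my own choice.

```python
# Hypothetical lookup tables standing in for a real translation engine.
# The entries are just the one example quoted above.
TABLES = {
    ("ko", "en"): {"공항가는길": "To the airport road"},
    ("ko", "zh"): {"공항가는길": "前往机场的路"},
    ("zh", "en"): {"前往机场的路": "On the way to the airport"},
}


def translate(text: str, src: str, dst: str) -> str:
    """Translate directly between two languages using the toy tables."""
    return TABLES[(src, dst)][text]


def pivot_translate(text: str, src: str, pivot: str, dst: str) -> str:
    """Translate src -> pivot -> dst instead of going src -> dst directly."""
    intermediate = translate(text, src, pivot)
    return translate(intermediate, pivot, dst)


phrase = "공항가는길"
print(translate(phrase, "ko", "en"))              # To the airport road
print(pivot_translate(phrase, "ko", "zh", "en"))  # On the way to the airport
```

The extra hop only helps when both legs are stronger than the direct pair, which is the "much larger database" point. Any mistake made on the first leg is carried into the second.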

mirjam_465 · April 13, 2020, 9:40pm

Korean to English translations usually also make more sense than Korean to Dutch translations, precisely because there’s much more English learning material for the bot.