Viki: Can we please get consistent language names across the data silos?

porkypine · April 19, 2026, 2:23am

Hello!

I have an ancient request that bugs many of us, especially segmenters, because we use several data silos while segmenting. They are a mess of inconsistency. Each silo uses a different name for many of the languages, and it drives us a bit nuts trying to guess what that name may be.

I have included images so you can see what I am talking about.

@Amy, @Camille, @Jes, @Mari, @Sean, @Val and @yusiangteo @brendas

I stuck the screen captures in this doc

I ask that Viki clean up the data silos’ language names so they are consistent. Looking at the common pre-subbed languages below, the one silo that is unlike the others is the data silo for Bulk Translations.

I really don’t care which one you decide to use. I only ask that the language names be consistent in every data silo.

Subtitlers work out of the subtitle editor. Segmenters are the main people using Bulk Translations because we need to combine every pre-subbed language when we fix timing issues and broken sentences.

The funny thing is, most of us segmenters do NOT speak the other languages. We are effectively illiterate in them and can’t read scripts that use non-Latin lettering. So why is it that our main language selection tool, Bulk Translations, is the only language silo that uses those languages’ own scripts and spellings?

Thank you for fixing this issue.

mirjam_465 · April 19, 2026, 4:25am

Nowadays I add the list of languages, plus the shortcuts to find them in Bulk Translations, to the Team Notes:

Without that, some of the non-Latin script languages can be very hard to find.

manganese · April 19, 2026, 4:30am

I know what you mean. Personally, I think that every language that Viki has subs for should be in the “Popular Language” list at the beginning of the list in Bulk Translations. This would make it much easier to locate them.

A show that I’ll be working on soon has 16 languages and, like you, I went through many permutations before I ended up with the right ones. As a consequence, I put together the list below for future reference.

In the segment timer, as you’ve indicated, all languages can be accessed using English. Here is an example in case anyone doesn’t know what I mean by that… When I click on the language button on the bottom right corner of the segment timer and quickly type “hi” (no quotation marks), the Hindi language appears and can be accessed. Unfortunately, I need to know the Hindi language to access the language in Bulk Translations. It would be great if English could be used in Bulk Translations, too.

bubbletea_shi04 · April 19, 2026, 5:22am

Thank you @porkypine for bringing up this issue! I already talked about this in VCC, so I’m really grateful to you for raising it here. Until now, if I didn’t know a language, I would check the presubbed languages, translate the names, and copy-paste them into the bulk translator. As you can see, it’s a tedious process, and it becomes really cumbersome when there are more than 10 languages to work on. I really hope this issue reaches Viki staff and they do SOMETHING about this problem. It would change so much and make things easier for us segmenters!!

cgwm808 · April 19, 2026, 10:24am

@porkypine – “Data silos are isolated collections of data that prevent data sharing between different departments, systems and business units.” The Viki staff created a DATA SILO by using two datasets to list languages.
What we want is the same DATASET used to list all of the 200 languages that Viki claims to have subtitles for. Consistency alone is not enough: we need a unitary dataset, not a consistent “data silo”, because the silo is defined by inconsistency. One dataset uses a common orthography (English); it is used for “To” in the subtitle editor, in the Segment Timer, and in the video player for viewers. The other dataset writes the name of each language in that language’s own orthography; it is used for “From” in the subtitle editor, for both “To” and “From” in Bulk Translations, and in the list under “Settings” in the video player.
The use of two orthographic systems creates a data silo at Viki. It makes sense that native orthography is used in the video player for language selection by the viewer. But it doesn’t make sense that the native orthography dataset is used in the subtitle editor or in Bulk Translations. The two sets of users are different. While every subber and segmenter is a consumer, only a small portion of Viki consumers are subbers and segmenters. We want the English dataset used in the subtitle editor, the segmenter, and Bulk Translations so that subbers and segmenters only have to deal with English names when choosing a subbing language.
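
To make the idea of a unitary dataset concrete, here is a minimal sketch in Python. It is only an illustration, not Viki’s actual code or data model: the `Language` record, the `LANGUAGES` list, and the `find_languages` function are all hypothetical names, and the four entries are sample data. The point is that each language is stored once, with both its English name and its native name, so any tool can search by either one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """One record per language, shared by every tool (hypothetical model)."""
    code: str          # ISO 639-1 code, e.g. "hi"
    english_name: str  # name in English orthography
    native_name: str   # name in the language's own orthography


# Sample data only; a real list would hold every supported language.
LANGUAGES = [
    Language("hi", "Hindi", "हिन्दी"),
    Language("ko", "Korean", "한국어"),
    Language("ja", "Japanese", "日本語"),
    Language("es", "Spanish", "Español"),
]


def find_languages(query, languages=LANGUAGES):
    """Return languages whose English or native name starts with the query,
    or whose code equals the query. Matching ignores case."""
    q = query.strip().casefold()
    if not q:
        return []
    return [
        lang for lang in languages
        if lang.english_name.casefold().startswith(q)
        or lang.native_name.casefold().startswith(q)
        or lang.code == q
    ]
```

With a list like this, typing “hi” would find Hindi in every tool, and a viewer who types “한국” would still find Korean, because both names live in the same record.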

mirjam_465 · April 20, 2026, 3:20am

Viki recently brought back the reference subs. This could save us a whole lot of work… if only we could copy them! Then there would be no need to wait for someone to save the presubs.
And in case Viki is afraid of people misusing this feature, they could give the copy option exclusively to segmenters.

shraddhasingh · April 20, 2026, 4:03am

Do they contain the source language? Out of all the shows I’ve been on recently, none had that.

mirjam_465 · April 20, 2026, 4:40am

No, only the presub languages.