Florence, World Health Organization
In March 2020, the World Health Organization and the rest of the world recognized the threat that was COVID-19. The pandemic disrupted WHO's ability to conduct the face-to-face tobacco cessation consulting sessions it offers free of charge around the world, so WHO came to Soul Machines to create a digital health worker who could do this virtually.
In 7 weeks, I took hundreds of documents provided by WHO — including tobacco cessation intervention methodologies, videos and transcripts of actual tobacco cessation intervention sessions, and more — created a conversational persona that would be a good fit for the role, then designed, tested, and deployed the agent. Along the way, I made sure the conversational turns were language agnostic, allowing rapid localization.
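To give a sense of what "language agnostic" means here, below is a minimal sketch of the pattern: each turn references a message key instead of embedding text, and per-locale string tables supply the words, so a new locale only needs a new string table. The turn structure, key names, and strings are hypothetical illustrations, not Soul Machines' actual format.

```python
# Illustrative sketch: language-agnostic conversation turns.
# Turns carry message keys; locale tables carry the actual text.

TURNS = [
    {"id": "greet", "message_key": "GREETING", "next": "ask_ready"},
    {"id": "ask_ready", "message_key": "READY_TO_QUIT_Q", "next": None},
]

STRINGS = {
    "en": {
        "GREETING": "Hi, I'm Florence. I'm here to help you quit tobacco.",
        "READY_TO_QUIT_Q": "Are you ready to set a quit date?",
    },
    "es": {
        "GREETING": "Hola, soy Florence. Estoy aquí para ayudarte a dejar el tabaco.",
        "READY_TO_QUIT_Q": "¿Estás listo para fijar una fecha para dejarlo?",
    },
}

def render_turn(turn_id: str, locale: str) -> str:
    """Resolve a turn's message key against the requested locale,
    falling back to English when a translation is missing."""
    turn = next(t for t in TURNS if t["id"] == turn_id)
    table = STRINGS.get(locale, STRINGS["en"])
    return table.get(turn["message_key"], STRINGS["en"][turn["message_key"]])

print(render_turn("greet", "es"))
```

Because the conversation logic never touches locale-specific text, localizing the agent reduces to translating string tables rather than redesigning conversations.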
One of the more interesting parts of this project was getting rapid legal and medical expert clearance on the content from the WHO side. To facilitate this, we recognized a need to take our non-linear conversation design and convert it into linear documents: doctors and lawyers are experts at efficiently parsing, consuming, and approving linear documents. So we adapted our entire corpus to a linear format, got notes, clearance, and sign-off on that document, and reflected all the changes back into the non-linear design. Through this initiative, we were able to get sign-off from WHO in less than 48 hours.
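As a rough illustration of that conversion — the graph structure and traversal here are my own simplification, not the actual tooling — a depth-first walk over a branching dialogue can emit a flat, numbered script that a reviewer can read top to bottom:

```python
# Hypothetical sketch: flatten a branching dialogue graph into a
# numbered linear script for legal/medical review.

DIALOGUE = {
    "start": {"bot": "Do you currently use tobacco?",
              "branches": {"yes": "assess", "no": "encourage"}},
    "assess": {"bot": "How soon after waking do you first use tobacco?",
               "branches": {}},
    "encourage": {"bot": "Great! Staying tobacco-free is the best choice.",
                  "branches": {}},
}

def linearize(node_id="start", depth=0, seen=None, lines=None):
    """Depth-first walk that numbers every reachable bot turn and
    notes which user reply leads to it."""
    seen = seen if seen is not None else set()
    lines = lines if lines is not None else []
    if node_id in seen:          # avoid loops in the conversation graph
        return lines
    seen.add(node_id)
    lines.append(f"{len(lines) + 1}. {'  ' * depth}BOT: {DIALOGUE[node_id]['bot']}")
    for reply, target in DIALOGUE[node_id]["branches"].items():
        lines.append(f"{len(lines) + 1}. {'  ' * (depth + 1)}(if user says '{reply}')")
        linearize(target, depth + 2, seen, lines)
    return lines

print("\n".join(linearize()))
```

The same mapping can be run in reverse: reviewer notes keyed to line numbers point back to specific nodes in the graph, which is what let changes flow back into the non-linear design.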
Google Assistant Japan, Google
By January 2017, Google had already released the American English version of the Google Assistant as well as its first Google Home product. Google wanted to release the Assistant in multiple locales, including Japan, and this is where I came in.
For the Google Assistant to be successfully accepted into Japanese culture and daily user journeys, it needed to feel quintessentially "Japanese". It couldn't feel like a translated American product; it had to feel like an authentically Japanese version of the product. For this to happen, we first needed to understand the social positioning of the Google Assistant in the United States, within the context of its use cases, in order to figure out the Japanese equivalent.
The challenge with Japanese is that the language changes dynamically depending on the context of the conversation, especially with regard to the social hierarchy of the participants. For example, a 5-year-old boy would speak differently to a 75-year-old man than the 75-year-old man would speak to him. To be fair, this is true of any language to a certain extent, but in Japanese these shifts are especially exaggerated compared to American English. (e.g. An older man brings an assistant with him to a sales meeting. When the older man is the customer, his word choice changes. When the older man is the one doing the selling, his word choice changes again. And when the older man introduces his assistant to the other party, the way he does so changes depending on all the factors above.) Not getting this right would make the Assistant feel very "un-Japanese".
Now, one way to solve this problem would be to have the Google Assistant read the "air of the room" – one's relative position in the social hierarchy with respect to the other conversational participants. But at the time, we didn't have the capability to recognize the age, social function, and conversational context needed to dynamically shift the Google Assistant's Japanese persona.
On top of all of this, the Google Assistant also had to shift its Japanese conversational tone depending on the task it was performing. Just as in English, a person giving a weather forecast and a person telling a joke speak differently; in Japanese, the difference is much more noticeable.
In short, we needed to come up with a Japanese Google Assistant persona that could handle all of the Google Assistant's current and future capabilities, and could speak to anyone from a 5-year-old to a 95-year-old without feeling like it had misread the social context. And there was another twist: the voice actor was already signed on, so we needed to make sure the persona was feasibly reproducible with that fixed voice.
To do so, we brainstormed literally hundreds of combinations of differing social situations and use cases, and arrived at the persona that worked best across most of them: a woman in her early 30s who had majored in liberal studies (with pages upon pages of other biographical detail that won't be written about here). Ironically, we ended up with someone fairly close to the original American English Assistant, but this time we had the receipts to reasonably assume she was a good fit for the Japanese market.
Alexa Japan, Amazon
At Amazon, the primary concern was that Alexa could answer questions about over 2 billion things in English, but not in Japanese. Although my title was Knowledge Engineer, my primary role was to convert the NLG models that turned data points into intelligible English into models that produced intelligible Japanese, while keeping the device's persona consistent with Amazon Japan's guidelines.
A particularly interesting problem was the Japanese counter word. In English, when there are two dogs, we just write "two dogs". In Japanese, two is "NI" and dog is "INU", and since Japanese makes no distinction between singular and plural nouns, you might be tempted to translate "two dogs" as "NI INU". But this would be wrong.
In Japanese, you need a "counter word", and "two dogs" translates to "NI HIKI NO INU". Directly translated, this means "TWO SMALL ANIMALS OF DOGS". "Two bottles" translates to "NI HON NO BOTORU" – directly, "TWO CYLINDRICAL OBJECTS OF BOTTLES". A counter word is a part of speech that changes dynamically depending on the type of thing being counted: small animals have their own counter word, big animals have their own counter word, and so on. And the distinction between small and big is very arbitrary.
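In NLG terms, this means a number phrase can't come from a fixed template; the counter has to be looked up from the noun's category. Here's a minimal sketch, assuming a hand-labeled category-to-counter mapping. The category names and romanized output are illustrative rather than Alexa's actual format, and the generic fallback "tsu" is a simplification (in real Japanese it fuses with the numeral, e.g. "futatsu"):

```python
# Illustrative sketch: pick the counter word from the noun's category.

COUNTERS = {
    "small_animal": "hiki",   # dogs, cats, fish...
    "large_animal": "tou",    # horses, elephants...
    "cylindrical": "hon",     # bottles, pencils, umbrellas...
    "flat": "mai",            # sheets of paper, shirts...
    "bird": "wa",             # birds (and, famously, rabbits)
}

NOUN_CATEGORY = {"inu": "small_animal", "botoru": "cylindrical", "usagi": "bird"}

def count_phrase(number_word: str, noun: str) -> str:
    """Build '<number> <counter> no <noun>', e.g. 'ni hiki no inu'.
    Falls back to a generic counter for unlabeled nouns (simplified)."""
    counter = COUNTERS.get(NOUN_CATEGORY.get(noun, ""), "tsu")
    return f"{number_word} {counter} no {noun}"

print(count_phrase("ni", "inu"))     # -> "ni hiki no inu"   (two dogs)
print(count_phrase("ni", "botoru"))  # -> "ni hon no botoru" (two bottles)
```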
(Fun fact: rabbits in Japan are counted using the counter for birds. This is because Buddhist monks wanted to eat rabbit but were only allowed to eat fish and birds, so they decided, 'hey, rabbits have floppy ears – those can be interpreted as wings!', and started calling rabbits birds.)
All 2 billion data points then had to be labeled with the appropriate counter word.
Thankfully, Wikidata has a fairly extensive categorization in place, so we could at least use a third-party source to label our data points. Unfortunately, Wikidata has many holes in its data, so we couldn't just match Wikidata against our data and call it a day.
What we ended up doing was chunking the 2 billion data points into subjects (e.g. sports, mountains, locations) and localizing the grammar rules and labeling the counter words per subject, in order of corpus popularity. (For example, people asked about the age of celebrities far more often than the height of an obscure historical figure.) Then we had to come up with subject-specific grammar rules to make sure Alexa was presenting the information with the appropriate persona. (For example, Alexa can be happy-go-lucky when telling a fun fact about Japan, but should be more professional when talking about the statistics of a war.)
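A hedged sketch of what that labeling approach can look like: map an entity's Wikidata "instance of" (P31) class to a counter word, and fall back to a subject-level default where Wikidata has gaps. The class IDs below are real Wikidata items, but the pipeline structure and subject defaults are my own illustration, not the actual system:

```python
# Illustrative sketch: counter-word labeling via Wikidata classes,
# with per-subject defaults to cover holes in the data.

CLASS_TO_COUNTER = {
    "Q144": "hiki",   # dog      -> small-animal counter
    "Q726": "tou",    # horse    -> large-animal counter
    "Q8502": "za",    # mountain -> counter for mountains
}

SUBJECT_DEFAULT = {"animals": "hiki", "mountains": "za", "people": "nin"}

def counter_for(entity_class: str, subject: str) -> str:
    """Prefer the class-level label; fall back to the subject default,
    then to the generic counter 'tsu'."""
    return CLASS_TO_COUNTER.get(entity_class,
                                SUBJECT_DEFAULT.get(subject, "tsu"))

print(counter_for("Q144", "animals"))       # labeled class -> "hiki"
print(counter_for("Q999999", "mountains"))  # gap in labels -> "za"
```

Working subject by subject in popularity order meant the fallback tier shrank fastest exactly where users asked the most questions.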
By the end of my engagement, we were able to cover over 90% of the average user's queries, greatly increasing the capabilities of the Japanese Alexa.