#12: AI and Speech
Ian and Michael discuss AI and speech technologies.
Automated Transcript
Ian Bowie
Hello and welcome to AI unfiltered with me, Ian Bowie, and our resident expert, Michael Stormbom, where we will be talking about everything to do with AI in our modern digital society and what the future holds for all of us.
Michael Stormbom
So in today’s episode of AI unfiltered, we will be talking about AI and speech. So we could probably start off by talking about speech recognition. So that is the transformation of speech, of audio, into text.
Ian Bowie
Actually, speech recognition is probably something that most people have encountered at some time in their daily life, possibly even without recognizing it.
Michael Stormbom
Yeah no, for sure. I mean, if you use any of those digital assistants like Siri, for example, when you speak to it, of course, it’s speech recognition that transforms your speech into text. Your television might have speech recognition, so…
Ian Bowie
My television does, actually, and I think it’s really cool. No, I love it. I think it’s got it on the television part, but it’s obviously built in, because I can go to YouTube, and then I just go to the search function and there’s a little microphone, and then it just says, “Speak.” And then I tell it what I’m looking for.
Michael Stormbom
Yeah, so you can do speech commands, and…
Ian Bowie
It’s so cool.
Michael Stormbom
Yeah. Not only is it cool, but from an accessibility perspective, of course, it can be quite essential. Yeah.
Ian Bowie
I mean, I like it just because it’s funky and, I don’t know, I don’t have to, you know, mess about, and it actually saves time. But I mean, for people who are mobility impaired, it must be a godsend.
Michael Stormbom
Yeah, no, absolutely. And I think we spoke about it in a previous episode.
Ian Bowie
Yeah, absolutely. Yeah. I mean, the only concern that I have, and we had a discussion about it the other day, is: does that mean your television is listening to you all the time, even when it’s switched off?
Michael Stormbom
Conceivably, yes.
Ian Bowie
Yeah, exactly. Yeah.
Michael Stormbom
So I mean, it’s the same with these digital voice assistants, or your mobile phone. Yes.
Ian Bowie
Well, that’s right. Yeah. I mean, because we were talking about, well, it doesn’t sound so good, but we were actually talking about knives the other day, because I was quite interested in getting a hunting knife. And then after I had that conversation with my wife, she said, “I keep getting all these adverts popping up on my phone for hunting knives.” Well, you know, it’s not even her who’s interested in them, but obviously the phone, or something in the phone, had been registering “hunting knife”. And the next thing, it’s popping up with adverts all over the place.
Michael Stormbom
Yeah, indeed. I mean, this has happened to a lot of people and to myself as well; you’ve been talking about something and then all of a sudden, mysteriously, there’s some advertisement for the very thing you were talking about days earlier. I read an interesting article about it the other day, about that very question: is your phone listening to you? And, well, the author’s point in the article was that it’s not really necessary, because you’re so predictable that it doesn’t even need to listen to you. So I don’t know if that was true, but I thought it was a funny angle.
Ian Bowie
But are we predictable? Or has social media made us predictable? I think it’s a bit of both. Yeah, yeah. Because we’ve got Siri, we’ve got Amazon Alexa. Yes. And what’s the one from Android?
Michael Stormbom
I think it’s just Google Assistant, isn’t it?
Ian Bowie
Google Assistant. What a boring name. Guys, think of a name. Yeah. And then Microsoft has got something.
Michael Stormbom
Yeah, Cortana.
Ian Bowie
Cortana, that sounds quite nice, doesn’t it? Has a nice ring to it. Yeah. Sounds like it should be made by Ford. The Ford Cortana. Yes. Yeah.
Michael Stormbom
And probably you can talk to the car as well. Talk to… drive forward.
Ian Bowie
Yeah, I reckon that’s probably going to come eventually. But you can talk to your car, or certainly, well, not the car but the telephone, for example.
Michael Stormbom
Yeah. And the car can speak to you of course we have the GPS and….
Ian Bowie
Absolutely. In the low sexy voice. Yeah.
Michael Stormbom
Yeah. No, but of course any type of transcription, so even with subtitling, there’s possibly not a human being who’s doing the subtitling. So just…
Ian Bowie
Well, some of the subtitles I’ve seen, you’d better hope it’s not a human being, because I don’t think they should be in a job if they are. Yeah. So I wonder, you know, we talked earlier about corpora. Yeah. Is it possible, because at the moment, you know, a corpus is normally updated by researchers, human researchers. So would it be possible in the future that perhaps algorithms could do the work of human researchers?
Michael Stormbom
I mean, I think that’s already the case. Studies are, for example, just scraping online newspapers and collecting data continuously. So there’s…
Ian Bowie
The problem with that is you see I mean, when you look at a corpus, and when they break down the language, and then they look at for example, you know, individual words and how common they are. They also then subcategorize that word depending on whether it was used in a blog, in a magazine or newspaper, television, etc. So I’m a little bit thinking you know, anything that’s written down, or recorded for television, radio, whatever. It’s quite easy for an algorithm to keep up with that. Yeah. But the spoken language itself.
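Editor’s note: the corpus breakdown Ian describes here, counting how common each word is and then splitting that count by where it was used, could be sketched roughly as follows. This is only a minimal illustration; the sample texts, the category names, and the function names are invented for the example and do not come from any real corpus.

```python
from collections import Counter, defaultdict
import re

# Hypothetical mini-corpus: (source category, text) pairs invented for illustration.
SAMPLES = [
    ("blog", "I love my new television, it is so cool"),
    ("newspaper", "The new television standard was announced today"),
    ("spoken", "yeah no it is cool, it is so cool"),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequencies_by_category(samples):
    """Count how often each word occurs within each source category."""
    counts = defaultdict(Counter)
    for category, text in samples:
        counts[category].update(tokenize(text))
    return counts

counts = frequencies_by_category(SAMPLES)
# Compare how often "cool" appears in spoken versus blog samples.
print(counts["spoken"]["cool"], counts["blog"]["cool"])
```

Written and broadcast sources are easy to feed into something like this; as the conversation goes on to say, the hard part is getting everyday spoken language recorded and transcribed in the first place.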
Michael Stormbom
I think that is a trickier one, because, I mean, one thing is what’s being broadcast, because of course there’s a certain register of language that you use in broadcasts and so forth, not the way people would speak normally. No. So I mean, for that, you will need to record it; for an algorithm to be able to analyze it you need to have… yeah, so I think that’s definitely…
Ian Bowie
Well, I mean, obviously, obviously, they are recording…
Michael Stormbom
Well, I mean, of course, everybody’s recording stuff online all the time, certainly.
Ian Bowie
Who’s doing the analyzing? You’ve obviously got human researchers going around and recording English as it is spoken on the streets. But then, what do they do with that data afterwards? Is that then put into some kind of AI algorithm to be analyzed? Or is it humans that then basically transcribe that and analyze it themselves?
Michael Stormbom
Yeah. Or it’s a combination: it’s transcribed with AI and then there’s a human being who checks it, right. I mean, to advertise the podcast website, when we do the transcriptions there, that’s entirely automated, using speech to text, and then we go and… revised by a human, yes indeed.
Ian Bowie
Which means you or me, basically. Yeah, isn’t it? Do you think we’ll ever get to a stage where we can take the human out of the equation?
Michael Stormbom
I think in many cases the human is already out of the equation. So I mean, certainly in cases where it doesn’t need to be 100% perfect.
Ian Bowie
But I was thinking about, for example, you know, what we just talked about: transcribing what we’re talking about on the podcast, and then of course we have to go through it and make sure that it’s okay, and obviously make some edits as we go along.
Michael Stormbom
Well, I mean, I think it’s quite close to that. Will we ever get to a point where it’s, like, 100%? I don’t know. Yeah, I think it has to depend on the use case, I would say. I mean, for example, if I want my, let’s say, patient journal translated to my native language, then surely I would want to be 100% sure that the information there is accurate. But I mean, for our podcast, for example, do we mind if there’s one typo there?
Ian Bowie
Well, no, we don’t mind necessarily.
Michael Stormbom
Does it affect us, or, like, the viewer or listener or reader experience?
Ian Bowie
I suppose it depends who’s reading it. I mean, for example, you know, part of the idea of having transcriptions on the website from these podcasts is actually to help people who are trying to improve their own English. So if the transcription is full of errors and mistakes, that is actually going to have a negative or detrimental effect on that person’s learning progress.
Michael Stormbom
Yeah. And of course, the other thing with transcripts: well, for example, that hesitation that you just heard, how is that transcribed? Do you repeat words and so forth? And there are those filler words and so forth. So if it’s all, like, a verbatim kind of transcription, it might not be very readable either.
Ian Bowie
Yeah, I suppose it depends what you’re doing. If you’re reading it without listening, then it’s probably a little bit more difficult to follow if it includes all the filler words and all the ers and the ums and everything else. No, indeed. But if you’re actually reading and listening at the same time, then I think it’s one sort of natural flow of text to follow. At least that’s the way I see it.
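Editor’s note: the difference between a verbatim transcript and a more readable one can be illustrated with a small clean-up pass that drops filler words and collapses immediately repeated words. This is only a rough sketch; the filler list and the function name are assumptions made for the example, and a rule this simple would also collapse legitimate repetitions such as “that that”.

```python
import re

# Assumed filler list for illustration; a real style guide would define its own.
FILLERS = ["um", "uh", "er", "erm"]

# A filler word, plus any comma or full stop and whitespace that follows it.
FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*", re.IGNORECASE)
# A word immediately followed by one or more copies of itself ("I I", "it it it").
REPEAT_RE = re.compile(r"\b(\w+)(\s+\1\b)+", re.IGNORECASE)

def clean_verbatim(text):
    """Drop filler words and collapse immediately repeated words."""
    text = FILLER_RE.sub("", text)
    text = REPEAT_RE.sub(r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()

print(clean_verbatim("So, um, I I think it it is er quite readable"))
```

Whether to apply a pass like this depends on the reader Ian has in mind: someone following along with the audio may prefer the verbatim text, while someone reading on its own is better served by the cleaned version.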
Michael Stormbom
Yeah, no, no, indeed. Absolutely.
Ian Bowie
Yeah. I always recommend to people, you know, if you want to improve your English, of course, I keep saying to people, it’s about input, input, input; without any input there can be no output. And so I direct people to various, you know, resources on the internet. And one of them, which I’m a very big fan of, is actually TED. Now TED also has transcriptions of the talks, in multiple languages many times, and sometimes just for fun I will turn on the English subtitles and also open up the transcription, and yeah, you know, sometimes I look and there are mistakes. And I sometimes think a little bit about this, you know: if you are a second language learner, hopefully you don’t pick up on those mistakes and start, you know, using them yourself, thinking, “Oh, that’s good English because it’s there.”
Michael Stormbom
Yeah, no, I think, I mean, there does need to be this sort of, like, warning label: this was automatically done. Yeah. Yeah.
Ian Bowie
Yeah, this was done by machine. There may be errors.
Michael Stormbom
Yeah. Yeah. No, but I mean, that’s increasingly being used in the… So just completely automated, like subtitles or… Yeah, I think that’s the… Yeah.
Ian Bowie
Yeah. I wonder if they’re using, you know, television subtitles that are often used here in Finland. I remember years ago when I was working for a language training company, my colleague kept getting phone calls, asking everything. I said, “Who is it?” “Oh, it’s such and such, and she works for this agency that does all the subtitles for all the British films that Finland imports. And she’s just asking me, you know, what does this really mean, and to explain it to her.” And I thought that was actually quite interesting. And I remember it was actually quite funny, because there was one particular series, I can’t remember which one it was, it was another one of these detective series, I think it was Midsomer Murders. And they had the sentence, “I’m going up to the smoke.” Well, it was completely mistranslated, because of course what the translator didn’t know is that “the smoke”, in colloquial English, is actually London, right? So “I’m going up to the smoke” means “I’m going to London”, and the translator had something like “I’m going out for a smoke”, in other words, you know, a cigarette. And I seem to remember that was actually one of the things that my colleague had a phone call about, and either he wasn’t concentrating or she wasn’t listening, but somehow it all got missed between him and her and it appearing on screen.
Michael Stormbom
I actually watched some YouTube videos the other day. So there was this German comedy show, and while I did study German many years back, so I could possibly make it through if I paid attention, I figured I’d just give it a go and put on the automatically generated subtitles. And then there was also the possibility of having them translated to Swedish, so I decided to give that a go, and it was quite good, actually. It was about Hungary, so there was Hungarian spoken in the video, and English, and actually the translated subtitles coped quite well with that. I mean, it wasn’t 100% correct. If you didn’t have the audio on, you wouldn’t be able to sort of pick out what the mistake was, but for many cases the translated subtitles there on YouTube were already good enough. So that worked quite well.
Ian Bowie
And that’s only going to get better.
Michael Stormbom
That’s only going to get better. Yes. Yeah. So yeah, that was very, very impressive.
Ian Bowie
So that must have been some kind of, what, speech to text?
Michael Stormbom
Yes. So it’s of course speech to text. Well, I don’t know how they do it in YouTube, if they first do speech to text in German and then translate it, or if they do it directly. Because you can of course do…
Ian Bowie
So this went from German into Swedish
Michael Stormbom
From German to Swedish. Yes.
Ian Bowie
Did you try it from German into English?
Michael Stormbom
Yeah, so I didn’t try German into English, but presumably that would be even better.
Ian Bowie
That’s what I was wondering. Because I mean, I might be wrong, but I would have thought that most of these things start with English.
Michael Stormbom
Well, of course, YouTube is owned by Google. So there’s a lot of emphasis on English for sure. But yeah, I mean, I think it worked quite well, well enough. So I mean, that opens up, I mean, it enables you to sort of watch content in other languages.
Ian Bowie
TED have got something like that, like ted.com. Yeah.
Michael Stormbom
Well, I mean, speech to text is one thing, I mean, within the same language that’s being spoken. But the next thing here, though, was the translation.
Ian Bowie
Yeah, absolutely.
Michael Stormbom
Well, but yeah, that worked quite well. So…
Ian Bowie
Well, isn’t it the case that it’s actually a surprisingly small percentage of YouTube content that is in English? I’m sure there’s far more in other languages.
Michael Stormbom
Well, yeah, I mean, it’s a global entity, so you would have that, of course, yeah.
Ian Bowie
Yeah. No, you fall into that idea that everything’s in English. But it’s not.
Michael Stormbom
No, no, of course. I mean, if you only search for stuff in English, I’m sure, then you only see the results in English, and you probably get the illusion that there’s only English content in there. Yeah, for sure.
Ian Bowie
Yeah. I’m sure we are under that illusion. Well, yeah, I was just thinking, you know, in the European Parliament and the United Nations and places like that, they have these simultaneous interpreters. So they’re listening, you know, they’ve got little headphones on.
Michael Stormbom
Yeah, I wonder how much longer they will be needed.
Ian Bowie
Well, this is what I was kind of wondering… Yeah, but that would be, like, speech to speech, surely.
Michael Stormbom
Well, I mean, all of the components already exist, and there are already… For example, in Espoo, they actually ran a trial with this. I think it was in healthcare and welfare, for immigrants who don’t necessarily speak Finnish or Swedish. So you have a mobile app, and you can speak into that, and then it will translate it and then speak in your language. So they did a trial of that. And there are those sorts of devices out there as well, or you can just get apps for your phone. So I mean, certainly that is coming or is here already.
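Editor’s note: the point that “all of the components already exist” can be pictured as three stages chained together: speech recognition, then translation, then speech synthesis. The sketch below only shows that chaining. The class name, the stage names, and the tiny phrasebook are all hypothetical stand-ins, and nothing here reflects how the app in the trial was actually built.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechToSpeechPipeline:
    """Chains three independent stages; each one can be swapped out separately."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source-language text
    translate: Callable[[str], str]      # machine translation: source text -> target text
    synthesize: Callable[[str], bytes]   # speech synthesis: target text -> audio

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        translated = self.translate(text)
        return self.synthesize(translated)

# Stand-in stages so the sketch runs without any speech or translation library.
PHRASEBOOK = {"hyvää päivää": "good day"}

pipeline = SpeechToSpeechPipeline(
    recognize=lambda audio: audio.decode("utf-8"),       # pretends the audio is already text
    translate=lambda text: PHRASEBOOK.get(text, text),   # unknown phrases pass through unchanged
    synthesize=lambda text: text.encode("utf-8"),        # pretends the bytes are audio
)

print(pipeline.run("hyvää päivää".encode("utf-8")))
```

Because each stage is separate, an error made in recognition is carried into the translation and then spoken aloud, which is one reason accuracy matters more in a healthcare setting than for casual use.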
Ian Bowie
Now just think about other applications with that. For example, you’ve got audiobooks….
Michael Stormbom
Audiobooks, of course, and then it comes down to the translation, of course, if you want to… what do you mean, just, yeah, just speaking to… I mean, certainly.
Ian Bowie
Yeah. Because I mean, that’s all an audiobook is, isn’t it? I mean, why not? You could get a French audiobook which then automatically translates itself into English.
Michael Stormbom
Ah, well, I mean, if you had a translation, yes. I mean, if you’re talking about… Well, I mean, factual books are one thing, but, for example, fiction.
Ian Bowie
Yeah, but what I’m saying is there’s simultaneous interpretation, where somebody’s speaking French, and then you have the interpreter who’s translating immediately, in real time, into English. You know, I mean, that’s a perfect kind of area for machines. Yes. That’s a step in, isn’t it? Yeah. So that’s what I was thinking with audiobooks, that you literally have the French audiobook and whoever’s reading it in French, but then it goes through the filter, so you actually hear it in English. Yeah, it’s the same principle, isn’t it?
Michael Stormbom
It’s the same principle. But again, it depends on the book. I mean, if you want to have Infinite Jest translated on the go into French, no, I don’t think… I think you still need… Factual material is one thing, but…
Ian Bowie
It’s just words. What’s the difference?
Michael Stormbom
It’s just words, but it’s also about the poetry of language. So I mean, yes, you could, but I think you would lose quite a bit if…
Ian Bowie
You always do anyway.
Michael Stormbom
Yes, but you could lose more or you could lose less.
Ian Bowie
Well, it’ll come.
Michael Stormbom
It’ll come, I’m sure.
Ian Bowie
Of course it will. If there’s money to be made, somebody’s gonna do it.
Michael Stormbom
Yeah, I mean, no one’s stopping you from pasting the book page into Google Translate.
Ian Bowie
Yeah, I’ve tried that. And it’s fine. It works. But God, it’s such hard work and so time-consuming. Yeah. So I’m not sure either.
Michael Stormbom
Yeah, so… So it could be done for audiobooks. I don’t know if I’d necessarily recommend it, but it can be done. And so what we could do: we have this podcast, and it could be automatically translated to some other language, and then we can reach a whole new audience.
Ian Bowie
Absolutely.
Transcribed by https://otter.ai