
3 Things

Our flagship daily news show, where we talk to in-house experts about what is going on and why you need to care about it.

Episode 1848 April 11, 2022

Imran Khan out, adult words in YouTube Kids, and Hindi ‘imposition’

First, Indian Express’ Shubhajit Roy joins host Shashank Bhargava to talk about Imran Khan being ousted as Pakistan’s Prime Minister, who is expected to replace him, and how this will affect India.

Next, Ashique KhudaBukhsh, an assistant professor at Rochester Institute of Technology, tells us how adult and inappropriate words are creeping into YouTube Kids’ videos (10:10).

And in the end, we take a quick look at how several Northeast-based organizations are urging the Centre to revoke its decision regarding making Hindi compulsory in schools (19:18).


TRANSCRIPT

Shashank Bhargava: Hi, I’m Shashank Bhargava, and you’re listening to 3 Things, The Indian Express news show. In this episode, we talk about how adult and inappropriate words are creeping into videos on YouTube meant for children. We also take a quick look at how organisations in the Northeast have reacted to Amit Shah’s comments regarding Hindi. But first, we talk about Pakistan. In the early hours of Sunday, Pakistan Prime Minister Imran Khan was finally ousted following a no-confidence vote. As we have mentioned in previous episodes, Khan had been in a state of crisis for some time now. All the opposition parties had been against him, his alliance partners had left him, and the Pakistan Army no longer wanted him to be in power. Last week, there was going to be a no-confidence motion against him, but Khan managed to delay that and dissolve the National Assembly. The matter was then taken up by the Supreme Court. In this segment, Indian Express’s Associate Editor Shubhajit Roy joins us to talk about what the Supreme Court decided and who is expected to replace Imran Khan now. Shubhajit, after Imran Khan dissolved the National Assembly, the matter was taken up by the Supreme Court. Could you talk about the decision the Supreme Court took?

Shubhajit Roy: Yes, Shashank. So you’re right: once the matter went to the court, the court said that Imran Khan’s decision to not go ahead with the no-trust vote was not legal. He had also asked the President to dissolve the parliament and assemblies and go for elections, and the court declared that that was not legal, not constitutional. So as a result, the Pakistan parliament met on Saturday, and through the day there was back and forth. There was debate, and there were stalling and delaying tactics by the ruling party, which is the PTI headed by Imran Khan. But in the end, just past midnight, at about 1 am Pakistan time, which is about 1:30 am India time, the Pakistan National Assembly passed the no-confidence motion, and the government led by Imran Khan lost the vote. So that is what happened. He lost the vote with 174 members voting in favour of the resolution in this 342-member house, which is two members more than a simple majority.

Shashank Bhargava: And this is what was supposed to happen before the no confidence motion was delayed, and the National Assembly was dissolved. This is what Imran Khan feared would happen.

Shubhajit Roy: Yeah, I mean, this should have happened last weekend itself. But Imran, as he says in his repeated statements, will play till the last ball. He tried all the tricks in the book to hang on to power, but failed miserably. In the end, the Pakistan Supreme Court stepped in and declared his moves unconstitutional. So now the government has fallen and there will be a hunt for a successor, who is expected to be someone from the PML-N, potentially Shahbaz Sharif, the brother of former Prime Minister of Pakistan Nawaz Sharif.

Shashank Bhargava: And Shubhajit, Pakistan has never had a prime minister complete a full term in office. And that itself says a lot about how tumultuous Pakistan’s politics is, right?

Shubhajit Roy: Yeah, you’re right. I mean, Pakistan has never had a prime minister complete a full term. But having said that, you have to understand that after almost half a century of military rule, one way or the other, since 2007-08 there has been a democratically elected government. One may question the way the last elections took place and how Imran Khan won the elections. But in the end, from 2008 onwards, after Musharraf was ousted from office, first the Pakistan Peoples Party (PPP), which used to be led by Benazir Bhutto, won the elections, and then Nawaz Sharif’s Pakistan Muslim League won the elections. After that, there was a peaceful transition to Imran Khan. So although Pakistan’s politics has been fractious, noisy and chaotic, for the last 15 years or so there have been democratically elected governments in place. Given the chequered history of Pakistan, you have to understand that that’s not a small thing.

Shashank Bhargava: Right, and Shubhajit, with Imran Khan no longer in power and a new prime minister set to come in, what implications does this have for India?

Shubhajit Roy: So there are a number of ways to look at it. As I wrote in my piece early on Sunday morning, Pakistan’s democracy has been known to be a flawed one. I mean, some call it a guided democracy. So this also means that the Pakistan Army is still calling the shots. You know, in Pakistan, many say that the army were the ones to select Imran Khan to become the Prime Minister. But as the relationship started getting strained over time, the army finally decided to jump in, making it clear that no political party can survive without the support of the military. That is a factor that India has to think of, that the Pakistan Army is still calling the shots. India, this time, actually became part of the discourse, especially in the last few days, when Imran Khan started praising India for its foreign policy. And by doing that, he was essentially targeting Pakistan’s military establishment for what he saw as the inept handling of its international and security policies. Now, this is said to have irked the Pakistan military establishment more than ever.

Shashank Bhargava: And the other thing was also that Imran Khan and the army also had differences about how they wanted to deal with India, right?

Shubhajit Roy: Yeah, I mean, so if you remember, Imran Khan was extremely personal in his attacks, targeting Prime Minister Narendra Modi, and also the BJP-RSS ruling, I mean, the dispensation here in India, calling them Nazis and all kinds of slurs or labels he used over the years, especially after the revocation of Article 370 in Kashmir. And that really brought the already strained relationship to a point which was quite low. But over the last year, year and a half, the Pakistan Army and the Indian establishment started back-channel conversations, and those led to the ceasefire at the LoC, which has been maintained, if you come to think of it, since February last year, when it was agreed upon. And it’s April now, so more than a year. So the Pakistan military, especially under the current Army Chief Qamar Javed Bajwa, has been making noises about building a better relationship with India. Now what that ultimately entails, we’ll have to see. But clearly, at some point, Imran Khan’s hostility, or Imran Khan’s public rhetoric, was at variance with what the army chief was trying to convey. So it looked like they were not on the same page. Imran used to always say that the Pakistan military establishment and his government were on the same page, but it became clear in the last year or so that they were not. So I guess that was also one of the reasons which created a gap between the two, the Pakistan civilian government led by Imran Khan and the Pakistan Army establishment led by General Qamar Javed Bajwa.

Shashank Bhargava: You mentioned earlier that Shahbaz Sharif, the brother of Nawaz Sharif is expected to become the new prime minister. Now he is one of the senior most politicians in Pakistan right now. And if he comes to power, this would mean that the Sharif family would be at the forefront of Pakistan politics. Do we know how that will affect India?

Shubhajit Roy: Well, you know, the Sharif family coming to power is obviously seen positively by India, although Nawaz Sharif and Shahbaz Sharif are different individuals, and they operate differently. They think differently. They’re not the same; they are two individuals. But broadly, the Sharif family and the PML-N under the Sharif dynasty have made overtures towards India when they were in power, and India has known Nawaz Sharif and his policies towards India. In fact, that was one of the criticisms when he was ousted, that he was too soft towards India. But in the current scenario, it might be difficult for even the Sharif family to make a diplomatic outreach towards India. We’ll have to see, because the atmosphere is vitiated, and the elections are due next year but might be held earlier, sometime later this year. We don’t know. We’ll have to see that. So that is one of the variables India has to look at. This really brings up the question of a chance, a possibility of an opening with India, although it looks politically difficult because of the way Imran Khan has queered the pitch for his successor. But with his ouster, and the Sharif family on the horizon, in the driver’s seat, it would be relatively easier for Delhi and Islamabad to start diplomatic conversations once again.

Shashank Bhargava: Next, we talk about YouTube Kids. Every day, millions of children around the world watch videos on YouTube Kids. They watch everything from cartoons and educational content to even sports and music videos. These are all videos specifically meant for children, and are not supposed to contain anything age-inappropriate. But a recently concluded study has found that in many of these videos, adult words were somehow creeping in, like the words bitch, bastard, and even the F-word. The study is titled “‘Beach’ to ‘Bitch’: Inadvertent Unsafe Transcription for Kids’ Content on YouTube”. The study looked at over 7,000 videos on YouTube Kids, and it found that these words were accidentally showing up in the transcripts of these videos. In this segment, Ashique KhudaBukhsh, the assistant professor at Rochester Institute of Technology who conducted the study, joins us to talk about it. So Ashique, could you talk about what led you to do the study?

Ashique KhudaBukhsh: So this research is done in collaboration with Sumit Kumar from the Indian School of Business and Krithika Ramesh, a student from Manipal University. She is jointly mentored by me and Sumit. So we were working on YouTube Kids data, and as is typical of any AI or data science project, when you have a data set, the first step is to just explore what different kinds of things, or different kinds of characteristics, are present in the data. And that’s when Krithika told us that she found the F-word present in many of the transcripts of these videos. And we were very surprised. How on earth is this possible? Because these videos are watched by millions of kids, and while they are watching those videos, they are being watched by their parents. So we thought that this can’t be true. And then we started investigating this finding of Krithika’s, and the more we looked into it, the more we found that this is not a one-off thing. There are many age-inappropriate words that creep into these transcripts. And that’s how we started doing a comprehensive analysis of why this happens, how often it happens, and whether there is any way we can fix some of these errors.

Shashank Bhargava: Okay, so you mentioned that these inappropriate words were appearing in the transcripts of these videos. Could you explain how transcripts are usually generated for YouTube videos?

Ashique KhudaBukhsh: So I mean, if you go and watch a video on YouTube, you will see that you have the option to turn on the captions. Sometimes these captions are generated by humans, and sometimes they are generated by automatic methods. The method that you get to see on YouTube videos when captions are generated automatically is a method like Google Speech-to-Text. So typically, what happens is that these methods are trained on a large number of speech examples and their corresponding transcripts. The artificial intelligence systems learn from these examples so that when the input is a new audio input, they can automatically generate the transcript. So that’s how most of these systems work. Google Speech-to-Text is one of these prominent systems, and Amazon Transcribe is another system which does the same thing.

Shashank Bhargava: Okay, so Google Speech-to-Text and Amazon Transcribe, these are the two prominent programmes through which transcripts are generated. And these programmes are using artificial intelligence to do this. But when doing it, they are making mistakes, because of which adult words are creeping into videos meant for children. What are some of the biggest errors that you found while doing this study?

Ashique KhudaBukhsh: So one example: there is a video from the Ryan’s World YouTube channel, which has, I think, more than 32 million subscribers. And there is one example where Ryan is saying, “I want to buy some corn,” and the automatic subtitling system is saying, “I would like to buy some porn.” So that was very strange. Then there was another video where the audio input was, “This is a wonderful piece of craft,” and the automatic subtitling system is outputting, “This is a wonderful piece of crap.” The one very shocking and disturbing one that we saw was where the audio input was, “You need to be strong and brave like Heracles,” and the automated transcription was, “You need to be strong and rape like Heracles.” So these are very age-inappropriate, shockingly disturbing transcriptions that we found.

Shashank Bhargava: Okay, and what is the reason that these programmes are making these mistakes?

Ashique KhudaBukhsh: So it’s hard to give a definite answer, because we don’t know exactly what kind of data these models, these AI systems, are trained on. But suppose the AI system is not trained on a lot of kids’ speech examples. In an adult conversation, you are more likely to encounter the sentence “I love porn” as opposed to “I love corn”. So if there are many more examples of “I love porn” than “I love corn” in the training data set, then the systems are kind of swayed towards predicting “I love porn”. The system sees that, okay, I can hear “I love”, and there is some word which is very close to maybe porn, maybe corn; I will just go with porn, because that’s what I have seen more in my data set. So this could be one possible explanation, though we don’t know if it’s 100% correct. The other possible explanation could be that the training data is not very diverse. Maybe there aren’t enough examples from places where people speak English as a second language. So, for example, I have a very strong accent, and maybe there aren’t enough examples of, say, people from India, or people from Asia broadly. If those kinds of things happen, then the systems will also be making more of these kinds of errors on examples that they have not seen that often. And then the other explanation is that in kids’ videos there is a lot of background music, and people often go into baby talk. So for those kinds of things, if the systems do not have enough examples, then again, they will make mistakes.
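
The first explanation above can be sketched with a toy example. This is an illustration only and not how Google Speech-to-Text or Amazon Transcribe actually work: it assumes a recogniser that scores each candidate word by multiplying a hypothetical acoustic score by a prior estimated from how often the word followed the same context in the training text. The function name `pick_word` and all the numbers are made up for illustration.

```python
from collections import Counter

def pick_word(acoustic_scores, context_counts):
    """Return the candidate with the highest (acoustic score x prior).

    acoustic_scores: hypothetical "how well does the audio match" scores.
    context_counts:  hypothetical counts of each word after "I love"
                     in the training transcripts.
    """
    total = sum(context_counts.values())
    best_word, best_score = None, -1.0
    for word, acoustic in acoustic_scores.items():
        # Add-one smoothing so a word never seen in training
        # still gets a small non-zero prior.
        prior = (context_counts[word] + 1) / (total + len(acoustic_scores))
        score = acoustic * prior
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Made-up numbers: the audio slightly favours "corn".
acoustic = {"corn": 0.55, "porn": 0.45}

# Training text dominated by adult conversation (assumed counts).
adult_counts = Counter({"corn": 10, "porn": 90})
# Training text drawn from kids' content (assumed counts).
kids_counts = Counter({"corn": 90, "porn": 0})

print(pick_word(acoustic, adult_counts))
print(pick_word(acoustic, kids_counts))
```

With the same audio evidence, the choice flips depending only on which text the prior was estimated from, which is the kind of sway described in the answer above.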

Shashank Bhargava: Yeah, because in baby talk, you’re pronouncing the words completely differently. Correct?

Ashique KhudaBukhsh: Correct. Yeah. So actually, we had another related paper where we showed that very strange and bizarre mistakes can happen. For instance, say you train AI systems to detect if something is hate speech or offensive speech, and then you input chess discussions, harmless chess discussions, but with a lot of words like black, white, kill, capture, attack and threat in them. These hate speech detection systems will often make mistakes and say that the chess discussions are hate speech. So we, again, had a paper on that, and it just shows that there are many potential blind spots in these AI systems. And the goal of researchers, from both academia and industry, is to detect and report those blind spots so that the systems improve and this world becomes safer and better.
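
The chess blind spot can be reproduced in miniature. Real hate speech detectors are learned models, not keyword lists, so the sketch below is a deliberately simplified stand-in: a hypothetical function `keyword_flag` that flags any text containing several words from an assumed watch list. It only shows how surface words without context can mislead a detector.

```python
import re

# Assumed watch list, taken from the words mentioned in the interview.
FLAG_WORDS = {"black", "white", "kill", "capture", "attack", "threat"}

def keyword_flag(text, threshold=3):
    """Flag text when it contains at least `threshold` watch-list words.

    Returns (flagged, hits), where hits lists the matched words in order.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [t for t in tokens if t in FLAG_WORDS]
    return len(hits) >= threshold, hits

# A harmless chess comment (made up for illustration).
chess_comment = "White can capture the knight, then attack the black king."
flagged, hits = keyword_flag(chess_comment)
print(flagged, hits)
```

The comment is flagged even though nothing in it is offensive, because the detector never sees that the words describe pieces on a board.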

Shashank Bhargava: And considering the errors that you found, what are the biggest concerns that that raises?

Ashique KhudaBukhsh: So the biggest concern is that we have to be way more vigilant about these things, because we implicitly assume that kids will not be exposed to harmful content because this is YouTube Kids. Or, if a video is actually from YouTube Kids and the kids are watching it on general YouTube, we will just think that this video may not have any harmful content. But here, the interesting thing that is happening is that the harmful content is not present in the source. The audio does not have these age-inappropriate words; they are being introduced by an AI application. So we need to have checks and balances at every stage where an AI application is modifying the source or creating some content out of the source. So that’s, I think, the takeaway message: we have to be more vigilant whenever there is a connected AI system, where one AI system’s output is another system’s input. In each of those steps, we will need to have some kind of checks and balances. And again, we have to be way more vigilant whenever kids are involved. Because, I mean, if you go to YouTube Kids and look for the same video where it says you need to be strong and rape like Heracles, if you search for the word rape, you will find no video, which means that YouTube definitely has some kind of an implicit list with which it blocks this kind of search. But for the same video on general YouTube, if you turn on the subtitles, you will see the word rape coming into the subtitles. So there needs to be better integration, and there needs to be human monitoring, to make sure that for this kind of high-visibility kids’ content, we have adequate steps and measures to make it safer.
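
One way to picture a check between two connected systems is a small screening step that sits between the transcription output and the caption display. This is a minimal sketch under stated assumptions, not a description of what YouTube does: the function `screen_transcript` and the `BLOCKLIST` contents are hypothetical, and a real deployment would need far more than a word list.

```python
import re

def screen_transcript(transcript, blocklist):
    """Return the blocklisted words found in an auto-generated transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    return sorted({t for t in tokens if t in blocklist})

# Hypothetical list; the interview suggests a similar list already
# exists on the search side of YouTube Kids.
BLOCKLIST = {"rape", "crap"}

# Erroneous automatic caption quoted in the interview.
auto_caption = "You need to be strong and rape like Heracles"

found = screen_transcript(auto_caption, BLOCKLIST)
# Route the caption to a human instead of publishing it as-is.
needs_human_review = bool(found)
print(found, needs_human_review)
```

The design choice here is to hold the caption for human review rather than silently delete the word, since the source audio is innocent and the right fix is a corrected transcript.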

Shashank Bhargava: And next, we talk about the Hindi language. Last week, Home Minister Amit Shah, who chairs the Parliamentary Official Language Committee, announced that Hindi would be made compulsory in all eight northeastern states up to class 10. He had been speaking at the 37th meeting of the committee, where he said that 2,200 Hindi teachers had been recruited in the Northeast. But now, according to a report by Indian Express’s Tora Agarwala, several Northeast-based organisations have urged the Centre to roll back its decision to make Hindi compulsory till class 10 in the region. This includes Assam’s apex literary body, the Assam Sahitya Sabha. In its statement, the Sabha has said, quote, “the Union home minister should have instead taken steps to develop Assamese and other indigenous languages. Such steps spell a bleak future for Assamese and all indigenous languages in the Northeast”, unquote. It also added that the decision should be revoked. In the meeting mentioned earlier, Amit Shah had also said that Hindi was the language of India. He had, however, clarified that Hindi should be an alternative to English and not local languages. Amit Shah’s comments have evoked sharp reactions from civil society groups as well as political parties in the region. The Leader of the Opposition in Assam, Debabrata Saikia, slammed the Centre for interfering with education, which is a state subject. He said that learning Hindi at the expense of English will deprive us students of future opportunities. It is important to note that in the Northeast, Hindi is compulsorily taught till class eight, except in Arunachal Pradesh, where the language is more common and is a compulsory subject till class 10. You were listening to 3 Things by The Indian Express. Today’s show was written and produced by me, Shashank Bhargava, and was edited and mixed by Suresh Pawar. If you like the show, then do subscribe to us wherever you get your podcasts.
You can also recommend the show to someone you think will like it. Share it with a friend or someone in your family. It’s the best way for people to get to know about us. You can tweet us at @expresspodcasts and write to us at podcasts@indianexpress.com.

Imran Khan out, adult words in YouTube Kids, and Hindi ‘imposition’First, Indian Express’ Shubhajit Roy joins host Shashank Bhargava to talk about Imran Khan being ousted as Pakistan’s Prime Minister, who is expected to replace him, and how this will affect India. Next, Ashique KhudaBukhsh, an assistant professor at Rochester Institute of Technology, tells us how adult and inappropriate words are creeping into YouTube Kids’ videos (10:10). And in the end, we take a quick look at how several Northeast-based organizations are urging the Centre to revoke its decision regarding making Hindi compulsory in schools (19:18). TRANSCRIPT Shashank Bhargava: Hi, I'm Shashank Bhargava, and you're listening to 3 Things The Indian Express news show. In this episode, we talk about how adult and inappropriate words are creeping into videos on YouTube meant for children. We also take a quick look at how organisations in the Northeast have reacted to armatures comments regarding Hindi. But first we talk about Pakistan. In the early hours of Sunday, Pakistan Prime Minister Imran Khan was finally ousted following a no confidence vote. As we have mentioned in previous episodes, Khan had been in a state of crisis for some time now. All the opposition parties had been against him. His alliance partners had left him and the Pakistan Army no longer wanted him to be in power. Last week, there was going to be a no confidence motion against him, but Khan managed to delay that and dissolve the National Assembly, but the matter was then taken up by the Supreme Court. In the segment Indian Express's Associate Editor Shubhajit Roy joins us to talk about what the Supreme Court decided and who is expected to replace Imran Khan now. Shubhajit, after Imran Khan dissolved the National Assembly. The matter was then taken up by the Supreme Court. Could you talk about the decision the Supreme Court took? Shubhajit Roy: Yes Shashank. So you're right to once the matter went to the court. 
The court said that Imran Khan's decision to not go ahead with the new trust vote was not legal. And he had also asked the President to dissolve the parliament and assemblies and go for election. So they declared that that was not legal, not constitutional. So as a result, the Pakistan parliament met on Saturday and the day there was like back and forth. There was debate there was the stalling tactic delaying tactic by the ruling government ruling party, which is PTI headed by Imran Khan. But in the end just past midnight at about 1am. Pakistan time, which is about 130. India time, the Pakistan National Assembly passed the no confidence motion where the government led by Imran Khan lost the vote. So that is what happened. So he lost the vote with 174 members voting in favour of the resolution in this 342 member house, which is twoo members more than simple majority. Shashank Bhargava: And this is what was supposed to happen before the no confidence motion was delayed, and the National Assembly was dissolved. This is what Imran Khan feared would happen. Shubhajit Roy: Yeah, I mean, this should have happened last weekend itself. But because Imran as he says in his repeated statements will play till the last ball. He tried all the tricks in the book to hang on to power, but miserably failed. In the end, the Pakistan Supreme Court stepped in and declared his moves unconstitutional. So now the government has fallen and there will be hunt for the next successor, which is expected to be someone from the pMLN and potentially would be Shahbaz Sharif, who is the brother of former Prime Minister of Pakistan Nawaz Sharif. Shashank Bhargava: And Shubhajit, Pakistan has never had a prime minister complete a full term in office. And that itself says a lot about how tumultuous Pakistan's politics is, right? Shubhajit Roy: Yeah, you're right. I mean, Pakistan has never had a prime minister complete full term. 
But having said that, you have to understand that after almost a half a century of military rule one way or the other. Since 2007. Eight, there has been a democratically elected government may question the way the last elections took place and how mankind won the elections. But then the end since 2008 onwards, after Musharraf was ousted out of office first Pakistan Peoples Party PPP which is used to be led by Benazir Bhutto, it won the elections and then the viceroy is Pakistan Muslim League won the elections after that there's been a peaceful transition to Imran Khan. So although Pakistan's politics has been fractious, noisy, chaotic, but for the last 15 years or so, there has been democratically elected governments in place so that also you have to understand from the chequered history of Pakistan that that's not a small thing. Shashank Bhargava: Right, and Shubhajit with Imran Khan no longer in power and a new prime minister set to come in? what implications does this have for India? Shubhajit Roy: So there are a number of ways to look at it. As I wrote in my piece early morning on Sunday, that Pakistan's democracy has been known to be a flawed one. I mean, some call it a guided democracy. So this is also means that the Pakistan army is still calling the shots. You know, in Pakistan, many say that army were the ones to slip the mankind to become the Prime Minister. But as the relation started getting strained over time, the army finally decided to jump in making it clear that no political party can survive without the support of the military. That is a factor that that India has to think of that Pakistan Army is still calling the shots. India, this time actually became part of the discourse, especially in the last few days when Imran Khan started praising India for its foreign policy. And by doing that he was essentially targeting Pakistan's military establishment for what he saw to the inept handling of its international and security policies. 
Now, this is said to have also worked the Pakistan military establishment more than ever. Shashank Bhargava: And the other thing was also that Imran Khan and the army also had differences about how they wanted to deal with India, right? Shubhajit Roy: Yeah, I mean, so if you remember, Imran Khan was extremely personal news attacks in targeting Prime Minister Narendra Modi, also the BJP RSS ruling, I mean, the dispensation here in India, calling them Nazi and all kinds of slurs or labels he used over the years, especially after the revocation of article 370 in Kashmir, and that has really broken the strained relationship to a point which was quite low. But over the last year, year and a half, the Pakistan Army and the Indian establishment started the back channel and conversations lead to this ceasefire at the LOC and that has already been maintained, if you come to think of it since February last year, when it was agreed upon. And it's April now, more than a year now. So the Pakistan military, especially under the current Army Chief Qamar Javed Bajwa, he has been making noises about building a better relationship with India. And now what that ultimately entails, we'll have to see but clearly what happened at some point, Imran Khan's hostility or Imran Khan's public rhetoric was at variance with what the army chief was trying to convey. So it looked like they were not on the same page. Although Imran is to always say that Pakistan military establishment and his government were on the same page, but it became clear in the last year or so that they will not so I guess that was one of the also the reasons which created a gap between the two, the Pakistan civilian government led by mankind and Pakistan Army establishment led by General Qamar Javed Bajwa. Shashank Bhargava: You mentioned earlier that Shahbaz Sharif, the brother of Nawaz Sharif is expected to become the new prime minister. Now he is one of the senior most politicians in Pakistan right now. 
And if he comes to power, this would mean that the Sharif family would be at the forefront of Pakistan politics. Do we know how that will affect India? Shubhajit Roy: Well, you know, Sharif family, obviously coming to par is seen positively by India, although Nawaz Sharif and Shahbaz Sharif are different individuals, and they operate differently. They think differently. They're not the same as two individuals. But broadly, the sherry family and pMLN UnderSheriff Sharif dynasty have had made overtures towards India when they were in power and India have known Nawaz Sharif and his policies towards India. In fact, that was one of the criticism when he was ousted on a path that was too soft towards India. But in the current scenario, it might be difficult for even the Sharif family to make an outreach diplomatically towards India. But we'll have to see because the atmosphere is vitiated, and maybe the elections might are expected to be held sometime later this year, or if they are due for next year, but they might be held earlier. We don't know. We'll have to see that. So that is one of the variables India has to look at also that this really brings down the question of a chance a possibility of an opening with India, although it looks politically difficult because Imran Khan, the way what he has were the pitch for his successor path with his ouster. And the Sharif family on the horizon in the driver's seat, it would be relatively easier for Delhi and Islamabad to start diplomatic conversations once again. Shashank Bhargava: Next week talk about YouTube Kids. Every day, millions of children around the world watch videos on YouTube kids. They watch everything from cartoons and educational content, to even sports and music videos. These are all videos specifically meant for children, and are not supposed to contain anything age inappropriate. 
But a recently concluded study has found out that in many of these videos, adult words was somehow creeping in, like the words bitch, bastard, and even the F word. The study is titled beach to bitch inadvertent, unsafe transcription for kids content on YouTube. And this study looked at over 7000 videos on YouTube kids, and it found out that these words were accidentally showing up in the transcripts of these videos. In this segment, Ashique KhudaBuksh, the Associate Professor at Rochester Institute of Technology who conducted the study joins us to talk about it. So Ashique could you talk about what led you to do the study. Ashique KhudaBuksh: So this research is done in collaboration with Sumit Kumar, from Indian School of Business and Krithika Ramesh student from Manipal University. So she is jointly mentored by me and Sumit. So we were working on Youtube Kids data, and typical to any AI or data science project. When you have a data set, the first step is to just like, explore what are the different kinds of things or different kinds of characteristics that are presented and data. And that's when Kritika told us that she found the F-word is present in many of the transcripts of these videos. And we were very surprised that how on earth this is possible, because these videos are watched by millions of kids. And while they are watching those videos, they are being watched by their parents. So we thought that this can be true. And then we started investigating this finding of Krithika and the more we started looking into it, we found that this is not like a one off thing. There are many age inappropriate words that creep into these transcripts. And that's how we started doing a comprehensive analysis of why this happened. And how often does this happen. And if there is any way we can fix some of these errors. Shashank Bhargava: okay so, you mentioned that these inappropriate words were appearing in the transcript of these videos. 
So could you explain how transcripts are usually generated for YouTube videos?

Ashique KhudaBukhsh: So, I mean, if you go and watch a video on YouTube, you will see that you have the option to turn on the captions. Sometimes these captions are generated by humans, and sometimes they are generated by automatic methods. The method you get to see on YouTube videos when captions are generated automatically is a method like Google Speech-to-Text. Typically, what happens is that these methods are trained on a large number of speech examples and their corresponding transcripts. The artificial intelligence system then learns from these examples so that when the input is a new audio clip, it can automatically generate the transcript. That's how most of these systems work. Google Speech-to-Text is one of these prominent systems. Amazon Transcribe is another system, which does the same thing.

Shashank Bhargava: Okay, so Google Speech-to-Text and Amazon Transcribe are the two prominent programmes through which transcripts are generated. And these programmes are using artificial intelligence to do this. But when doing it, they are making mistakes, because of which adult words are creeping into videos meant for children. What are some of the biggest errors that you found while doing this study?

Ashique KhudaBukhsh: So one example was a video from the Ryan's World YouTube channel, which has, I think, more than 32 million subscribers. There is one instance where Ryan is saying, "I want to buy some corn", and the automatic subtitling system is saying, "I would like to buy some porn". So that was very strange. Then there was another video where the audio input was, "This is a wonderful piece of craft", and the automatic subtitling system is outputting, "This is a wonderful piece of crap".
One very shocking and disturbing one that we saw was where the audio input was, "You need to be strong and brave like Heracles", and the automated transcription was, "You need to be strong and rape like Heracles". So these are very age-inappropriate, shockingly disturbing transcriptions that we found.

Shashank Bhargava: Okay, and what is the reason that these programmes are making these mistakes?

Ashique KhudaBukhsh: So it's hard to give a definite answer, because we don't know exactly what kind of data these AI systems are trained on. But suppose the AI system is not trained on a lot of kids' speech examples. In an adult conversation, you are more likely to encounter the sentence "I love porn" as opposed to "I love corn". So if there are many more examples of "I love porn" than of "I love corn" in the training data set, then the systems are kind of swayed towards predicting "I love porn". When the system hears "I love" followed by some word that is very close to both, it reasons: maybe corn, maybe porn, I will just go with porn, because that's what I have seen more in my data set. So this could be one possible explanation, which we don't know is 100% correct. The other possible explanation could be that the training data is not very diverse. Maybe there aren't enough examples from places where people speak English as a second language. For example, I have a very strong accent, and maybe there aren't enough examples of, say, people from India, or people from Asia broadly. If those kinds of things happen, then the systems will also make more of these kinds of errors on examples that they have not seen that often. And then the other factor is that in kids' videos, there is a lot of background music, and people often go into baby talk. If the systems do not have enough examples of those kinds of things, then again, they will make mistakes.
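The "corn" versus "porn" explanation above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how Google Speech-to-Text or Amazon Transcribe actually work (the guest notes that their training data is not known). The function name `pick_word`, the acoustic scores and the word counts are all hypothetical and made up for illustration. The sketch only shows how a word that is rarer in kids' speech can still win when the training text is dominated by adult conversation.

```python
# Toy illustration only: every number below is invented, and this is not
# the algorithm used by any real speech-to-text product.

def pick_word(acoustic_scores, training_counts):
    """Pick the candidate with the highest (acoustic score x prior) product.

    acoustic_scores: how well each candidate word matches the audio (0..1).
    training_counts: how often each word followed "I love" in the
                     (hypothetical) training text.
    """
    total = sum(training_counts.values())

    def combined(word):
        prior = training_counts.get(word, 0) / total
        return acoustic_scores[word] * prior

    return max(acoustic_scores, key=combined)


# The audio slightly favours "corn", but the two words sound very similar.
acoustic = {"corn": 0.55, "porn": 0.45}

# Hypothetical training text dominated by adult conversation.
adult_heavy_counts = {"corn": 20, "porn": 80}

# Hypothetical training text that includes plenty of kids' speech.
kids_aware_counts = {"corn": 95, "porn": 5}

print(pick_word(acoustic, adult_heavy_counts))
print(pick_word(acoustic, kids_aware_counts))
```

With the adult-heavy counts the prior outweighs the slightly better acoustic match, so the sketch chooses the inappropriate word. With the kids-aware counts it chooses "corn". The same audio produces different transcripts purely because of what the system saw during training.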
Shashank Bhargava: Yeah, because in baby talk, you're pronouncing the words completely differently. Correct?

Ashique KhudaBukhsh: Correct. Yeah. So actually, we had another related paper where we showed that very strange and bizarre mistakes can happen. For instance, if you train AI systems to detect whether something is hate speech or offensive speech, and then you input chess discussions, harmless chess discussions, you have a lot of words like black, white, kill, capture, attack and threat in those kinds of discussions. So these hate speech detection systems will often make mistakes and say that chess discussions are hate speech. We, again, had a paper on that, and it just shows that there are many potential blind spots in these AI systems. And the goal of researchers, from both academia and industry, is to detect and report those blind spots so that the systems improve and this world becomes safer and better.

Shashank Bhargava: And considering the errors that you found, what are the biggest concerns that this raises?

Ashique KhudaBukhsh: So the biggest concern is that we have to be way more vigilant about these things, because we will implicitly assume that kids will not be exposed to harmful content because this is YouTube Kids. Or, if a video is actually from YouTube Kids but the kids are watching it on general YouTube, we will just think that this video may not have any harmful content. But here, the interesting thing that is happening is that the harmful content is not present in the source. The audio does not have these age-inappropriate words; they are being introduced by an AI application. So we need to have checks and balances at every stage where an AI application is modifying the source or creating some content out of the source. So that's, I think, the takeaway message: that we have to be more vigilant whenever there is a connected AI system where one AI system's output is another system's input.
So in each of those steps, we will need to have some kind of checks and balances. I think that's the main message. And again, we have to be way more vigilant whenever kids are involved. Because, I mean, if you go to YouTube Kids and look for the same video where the transcript says "you need to be strong and rape like Heracles", if you search for the word rape, you will find no video, which means that YouTube definitely has some kind of an implicit list through which it blocks this kind of search. But for the same video on general YouTube, if you turn on the subtitles, you will see the word rape coming into the subtitles. So there needs to be better integration, and there needs to be human monitoring, to make sure that for this kind of high-visibility kids' content we have adequate steps and measures to make it safer.

Shashank Bhargava: And next, we talk about the Hindi language. Last week, Home Minister Amit Shah, who chairs the Parliamentary Official Language Committee, announced that Hindi would be made compulsory in all eight northeastern states up to class 10. He had been speaking at the 37th meeting of the committee when he said that 2,200 Hindi teachers had been recruited in the Northeast. But now, according to a report by Indian Express's Tora Agarwala, several Northeast-based organisations have urged the Centre to roll back its decision to make Hindi compulsory till class 10 in the region. This includes Assam's apex literary body, the Assam Sahitya Sabha. In its statement, the Sabha has said, quote, "The Union home minister should have instead taken steps to develop Assamese and other indigenous languages. Such steps spell a bleak future for Assamese and all indigenous languages in the Northeast", unquote. It also added that the decision should be revoked. In the meeting mentioned earlier, Amit Shah had also said that Hindi was the language of India. He had, however, clarified that Hindi should be an alternative to English and not local languages.
Amit Shah's comments have evoked sharp reactions from civil society groups as well as political parties in the region. The Leader of the Opposition in Assam, Debabrata Saikia, slammed the Centre for interfering with education, which is a state subject. He said that learning Hindi at the expense of English will deprive students of future opportunities. It is important to note that in the Northeast, Hindi is compulsorily taught till class eight, except in Arunachal Pradesh, where the language is more common and is a compulsory subject till class 10.

You were listening to 3 Things by The Indian Express. Today's show was written and produced by me, Shashank Bhargava, and was edited and mixed by Suresh Pawar. If you like the show, then do subscribe to us wherever you get your podcasts. You can also recommend the show to someone you think will like it. Share it with a friend or someone in your family. It's the best way for people to get to know about us. You can tweet us at @expresspodcasts and write to us at podcasts@indianexpress.com.