0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/1654 Thanks! 1 00:00:31,530 --> 00:00:33,839 Hello, everyone, and welcome 2 00:00:33,840 --> 00:00:36,269 back to our 3 00:00:36,270 --> 00:00:37,790 cows first clean. 4 00:00:38,820 --> 00:00:41,189 Our next speaker is Hendrik 5 00:00:41,190 --> 00:00:43,469 Heuer. Hendrik Heuer 6 00:00:43,470 --> 00:00:46,079 is a researcher at the University 7 00:00:46,080 --> 00:00:48,419 of Bremen and in the Institute 8 00:00:48,420 --> 00:00:51,689 of Information Management in Bremen, 9 00:00:51,690 --> 00:00:53,819 and in his talk, Rage 10 00:00:53,820 --> 00:00:55,889 Against the Machine Learning, 11 00:00:55,890 --> 00:00:58,049 he will explain why 12 00:00:59,550 --> 00:01:02,039 audits are a useful method 13 00:01:02,040 --> 00:01:04,738 to ensure that machine learning systems 14 00:01:04,739 --> 00:01:06,839 operate in the interest of 15 00:01:06,840 --> 00:01:08,279 the public. 16 00:01:08,280 --> 00:01:09,360 Welcome, Hendrik. 17 00:01:13,770 --> 00:01:16,019 I'm Dr. Hendrik Heuer, and welcome to 18 00:01:16,020 --> 00:01:17,519 Rage Against the Machine Learning: 19 00:01:20,690 --> 00:01:23,179 Auditing YouTube and Others. 20 00:01:23,180 --> 00:01:25,429 In this talk, I will explain why audits 21 00:01:25,430 --> 00:01:27,559 are a useful method to ensure that machine 22 00:01:27,560 --> 00:01:29,809 learning systems operate in the interest 23 00:01:29,810 --> 00:01:31,309 of the public. 24 00:01:31,310 --> 00:01:34,099 My goal is to empower civic hackers, 25 00:01:34,100 --> 00:01:36,139 and I'm going to do that by releasing the 26 00:01:36,140 --> 00:01:38,479 scripts that I used to audit 27 00:01:38,480 --> 00:01:40,759 YouTube. And I'm also going to explain 28 00:01:40,760 --> 00:01:43,139 to you how to use these scripts.
29 00:01:43,140 --> 00:01:45,929 Why would it be interesting to 30 00:01:45,930 --> 00:01:48,119 audit YouTube or other machine 31 00:01:48,120 --> 00:01:50,249 learning-based curation systems? 32 00:01:50,250 --> 00:01:52,679 Well, YouTube has more than two billion 33 00:01:52,680 --> 00:01:55,019 users per month, and 70 34 00:01:55,020 --> 00:01:57,359 percent of the videos watched on YouTube 35 00:01:57,360 --> 00:01:59,009 are recommended by a machine 36 00:01:59,010 --> 00:02:01,059 learning-based curation system. 37 00:02:01,060 --> 00:02:03,279 This is remarkable because every 38 00:02:03,280 --> 00:02:05,799 fourth person worldwide relies on YouTube 39 00:02:05,800 --> 00:02:08,228 as a news source. That percentage 40 00:02:08,229 --> 00:02:10,538 is even higher for younger people: 41 00:02:10,539 --> 00:02:12,639 there, every third 18- to 42 00:02:12,640 --> 00:02:14,769 24-year-old consumes his or 43 00:02:14,770 --> 00:02:16,749 her news on YouTube. 44 00:02:16,750 --> 00:02:18,489 This means that YouTube's machine 45 00:02:18,490 --> 00:02:20,649 learning-based system plays an important 46 00:02:20,650 --> 00:02:22,839 role in what billions of people watch and 47 00:02:22,840 --> 00:02:24,789 how they see the world. 48 00:02:24,790 --> 00:02:26,679 Why do we need machine learning in these 49 00:02:26,680 --> 00:02:28,839 systems? Well, there are 50 00:02:28,840 --> 00:02:31,089 82.2 years of video uploaded 51 00:02:31,090 --> 00:02:33,339 to YouTube every day. That's 52 00:02:33,340 --> 00:02:35,469 500 hours of video uploaded 53 00:02:35,470 --> 00:02:37,689 per minute. For a team of human 54 00:02:37,690 --> 00:02:39,969 experts, it would be impossible to review 55 00:02:39,970 --> 00:02:41,919 and categorize this user-generated 56 00:02:41,920 --> 00:02:43,189 content.
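The two upload figures quoted above are the same quantity in different units, which a short calculation can confirm. This is only an arithmetic check; the variable names are chosen here for illustration.

```python
# Convert "500 hours of video uploaded per minute" into
# "years of video uploaded per day".
HOURS_UPLOADED_PER_MINUTE = 500

hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24   # minutes/hour * hours/day
years_per_day = hours_per_day / 24 / 365              # hours -> days -> years

print(hours_per_day)             # 720000
print(round(years_per_day, 1))   # 82.2
```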
57 00:02:43,190 --> 00:02:45,699 Now, YouTube markets its recommender 58 00:02:45,700 --> 00:02:48,459 system as a sophisticated algorithm 59 00:02:48,460 --> 00:02:50,559 to match each viewer to the videos 60 00:02:50,560 --> 00:02:53,109 they're most likely to watch and enjoy. 61 00:02:53,110 --> 00:02:56,049 In this talk, I will show that popular, 62 00:02:56,050 --> 00:02:58,419 unrelated content is king. 63 00:02:58,420 --> 00:02:59,529 So who am I? 64 00:02:59,530 --> 00:03:01,329 My name is Dr. Hendrik Heuer, and I'm a 65 00:03:01,330 --> 00:03:03,639 researcher at the University of Bremen 66 00:03:03,640 --> 00:03:05,409 and the Institute for Information 67 00:03:05,410 --> 00:03:06,789 Management. 68 00:03:06,790 --> 00:03:08,829 And this talk is based on my doctoral 69 00:03:08,830 --> 00:03:11,079 thesis focused on auditing machine 70 00:03:11,080 --> 00:03:11,979 learning. 71 00:03:11,980 --> 00:03:13,899 And in this talk, I will explore audits 72 00:03:13,900 --> 00:03:16,479 as a way of making sense of complex 73 00:03:16,480 --> 00:03:18,879 and proprietary machine learning systems 74 00:03:18,880 --> 00:03:21,549 used by YouTube, as well as others. 75 00:03:21,550 --> 00:03:23,799 And this is based on a research project 76 00:03:23,800 --> 00:03:26,109 I conducted together with Hendrik Hoch, 77 00:03:26,110 --> 00:03:28,209 Andreas Breiter and Yannis 78 00:03:28,210 --> 00:03:29,210 Theocharis. 79 00:03:30,480 --> 00:03:32,129 And this talk is based on my doctoral 80 00:03:32,130 --> 00:03:34,349 thesis called Users & Machine 81 00:03:34,350 --> 00:03:36,869 Learning-based Curation Systems, and 82 00:03:36,870 --> 00:03:38,579 using the link, you can download the 83 00:03:38,580 --> 00:03:40,889 thesis for free at the library 84 00:03:40,890 --> 00:03:43,769 of the University of Bremen. 85 00:03:43,770 --> 00:03:45,599 As many of you may know,
86 00:03:45,600 --> 00:03:47,849 machine learning-based curation systems are a 87 00:03:47,850 --> 00:03:50,099 special type of artificial intelligence. 88 00:03:51,120 --> 00:03:52,829 The definition of AI, according to 89 00:03:52,830 --> 00:03:55,109 Hansen, is that it's an umbrella 90 00:03:55,110 --> 00:03:57,119 term for computer systems that are, 91 00:03:57,120 --> 00:03:59,429 quote, able to perform tasks 92 00:03:59,430 --> 00:04:02,429 normally requiring human intelligence. 93 00:04:02,430 --> 00:04:05,099 I prefer the term machine learning 94 00:04:05,100 --> 00:04:07,019 because it's a bit more precise, 95 00:04:07,020 --> 00:04:09,389 and also because many of the successes in 96 00:04:09,390 --> 00:04:11,789 AI that we've seen in recent years 97 00:04:11,790 --> 00:04:13,709 have been obtained through what is called 98 00:04:13,710 --> 00:04:15,749 statistical machine learning. 99 00:04:15,750 --> 00:04:17,549 You might even have heard about the term 100 00:04:17,550 --> 00:04:19,409 deep learning, which is an even smaller 101 00:04:19,410 --> 00:04:20,729 subset. 102 00:04:20,730 --> 00:04:22,799 So what am I referring to when I say 103 00:04:22,800 --> 00:04:24,929 machine learning? A certain 104 00:04:24,930 --> 00:04:27,089 kind of artificial intelligence that 105 00:04:27,090 --> 00:04:29,459 infers decisions from data. 106 00:04:29,460 --> 00:04:31,709 Formally, Mitchell defined machine learning 107 00:04:31,710 --> 00:04:33,809 as follows: A computer 108 00:04:33,810 --> 00:04:37,019 program is said to learn from experience E 109 00:04:37,020 --> 00:04:39,239 with respect to some class of tasks T 110 00:04:39,240 --> 00:04:41,369 and performance measure P if 111 00:04:41,370 --> 00:04:43,859 its performance at tasks in T, 112 00:04:43,860 --> 00:04:46,049 as measured by P, improves with 113 00:04:46,050 --> 00:04:47,050 experience E.
114 00:04:48,730 --> 00:04:50,889 And machine learning enabled many of 115 00:04:50,890 --> 00:04:52,749 the recent advances in artificial 116 00:04:52,750 --> 00:04:55,209 intelligence. It is used to 117 00:04:55,210 --> 00:04:57,459 recognize handwritten digits, 118 00:04:57,460 --> 00:04:59,199 to recognize people and objects in 119 00:04:59,200 --> 00:05:01,299 images, to translate 120 00:05:01,300 --> 00:05:03,369 from one language to another, to 121 00:05:03,370 --> 00:05:05,029 drive cars, 122 00:05:05,030 --> 00:05:07,989 and, that's the focus here, to recommend 123 00:05:07,990 --> 00:05:10,119 postings, photos and videos on 124 00:05:10,120 --> 00:05:13,519 platforms like Facebook and YouTube. 125 00:05:13,520 --> 00:05:15,859 And in my research, I focus on 126 00:05:15,860 --> 00:05:18,019 Facebook and YouTube because they are 127 00:05:18,020 --> 00:05:19,999 two of the most visited websites 128 00:05:20,000 --> 00:05:21,000 worldwide. 129 00:05:22,120 --> 00:05:24,069 And they use an ML-based system to 130 00:05:24,070 --> 00:05:26,260 curate the content for billions of users. 131 00:05:29,970 --> 00:05:32,099 Now, recommender systems like the ones 132 00:05:32,100 --> 00:05:34,589 you find on Facebook and YouTube 133 00:05:34,590 --> 00:05:37,079 have a long history. Famous early 134 00:05:37,080 --> 00:05:40,529 examples include Luhn from 1958, 135 00:05:40,530 --> 00:05:42,929 the Information Lens by Malone et al., 136 00:05:42,930 --> 00:05:45,029 the Tapestry email filtering system by 137 00:05:45,030 --> 00:05:47,369 Goldberg et al., and GroupLens 138 00:05:47,370 --> 00:05:49,139 by Resnick et al.
139 00:05:49,140 --> 00:05:50,819 Looking at the research out there, you 140 00:05:50,820 --> 00:05:52,679 find that Facebook received a lot of 141 00:05:52,680 --> 00:05:54,779 attention regarding algorithmic 142 00:05:54,780 --> 00:05:57,059 awareness, user beliefs about 143 00:05:57,060 --> 00:05:59,429 the system, how its system 144 00:05:59,430 --> 00:06:01,859 works and the biases 145 00:06:01,860 --> 00:06:03,779 that the system enacts. 146 00:06:03,780 --> 00:06:05,939 Meanwhile, especially when I started my 147 00:06:05,940 --> 00:06:08,249 research, there was comparatively little 148 00:06:08,250 --> 00:06:10,499 on YouTube, despite its importance 149 00:06:10,500 --> 00:06:12,480 and the many people who use YouTube. 150 00:06:13,720 --> 00:06:15,579 And what motivated my research was a 151 00:06:15,580 --> 00:06:17,649 study by Eslami et al. from 152 00:06:17,650 --> 00:06:19,209 2015. 153 00:06:19,210 --> 00:06:21,309 They found that 62.5 154 00:06:21,310 --> 00:06:23,679 percent of Facebook users were not aware 155 00:06:23,680 --> 00:06:25,809 of the existence of Facebook's News Feed 156 00:06:25,810 --> 00:06:26,810 algorithm. 157 00:06:27,510 --> 00:06:29,399 They also showed that users are upset 158 00:06:29,400 --> 00:06:31,559 when posts by close friends or family 159 00:06:31,560 --> 00:06:32,999 are not shown. 160 00:06:33,000 --> 00:06:35,189 And users mistakenly believe 161 00:06:35,190 --> 00:06:37,139 that their friends intentionally chose 162 00:06:37,140 --> 00:06:38,489 not to show them these posts. 163 00:06:39,970 --> 00:06:42,159 Eslami et al. wrote: In the extreme 164 00:06:42,160 --> 00:06:44,269 case, it may be that whenever 165 00:06:44,270 --> 00:06:46,509 a software developer in Menlo Park 166 00:06:46,510 --> 00:06:48,819 adjusts a parameter, someone 167 00:06:48,820 --> 00:06:50,949 somewhere wrongly starts to believe 168 00:06:50,950 --> 00:06:52,240 themselves to be unloved.
169 00:06:56,200 --> 00:06:57,789 And there's a lot of research that has 170 00:06:57,790 --> 00:07:00,129 pointed out the political, 171 00:07:00,130 --> 00:07:02,469 social and cultural importance 172 00:07:02,470 --> 00:07:04,269 of machine learning-based curation 173 00:07:04,270 --> 00:07:05,270 systems. 174 00:07:05,980 --> 00:07:08,319 Zeynep Tufekci wrote 175 00:07:08,320 --> 00:07:10,449 in The New York Times that YouTube may 176 00:07:10,450 --> 00:07:13,239 be one of the most powerful radicalizing 177 00:07:13,240 --> 00:07:15,790 instruments of the 21st century. 178 00:07:17,320 --> 00:07:19,839 And challenges like fake news, 179 00:07:19,840 --> 00:07:22,689 biased predictions and filter bubbles 180 00:07:22,690 --> 00:07:24,549 make an understanding of ML-based 181 00:07:24,550 --> 00:07:26,619 curation systems an important 182 00:07:26,620 --> 00:07:28,629 and timely concern. 183 00:07:28,630 --> 00:07:31,059 Journalists and researchers have accused 184 00:07:31,060 --> 00:07:33,159 ML-based curation systems of 185 00:07:33,160 --> 00:07:35,679 enabling the spread of fake news 186 00:07:35,680 --> 00:07:38,319 or conspiracy theories in general, 187 00:07:38,320 --> 00:07:40,479 and these accusations make sense because 188 00:07:40,480 --> 00:07:42,609 such systems can shape users' media 189 00:07:42,610 --> 00:07:45,519 consumption and influence their actions. 190 00:07:45,520 --> 00:07:47,229 I will talk a lot about bias in this 191 00:07:47,230 --> 00:07:49,569 talk, so I want to operationalize 192 00:07:49,570 --> 00:07:51,669 what I understand as bias 193 00:07:51,670 --> 00:07:53,859 in the context of my thesis and also 194 00:07:53,860 --> 00:07:54,759 this talk. 195 00:07:54,760 --> 00:07:57,849 I operationalized bias as an inclination, 196 00:07:57,850 --> 00:08:00,129 prejudice or overrepresentation 197 00:08:00,130 --> 00:08:02,469 for or against one person, 198 00:08:02,470 --> 00:08:05,199 group,
topic, idea or content, 199 00:08:05,200 --> 00:08:06,969 especially in a way considered to be 200 00:08:06,970 --> 00:08:07,970 unfair. 201 00:08:08,630 --> 00:08:11,119 And there are famous examples of biased 202 00:08:11,120 --> 00:08:12,259 predictions. 203 00:08:12,260 --> 00:08:14,149 Epstein and Robertson, for instance, 204 00:08:14,150 --> 00:08:16,249 found that biased search engine 205 00:08:16,250 --> 00:08:18,589 results can shift the voting preference 206 00:08:18,590 --> 00:08:20,959 of undecided voters by 20 percent 207 00:08:20,960 --> 00:08:22,009 or more. 208 00:08:22,010 --> 00:08:24,379 So considering this prior work, 209 00:08:24,380 --> 00:08:26,869 I started to believe that it is important 210 00:08:26,870 --> 00:08:29,029 to understand whether users are aware of 211 00:08:29,030 --> 00:08:30,409 the ML-based systems they're 212 00:08:30,410 --> 00:08:32,389 interacting with and whether users 213 00:08:32,390 --> 00:08:34,939 understand how such systems work. 214 00:08:34,940 --> 00:08:37,009 Otherwise, users might believe 215 00:08:37,010 --> 00:08:39,048 that they're presented with an objective 216 00:08:39,049 --> 00:08:41,089 reality, even though the news they're 217 00:08:41,090 --> 00:08:43,308 seeing is the result of a co-production 218 00:08:43,309 --> 00:08:45,859 between their actions as a user and 219 00:08:45,860 --> 00:08:47,899 a machine learning system's ability to 220 00:08:47,900 --> 00:08:49,039 infer their interests. 221 00:08:50,920 --> 00:08:53,049 Now consider this example. If we 222 00:08:53,050 --> 00:08:55,179 have a recommender system, it can easily 223 00:08:55,180 --> 00:08:57,399 lead to a virtuous circle: 224 00:08:57,400 --> 00:08:59,379 you end up watching a video related to 225 00:08:59,380 --> 00:09:00,579 human rights.
226 00:09:00,580 --> 00:09:02,259 And then you learn about the treatment of 227 00:09:02,260 --> 00:09:05,019 asylum seekers at the borders, 228 00:09:05,020 --> 00:09:07,269 and that leads you to develop an interest 229 00:09:07,270 --> 00:09:09,519 in the decriminalization of civil 230 00:09:09,520 --> 00:09:10,520 sea rescue. 231 00:09:11,660 --> 00:09:13,909 However, it can also lead to a vicious 232 00:09:13,910 --> 00:09:16,069 circle where you watch a video about 233 00:09:16,070 --> 00:09:18,409 a crime committed by a foreigner 234 00:09:18,410 --> 00:09:20,659 and then you see many videos about crimes 235 00:09:20,660 --> 00:09:22,519 committed by foreigners because the 236 00:09:22,520 --> 00:09:24,589 system just infers: 237 00:09:24,590 --> 00:09:26,299 that's what he's interested in, 238 00:09:26,300 --> 00:09:27,469 that's what he likes. 239 00:09:27,470 --> 00:09:30,139 So it's just giving you what you like, 240 00:09:30,140 --> 00:09:32,299 and you may end up with a distorted 241 00:09:32,300 --> 00:09:34,789 view of reality, changed political 242 00:09:34,790 --> 00:09:37,849 views and even xenophobia. 243 00:09:37,850 --> 00:09:40,009 This poses the question: How does 244 00:09:40,010 --> 00:09:42,319 machine learning influence people? 245 00:09:42,320 --> 00:09:43,429 And you might have heard about the 246 00:09:43,430 --> 00:09:45,619 potential dangers of so-called online 247 00:09:45,620 --> 00:09:48,139 radicalization and algorithmic 248 00:09:48,140 --> 00:09:50,239 rabbit holes, where people end 249 00:09:50,240 --> 00:09:52,459 up in this loop that I just described, 250 00:09:52,460 --> 00:09:54,619 where they have one topic and then 251 00:09:54,620 --> 00:09:56,509 see more and more related to that 252 00:09:56,510 --> 00:09:57,799 particular topic.
253 00:09:57,800 --> 00:09:59,839 And there was one incident that really 254 00:09:59,840 --> 00:10:02,089 motivated me to understand what's 255 00:10:02,090 --> 00:10:04,489 going on on YouTube and what kind of 256 00:10:04,490 --> 00:10:06,829 recommendations the system is providing. 257 00:10:06,830 --> 00:10:08,599 So most of you probably would remember 258 00:10:08,600 --> 00:10:11,209 that in Chemnitz, the stabbing of a citizen 259 00:10:11,210 --> 00:10:13,129 spawned street demonstrations and 260 00:10:13,130 --> 00:10:14,209 rioting. 261 00:10:14,210 --> 00:10:15,799 And the New York Times wrote an article 262 00:10:15,800 --> 00:10:18,379 called As Germans Seek News, 263 00:10:18,380 --> 00:10:20,480 YouTube Delivers Far-Right Tirades. 264 00:10:21,650 --> 00:10:23,719 So, according to the Times, people 265 00:10:23,720 --> 00:10:26,119 who tried to inform themselves on YouTube 266 00:10:26,120 --> 00:10:28,339 were shown increasingly radical far- 267 00:10:28,340 --> 00:10:30,949 right videos about the incident, which 268 00:10:30,950 --> 00:10:33,169 allegedly radicalized them 269 00:10:33,170 --> 00:10:35,539 and which fueled the protests. 270 00:10:35,540 --> 00:10:37,969 And this motivated us to perform 271 00:10:37,970 --> 00:10:40,039 audits to see whether YouTube 272 00:10:40,040 --> 00:10:42,649 is actually systematically 273 00:10:42,650 --> 00:10:44,719 recommending more and more 274 00:10:44,720 --> 00:10:45,720 radical content. 275 00:10:48,290 --> 00:10:49,849 Why is this important? 276 00:10:49,850 --> 00:10:52,279 Well, video recommendations of political 277 00:10:52,280 --> 00:10:54,409 topics and news have special 278 00:10:54,410 --> 00:10:56,899 requirements, especially in Germany. 279 00:10:56,900 --> 00:10:59,119 We have laws that force 280 00:10:59,120 --> 00:11:01,189 broadcasters to 281 00:11:01,190 --> 00:11:03,739 provide fair and balanced reporting.
282 00:11:03,740 --> 00:11:05,899 We also have laws that make sure 283 00:11:05,900 --> 00:11:08,179 that minorities are protected. 284 00:11:08,180 --> 00:11:09,799 In Germany, one of these laws is the 285 00:11:09,800 --> 00:11:12,379 so-called Rundfunkstaatsvertrag, 286 00:11:12,380 --> 00:11:14,689 the Interstate Broadcasting Agreement, 287 00:11:14,690 --> 00:11:16,369 and it's a law that enforces that 288 00:11:16,370 --> 00:11:18,499 broadcasting services report in 289 00:11:18,500 --> 00:11:20,569 a fair and balanced manner that takes 290 00:11:20,570 --> 00:11:22,220 minority views into account. 291 00:11:23,260 --> 00:11:25,329 So motivated by this, we 292 00:11:25,330 --> 00:11:27,879 performed the 2019 YouTube Chemnitz 293 00:11:27,880 --> 00:11:29,979 audit, and we 294 00:11:29,980 --> 00:11:32,439 found that YouTube is not pushing 295 00:11:32,440 --> 00:11:34,839 users towards politically extreme 296 00:11:34,840 --> 00:11:37,749 content by consistently suggesting 297 00:11:37,750 --> 00:11:39,579 more extreme videos. 298 00:11:39,580 --> 00:11:41,679 YouTube is also not leading users down 299 00:11:41,680 --> 00:11:43,839 a rabbit hole by zooming in 300 00:11:43,840 --> 00:11:46,449 on specific political topics. 301 00:11:46,450 --> 00:11:48,579 What we found is that YouTube is 302 00:11:48,580 --> 00:11:51,339 pushing increasingly more popular content 303 00:11:51,340 --> 00:11:53,709 as measured by the views and likes. 304 00:11:53,710 --> 00:11:55,659 The sadness evoked by the videos 305 00:11:55,660 --> 00:11:57,849 decreased while the happiness 306 00:11:57,850 --> 00:11:58,850 increased.
307 00:12:00,070 --> 00:12:02,439 Now, let's take one step back, 308 00:12:02,440 --> 00:12:04,599 because this is only part of a 309 00:12:04,600 --> 00:12:07,149 much larger puzzle. To thoroughly 310 00:12:07,150 --> 00:12:09,369 understand radicalization on YouTube 311 00:12:09,370 --> 00:12:11,829 and how YouTube influences the behavior 312 00:12:11,830 --> 00:12:13,989 of users, research would have to 313 00:12:13,990 --> 00:12:16,209 show that YouTube 314 00:12:16,210 --> 00:12:18,189 is presenting users with increasingly 315 00:12:18,190 --> 00:12:20,499 extreme content, that this extreme 316 00:12:20,500 --> 00:12:22,569 content negatively affects 317 00:12:22,570 --> 00:12:24,879 users' attitudes, that this 318 00:12:24,880 --> 00:12:27,009 affects their intentions, and 319 00:12:27,010 --> 00:12:29,469 that this changes their behavior. 320 00:12:29,470 --> 00:12:31,719 With this talk, I really only can 321 00:12:31,720 --> 00:12:34,059 talk about the first point, that is, 322 00:12:34,060 --> 00:12:36,159 whether YouTube is presenting users 323 00:12:36,160 --> 00:12:39,199 with increasingly extreme content. 324 00:12:39,200 --> 00:12:41,689 But I strongly invite other researchers 325 00:12:41,690 --> 00:12:44,749 to look at all these different aspects. 326 00:12:44,750 --> 00:12:47,089 And these audits can really be a voice 327 00:12:47,090 --> 00:12:48,229 of the voiceless. 328 00:12:48,230 --> 00:12:50,299 The dictionary defines the word 329 00:12:50,300 --> 00:12:52,399 audit as a systematic 330 00:12:52,400 --> 00:12:55,189 review or assessment of something. 331 00:12:55,190 --> 00:12:57,289 And in this talk, I will show you 332 00:12:57,290 --> 00:13:00,049 that audits can enable researchers 333 00:13:00,050 --> 00:13:02,119 and civic hackers to uncover 334 00:13:02,120 --> 00:13:04,519 the potential hidden agendas of social 335 00:13:04,520 --> 00:13:05,899 networking sites.
336 00:13:05,900 --> 00:13:08,149 Audits are especially interesting 337 00:13:08,150 --> 00:13:09,979 because they're immediately meaningful to 338 00:13:09,980 --> 00:13:12,139 users, as newspaper reports 339 00:13:12,140 --> 00:13:13,729 by Smith et al. suggest. 340 00:13:13,730 --> 00:13:16,579 So unlike explainable AI techniques, 341 00:13:16,580 --> 00:13:18,049 which might require a deeper 342 00:13:18,050 --> 00:13:20,149 understanding of statistics, these audits 343 00:13:20,150 --> 00:13:23,059 can be interpreted by anybody. 344 00:13:23,060 --> 00:13:25,249 So why perform audits? 345 00:13:25,250 --> 00:13:27,469 Because they enable individuals and 346 00:13:27,470 --> 00:13:29,719 society at large to monitor 347 00:13:29,720 --> 00:13:31,789 and control the recommendations 348 00:13:31,790 --> 00:13:33,139 of machine learning systems. 349 00:13:34,740 --> 00:13:36,869 And I found that audits 350 00:13:36,870 --> 00:13:39,929 are a very useful way to identify 351 00:13:39,930 --> 00:13:42,089 potential biases enacted by 352 00:13:42,090 --> 00:13:43,589 these systems. 353 00:13:43,590 --> 00:13:45,539 Sandvig et al. distinguished five 354 00:13:45,540 --> 00:13:47,309 different kinds of algorithm audit 355 00:13:47,310 --> 00:13:48,209 studies: 356 00:13:48,210 --> 00:13:50,399 code audits, noninvasive user 357 00:13:50,400 --> 00:13:52,769 audits, scraping audits, sock 358 00:13:52,770 --> 00:13:54,989 puppet audits and crowdsourced 359 00:13:54,990 --> 00:13:57,749 audits. And I'm going to explain 360 00:13:57,750 --> 00:13:59,909 each one of them step by step 361 00:13:59,910 --> 00:14:01,469 in the following. 362 00:14:01,470 --> 00:14:03,809 So in a code audit, you 363 00:14:03,810 --> 00:14:06,419 obtain a copy of a relevant algorithm 364 00:14:06,420 --> 00:14:08,519 and then you study the instructions in a 365 00:14:08,520 --> 00:14:10,379 programming language.
366 00:14:10,380 --> 00:14:12,839 And this is challenging since the code 367 00:14:12,840 --> 00:14:14,969 is considered valuable intellectual 368 00:14:14,970 --> 00:14:17,249 property and the code 369 00:14:17,250 --> 00:14:19,469 is commonly concealed using trade secret 370 00:14:19,470 --> 00:14:21,659 protection. Understanding systems 371 00:14:21,660 --> 00:14:22,949 through code audits 372 00:14:22,950 --> 00:14:25,019 is also challenging because algorithms 373 00:14:25,020 --> 00:14:26,729 depend on personal data. 374 00:14:26,730 --> 00:14:28,649 That is, they need to be audited with 375 00:14:28,650 --> 00:14:30,209 real data to be understood. 376 00:14:32,140 --> 00:14:33,909 Machine learning algorithms are also 377 00:14:33,910 --> 00:14:36,219 quite trivial, so the data 378 00:14:36,220 --> 00:14:38,169 is the most important thing. 379 00:14:38,170 --> 00:14:40,299 And to illustrate this, I have the code 380 00:14:40,300 --> 00:14:41,859 example here on the right. 381 00:14:41,860 --> 00:14:43,749 And that's actually a fully functioning 382 00:14:43,750 --> 00:14:46,059 machine learning system that can detect 383 00:14:46,060 --> 00:14:48,129 spam. It was coded using the 384 00:14:48,130 --> 00:14:50,259 Python library scikit-learn, 385 00:14:50,260 --> 00:14:52,119 which makes it quite easy to train 386 00:14:52,120 --> 00:14:53,829 machine learning systems, hiding a lot of 387 00:14:53,830 --> 00:14:55,359 the complexity. 388 00:14:55,360 --> 00:14:57,429 And what you can see here is 389 00:14:57,430 --> 00:15:00,099 that the things that are specific 390 00:15:00,100 --> 00:15:02,229 to the spam filtering use case are 391 00:15:02,230 --> 00:15:04,239 just the two things that are highlighted. 392 00:15:04,240 --> 00:15:06,399 It's the file with the data, 393 00:15:06,400 --> 00:15:08,559 that is, emails.csv, 394 00:15:08,560 --> 00:15:10,479 as well as the dimensionality of the 395 00:15:10,480 --> 00:15:12,789 data.
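As a rough illustration of the point that little in such a system is specific to spam, here is a minimal sketch. It is not the code from the slide: the talk names a Python machine learning library, while this version swaps in a small hand-written naive Bayes classifier that uses only the standard library, so that it runs without extra packages. The inline `emails` and `labels` lists are made up and stand in for the email data file; `train` and `predict` are names chosen here for illustration.

```python
import math
from collections import Counter, defaultdict

def train(texts, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenised texts."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter(labels)      # label -> number of training texts
    vocabulary = set()
    for text, label in zip(texts, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_texts)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the word given the label
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The only use-case-specific part: the (made-up) data and its labels.
emails = ["win money now", "cheap pills win prize",
          "meeting agenda for monday", "lunch on monday"]
labels = ["spam", "spam", "ham", "ham"]

model = train(emails, labels)
print(predict(model, "win a prize"))
```

Nothing in `train` or `predict` mentions spam. Passing different texts and labels turns the same code into a different classifier, which is why reading the algorithm alone reveals little about what a deployed system does.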
Now, if we 396 00:15:12,790 --> 00:15:14,889 would change the data from 397 00:15:14,890 --> 00:15:16,959 emails.csv to cars.csv, 398 00:15:16,960 --> 00:15:19,089 we could easily turn the system 399 00:15:19,090 --> 00:15:21,159 into a car recommender. 400 00:15:21,160 --> 00:15:23,379 We could also swap the file 401 00:15:23,380 --> 00:15:25,509 emails.csv for a file called 402 00:15:25,510 --> 00:15:27,639 cancer.csv and turn this into 403 00:15:27,640 --> 00:15:29,289 a breast cancer detection system. 404 00:15:30,390 --> 00:15:32,609 It all goes to show that the 405 00:15:32,610 --> 00:15:34,679 algorithm and studying the algorithm 406 00:15:34,680 --> 00:15:36,989 is not sufficient and not really 407 00:15:36,990 --> 00:15:38,249 helpful for our use case. 408 00:15:38,250 --> 00:15:41,089 So we really have to look at the output. 409 00:15:41,090 --> 00:15:43,309 The second type of audit that Sandvig 410 00:15:43,310 --> 00:15:45,709 et al. recognize is the so-called 411 00:15:45,710 --> 00:15:48,119 noninvasive user audit. 412 00:15:48,120 --> 00:15:50,659 There you ask users questions 413 00:15:50,660 --> 00:15:52,219 using a survey format. 414 00:15:52,220 --> 00:15:54,469 However, this comes with serious sampling 415 00:15:54,470 --> 00:15:56,839 problems because how do you actually 416 00:15:56,840 --> 00:15:58,729 reach the users that you want to reach? 417 00:15:58,730 --> 00:16:01,009 This also comes with important validity 418 00:16:01,010 --> 00:16:03,259 problems, for instance, due to cognitive 419 00:16:03,260 --> 00:16:05,359 biases, because people might just 420 00:16:05,360 --> 00:16:07,549 remember things wrongly and 421 00:16:07,550 --> 00:16:09,709 might not be good at explaining 422 00:16:09,710 --> 00:16:11,749 why they did certain things.
423 00:16:11,750 --> 00:16:13,789 The third kind of audits are the 424 00:16:13,790 --> 00:16:16,009 so-called scraping audits, and 425 00:16:16,010 --> 00:16:17,959 there you have a script that interacts 426 00:16:17,960 --> 00:16:19,729 with a platform, for instance, by 427 00:16:19,730 --> 00:16:22,309 querying a particular URL. 428 00:16:22,310 --> 00:16:24,679 And this allows researchers to obtain 429 00:16:24,680 --> 00:16:26,869 a large number of relevant data 430 00:16:26,870 --> 00:16:29,419 points. A more sophisticated version 431 00:16:29,420 --> 00:16:31,729 of these audits is the so-called sock 432 00:16:31,730 --> 00:16:32,959 puppet audit. 433 00:16:32,960 --> 00:16:35,239 Here, one is really impersonating 434 00:16:35,240 --> 00:16:37,609 a user and creating 435 00:16:37,610 --> 00:16:40,189 programmatically constructed traffic. 436 00:16:40,190 --> 00:16:42,289 And this is what I will be focusing on, 437 00:16:42,290 --> 00:16:43,879 and I'm going to explain it in more 438 00:16:43,880 --> 00:16:45,649 detail in the next step. 439 00:16:45,650 --> 00:16:47,839 So the other potential way of 440 00:16:47,840 --> 00:16:49,969 performing an audit is the so-called 441 00:16:49,970 --> 00:16:51,409 crowdsourced audit. 442 00:16:51,410 --> 00:16:53,509 There you recruit a large number of 443 00:16:53,510 --> 00:16:55,849 users to use 444 00:16:55,850 --> 00:16:57,139 a particular platform. 445 00:16:57,140 --> 00:16:58,939 So it's quite similar to the sock puppet 446 00:16:58,940 --> 00:17:00,859 audit, but it's doing it with real 447 00:17:00,860 --> 00:17:01,879 people.
448 00:17:01,880 --> 00:17:04,009 However, this is challenging because 449 00:17:04,010 --> 00:17:06,679 you need to find a large number of people. 450 00:17:06,680 --> 00:17:08,328 That can either be done through Amazon 451 00:17:08,329 --> 00:17:10,669 Mechanical Turk or through 452 00:17:10,670 --> 00:17:12,739 inviting volunteers, but 453 00:17:12,740 --> 00:17:14,449 that can be quite challenging. 454 00:17:14,450 --> 00:17:16,639 So I performed a so-called sock puppet 455 00:17:16,640 --> 00:17:19,249 audit, where I wrote a script 456 00:17:19,250 --> 00:17:21,529 that is remote-controlling a browser 457 00:17:21,530 --> 00:17:23,719 and impersonating a 458 00:17:23,720 --> 00:17:25,818 real user. Just to remind us what we 459 00:17:25,819 --> 00:17:26,749 were trying to do: 460 00:17:26,750 --> 00:17:28,609 we were motivated by the Chemnitz 461 00:17:28,610 --> 00:17:30,619 incident, and we wanted to know whether 462 00:17:30,620 --> 00:17:32,749 YouTube is actually 463 00:17:32,750 --> 00:17:34,879 showing increasingly radical, far- 464 00:17:34,880 --> 00:17:37,189 right videos for a variety of political 465 00:17:37,190 --> 00:17:39,199 topics, as the New York Times has 466 00:17:39,200 --> 00:17:40,129 claimed. 467 00:17:40,130 --> 00:17:42,199 So using a Firefox-based bot 468 00:17:42,200 --> 00:17:44,209 that I'm going to release with this talk, 469 00:17:44,210 --> 00:17:46,399 we performed 150 470 00:17:46,400 --> 00:17:48,349 random walks that always followed the 471 00:17:48,350 --> 00:17:49,609 same procedure. 472 00:17:49,610 --> 00:17:52,039 We randomly picked one of nine political 473 00:17:52,040 --> 00:17:53,839 topics from Germany. 474 00:17:53,840 --> 00:17:56,089 Then we entered the topic in German 475 00:17:56,090 --> 00:17:58,429 into the YouTube search bar. 476 00:17:58,430 --> 00:18:00,769 Then we randomly picked one of the top 477 00:18:00,770 --> 00:18:02,029 10 search results.
478 00:18:03,400 --> 00:18:05,799 And then we saved 479 00:18:05,800 --> 00:18:07,929 the video page and watched it for a random 480 00:18:07,930 --> 00:18:09,519 number of seconds. 481 00:18:09,520 --> 00:18:11,739 Then we randomly chose one of the top 482 00:18:11,740 --> 00:18:14,139 10 video recommendations displayed 483 00:18:14,140 --> 00:18:16,929 in the right sidebar next to the video. 484 00:18:16,930 --> 00:18:19,689 And then we repeated this 10 times. 485 00:18:19,690 --> 00:18:22,209 And we looked at both quantitative 486 00:18:22,210 --> 00:18:24,009 metrics like the number of likes and 487 00:18:24,010 --> 00:18:26,739 views, as well as qualitative 488 00:18:26,740 --> 00:18:27,789 metrics. 489 00:18:27,790 --> 00:18:30,459 And for this qualitative investigation, 490 00:18:30,460 --> 00:18:33,099 we performed an in-depth analysis 491 00:18:33,100 --> 00:18:35,289 for which we randomly selected three walks 492 00:18:35,290 --> 00:18:37,419 per topic and coded three videos 493 00:18:37,420 --> 00:18:38,829 per random walk. 494 00:18:38,830 --> 00:18:41,079 We coded the initial video, 495 00:18:41,080 --> 00:18:44,079 the fifth video and the tenth video, 496 00:18:44,080 --> 00:18:45,729 and this coding was performed by three 497 00:18:45,730 --> 00:18:48,129 independent raters, one male, two female, 498 00:18:48,130 --> 00:18:50,109 all in their twenties to thirties, who 499 00:18:50,110 --> 00:18:52,779 did not know about the research question. 500 00:18:52,780 --> 00:18:55,029 And they really watched the videos 501 00:18:55,030 --> 00:18:56,859 for five minutes or more. 502 00:18:56,860 --> 00:18:58,599 And then they assessed how closely 503 00:18:58,600 --> 00:19:00,969 related the videos are to political 504 00:19:00,970 --> 00:19:03,129 topics.
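The random walk procedure described above can be sketched as follows. This is not the released bot: `search` and `recommendations` are hypothetical callables that stand in for the browser automation (typing the topic into the search bar, reading the sidebar), and the stub functions return made-up video IDs so that the sketch runs offline. Saving the page and watching for a random number of seconds are left out.

```python
import random

def random_walk(topics, search, recommendations, steps=10, top_k=10, rng=None):
    """Pick a topic, pick a search result, then follow `steps` recommendations."""
    if rng is None:
        rng = random.Random()
    topic = rng.choice(topics)                   # randomly pick one topic
    video = rng.choice(search(topic)[:top_k])    # pick one of the top search results
    path = [video]                               # the initial video
    for _ in range(steps):
        # pick one of the top recommendations shown next to the current video
        video = rng.choice(recommendations(video)[:top_k])
        path.append(video)
    return topic, path

# Offline stand-ins with made-up IDs (assumptions, not real YouTube data).
def fake_search(topic):
    return [f"{topic}/result-{i}" for i in range(20)]

def fake_recommendations(video):
    return [f"{video}>rec-{i}" for i in range(20)]

topics = ["topic-a", "topic-b", "topic-c"]
topic, path = random_walk(topics, fake_search, fake_recommendations,
                          rng=random.Random(0))
print(topic, len(path))
```

Injecting the two callables keeps the walk logic separate from the scraping code, so the same function could be driven by a real browser session or by stubs like the ones here. With the defaults, a path holds the initial video plus ten recommended videos.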
They also rated whether 505 00:19:03,130 --> 00:19:05,379 the videos evoked sadness or happiness 506 00:19:05,380 --> 00:19:07,749 on an 11-point scale from least, 507 00:19:07,750 --> 00:19:10,419 zero, to most, 10. 508 00:19:10,420 --> 00:19:12,969 So we simulated a regular 509 00:19:12,970 --> 00:19:16,119 web user by remote-controlling a browser, 510 00:19:16,120 --> 00:19:18,549 and we collected between 12 and 25 511 00:19:18,550 --> 00:19:20,559 random walks per topic. 512 00:19:20,560 --> 00:19:22,959 So for each random walk and each topic, 513 00:19:22,960 --> 00:19:25,299 we started a new browser instance 514 00:19:25,300 --> 00:19:27,189 and cleared all cookies. 515 00:19:27,190 --> 00:19:29,109 All random walks were collected in May 516 00:19:29,110 --> 00:19:31,269 2019 with the same laptop 517 00:19:31,270 --> 00:19:32,529 on the same network. 518 00:19:32,530 --> 00:19:34,689 So the decision to select the fifth 519 00:19:34,690 --> 00:19:36,789 and the 10th recommendation for 520 00:19:36,790 --> 00:19:38,769 the in-depth analysis was made at the 521 00:19:38,770 --> 00:19:40,059 beginning of the study. 522 00:19:40,060 --> 00:19:42,099 That is, before reviewing any of the 523 00:19:42,100 --> 00:19:44,319 material and before we performed any 524 00:19:44,320 --> 00:19:46,719 kind of analysis. The raters 525 00:19:46,720 --> 00:19:48,279 reviewed all videos in the same 526 00:19:48,280 --> 00:19:49,659 randomized order. 527 00:19:49,660 --> 00:19:52,479 We computed Fleiss' kappa 528 00:19:52,480 --> 00:19:54,729 to understand how strong our 529 00:19:54,730 --> 00:19:56,979 inter-rater agreement is, 530 00:19:56,980 --> 00:19:58,749 and we found substantial agreement 531 00:19:58,750 --> 00:20:01,029 regarding how similar the videos were 532 00:20:01,030 --> 00:20:03,189 to the topics in our investigation.
533 00:20:03,190 --> 00:20:06,189 That's 0.765, 534 00:20:06,190 --> 00:20:08,199 and the same for the sadness evoked by the videos, 535 00:20:08,200 --> 00:20:10,509 which is at 0.613. 536 00:20:10,510 --> 00:20:12,309 We also have moderate agreement for the 537 00:20:12,310 --> 00:20:14,529 happiness, which is at 538 00:20:14,530 --> 00:20:15,530 0.441. 539 00:20:18,140 --> 00:20:20,419 So you might wonder what are the topics 540 00:20:20,420 --> 00:20:21,529 that we chose? 541 00:20:21,530 --> 00:20:23,719 So we took nine political topics 542 00:20:23,720 --> 00:20:26,179 from a representative telephone poll 543 00:20:26,180 --> 00:20:28,399 conducted on behalf of the very best 544 00:20:28,400 --> 00:20:29,629 Deutschlandfunk. 545 00:20:29,630 --> 00:20:31,309 You find the topics here on the slide. 546 00:20:31,310 --> 00:20:33,649 I won't read them out, but they were what 547 00:20:33,650 --> 00:20:35,869 people at the time thought were the most 548 00:20:35,870 --> 00:20:37,009 pressing issues. 549 00:20:37,010 --> 00:20:38,689 And we used the keywords just like they 550 00:20:38,690 --> 00:20:40,879 were in the telephone poll. 551 00:20:40,880 --> 00:20:42,619 Our audit revealed that recommendations 552 00:20:42,620 --> 00:20:45,019 become significantly more popular, 553 00:20:45,020 --> 00:20:46,969 measured by views and likes. 554 00:20:46,970 --> 00:20:49,009 You can see a steep increase from the 555 00:20:49,010 --> 00:20:51,349 initial videos to the recommendations. 556 00:20:52,520 --> 00:20:55,069 Note that we operationalize popularity 557 00:20:55,070 --> 00:20:57,349 as the number of views and likes. 558 00:20:57,350 --> 00:20:59,509 We included both because views are an 559 00:20:59,510 --> 00:21:01,789 implicit measure of popularity, while 560 00:21:01,790 --> 00:21:04,729 likes are an explicit measure of popularity. 561 00:21:04,730 --> 00:21:05,689 Regarding views, 
562 00:21:05,690 --> 00:21:07,939 it also remains unclear how many 563 00:21:07,940 --> 00:21:09,919 seconds of video must be watched before 564 00:21:09,920 --> 00:21:12,079 it's counted. We have a table here which 565 00:21:12,080 --> 00:21:14,239 provides the median and mean numbers 566 00:21:14,240 --> 00:21:16,579 of views and likes, comparing 567 00:21:16,580 --> 00:21:18,199 the initial videos and the fifth 568 00:21:18,200 --> 00:21:19,459 recommendations. 569 00:21:19,460 --> 00:21:21,559 A substantial increase in views and 570 00:21:21,560 --> 00:21:23,809 likes can be observed, especially 571 00:21:23,810 --> 00:21:25,789 between the initial videos and the fifth 572 00:21:25,790 --> 00:21:27,259 recommendations. 573 00:21:27,260 --> 00:21:29,359 While the initial videos have a 574 00:21:29,360 --> 00:21:31,339 median of 9,500 575 00:21:31,340 --> 00:21:33,439 views, the first recommendations 576 00:21:33,440 --> 00:21:35,539 have a median of around 200,000 577 00:21:35,540 --> 00:21:36,769 views. 578 00:21:36,770 --> 00:21:38,269 After following a chain of 10 579 00:21:38,270 --> 00:21:41,029 recommendations, the videos have a median 580 00:21:41,030 --> 00:21:43,879 of almost 300,000 views. 581 00:21:43,880 --> 00:21:45,349 The number of likes increases 582 00:21:45,350 --> 00:21:47,689 significantly, too. The initial 583 00:21:47,690 --> 00:21:49,609 videos have a median of 584 00:21:49,610 --> 00:21:51,559 170 likes, while the fifth 585 00:21:51,560 --> 00:21:53,809 recommendations have a median of 586 00:21:53,810 --> 00:21:55,520 1,404 likes. 587 00:21:56,880 --> 00:21:58,949 We performed two Mann-Whitney 588 00:21:58,950 --> 00:22:01,559 U tests, which support the finding 589 00:22:01,560 --> 00:22:03,449 that the number of views and likes 590 00:22:03,450 --> 00:22:05,489 change between the initial videos and the 591 00:22:05,490 --> 00:22:06,869 recommendations.
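To make the statistical step more concrete, here is a hedged sketch of a two-sided Mann-Whitney U test using only the Python standard library. It relies on the normal approximation with tie and continuity corrections, which is an assumption on my side; the talk does not say which implementation or settings were used, and in practice a statistics package would be the usual choice.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p). Ties get average ranks, the variance is tie-corrected,
    and a continuity correction of 0.5 is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i  # size of this group of tied values
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all observations identical
        return u, 1.0
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u, math.erfc(z / math.sqrt(2))
```

A call such as `mann_whitney_u(initial_views, fifth_views)` would compare two lists of view counts; both list names are placeholders, not data from the study.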
592 00:22:06,870 --> 00:22:08,579 The audit also revealed that 593 00:22:08,580 --> 00:22:11,069 recommendations become significantly 594 00:22:11,070 --> 00:22:13,949 less related to political topics. 595 00:22:13,950 --> 00:22:16,319 The median topic similarity 596 00:22:16,320 --> 00:22:18,449 rating of the initial videos was 597 00:22:18,450 --> 00:22:19,439 eight. 598 00:22:19,440 --> 00:22:21,689 This decreased dramatically to 599 00:22:21,690 --> 00:22:24,089 0.83 after following only 600 00:22:24,090 --> 00:22:25,919 five recommendations. 601 00:22:25,920 --> 00:22:28,169 The similarity remains very 602 00:22:28,170 --> 00:22:30,029 low for the 10th recommendations, with a 603 00:22:30,030 --> 00:22:32,159 median of one. 604 00:22:32,160 --> 00:22:34,259 Two U tests indicate that 605 00:22:34,260 --> 00:22:36,419 the topics in the videos changed between 606 00:22:36,420 --> 00:22:37,829 the initial videos and the fifth 607 00:22:37,830 --> 00:22:39,989 recommendations and between the initial 608 00:22:39,990 --> 00:22:42,089 videos and the tenth recommendations. 609 00:22:42,090 --> 00:22:44,429 So all these results indicate a strong 610 00:22:44,430 --> 00:22:45,430 topic drift. 611 00:22:47,420 --> 00:22:49,189 We also found that the happiness in the 612 00:22:49,190 --> 00:22:51,409 videos increased while the sadness 613 00:22:51,410 --> 00:22:53,809 decreased. So the happiness 614 00:22:53,810 --> 00:22:55,999 changes from a median of zero 615 00:22:56,000 --> 00:22:58,099 for the initial videos to a median of 616 00:22:58,100 --> 00:22:59,569 two for the fifth and tenth 617 00:22:59,570 --> 00:23:01,039 recommendations.
618 00:23:01,040 --> 00:23:03,109 So while 75 percent of 619 00:23:03,110 --> 00:23:04,759 the initial videos have a happiness 620 00:23:04,760 --> 00:23:07,339 rating between zero and two, 621 00:23:07,340 --> 00:23:09,379 more than half of the fifth and tenth 622 00:23:09,380 --> 00:23:11,179 recommendations have a happiness rating 623 00:23:11,180 --> 00:23:12,859 higher than two. 624 00:23:12,860 --> 00:23:14,779 Regarding the sadness evoked by the 625 00:23:14,780 --> 00:23:16,939 videos, the trend is the opposite. 626 00:23:16,940 --> 00:23:19,009 The median ratings in the box plot in 627 00:23:19,010 --> 00:23:21,379 the figure move from 628 00:23:21,380 --> 00:23:23,479 1.67 for the initial videos down to 629 00:23:23,480 --> 00:23:26,059 0.0 for the fifth and 630 00:23:26,060 --> 00:23:28,099 0.33 for the tenth 631 00:23:28,100 --> 00:23:29,659 recommendations. 632 00:23:29,660 --> 00:23:31,819 So while more than half of the initial 633 00:23:31,820 --> 00:23:33,589 videos have a sadness rating higher 634 00:23:33,590 --> 00:23:35,809 than 1.67, 75 635 00:23:35,810 --> 00:23:38,029 percent of the tenth recommendations 636 00:23:38,030 --> 00:23:40,249 have a rating smaller than one. 637 00:23:40,250 --> 00:23:42,409 Overall, in contrast to what the New York 638 00:23:42,410 --> 00:23:43,639 Times reported, 639 00:23:43,640 --> 00:23:46,099 our findings suggest that the dangers 640 00:23:46,100 --> 00:23:48,199 of online radicalization may 641 00:23:48,200 --> 00:23:49,969 be exaggerated.
642 00:23:49,970 --> 00:23:52,009 Now, taking a step back and taking the 643 00:23:52,010 --> 00:23:54,199 power back, I want you to understand 644 00:23:54,200 --> 00:23:56,299 that scraping audits and sock puppet 645 00:23:56,300 --> 00:23:58,399 audits are in my opinion the most 646 00:23:58,400 --> 00:24:00,739 promising methods to investigate complex 647 00:24:00,740 --> 00:24:02,299 machine learning systems, because these 648 00:24:02,300 --> 00:24:04,759 audits can be used to identify 649 00:24:04,760 --> 00:24:06,769 popularity biases like the one that I 650 00:24:06,770 --> 00:24:09,439 showed you. But they can also be used to 651 00:24:09,440 --> 00:24:11,509 see whether a system is enacting a 652 00:24:11,510 --> 00:24:13,579 gender bias, or if a system has 653 00:24:13,580 --> 00:24:15,769 a tendency to discriminate against 654 00:24:15,770 --> 00:24:18,289 or towards a particular ethnic group. 655 00:24:18,290 --> 00:24:20,419 So from your experience reading 656 00:24:20,420 --> 00:24:22,189 the news, you know that controversial 657 00:24:22,190 --> 00:24:24,709 political topics require a balanced 658 00:24:24,710 --> 00:24:26,959 presentation of all arguments in 659 00:24:26,960 --> 00:24:29,899 a way that weighs the pros and cons. 660 00:24:29,900 --> 00:24:32,359 However, the audit suggests 661 00:24:32,360 --> 00:24:34,969 that YouTube's recommendation system 662 00:24:34,970 --> 00:24:36,889 is not suited to help users inform 663 00:24:36,890 --> 00:24:39,049 themselves about complex political 664 00:24:39,050 --> 00:24:41,209 issues. Popularity as measured 665 00:24:41,210 --> 00:24:43,879 by likes and views was the defining 666 00:24:43,880 --> 00:24:45,799 factor for selecting recommendations. 667 00:24:46,960 --> 00:24:49,509 And if this is the case, then minority 668 00:24:49,510 --> 00:24:51,789 views are not adequately taken into 669 00:24:51,790 --> 00:24:53,259 account by the system.
670 00:24:53,260 --> 00:24:55,389 So the popular recommendations that you 671 00:24:55,390 --> 00:24:57,489 see here are, of course, attractive 672 00:24:57,490 --> 00:24:59,559 for the majority, and this could 673 00:24:59,560 --> 00:25:01,749 be motivated by financial incentives 674 00:25:01,750 --> 00:25:03,879 that try to optimize the watch time 675 00:25:03,880 --> 00:25:06,039 for a broad majority. 676 00:25:06,040 --> 00:25:08,469 Our audit corroborates Smith 677 00:25:08,470 --> 00:25:10,539 et al., who also performed random 678 00:25:10,540 --> 00:25:12,849 walks. Smith et al.'s random walks 679 00:25:12,850 --> 00:25:15,159 were criticized as artificial because 680 00:25:15,160 --> 00:25:17,349 they relied on YouTube's API. 681 00:25:17,350 --> 00:25:19,899 In contrast to that, we remote-controlled 682 00:25:19,900 --> 00:25:21,909 a Firefox browser from a university 683 00:25:21,910 --> 00:25:22,989 network. 684 00:25:22,990 --> 00:25:25,179 So in a way, the audit is a prime 685 00:25:25,180 --> 00:25:27,519 example for the recentering 686 00:25:27,520 --> 00:25:29,679 of public engagement around the 687 00:25:29,680 --> 00:25:32,889 complementary interests of the majority 688 00:25:32,890 --> 00:25:34,509 and profitability. 689 00:25:34,510 --> 00:25:36,099 And this connects to Harper's 690 00:25:36,100 --> 00:25:38,289 investigation of the so-called big 691 00:25:38,290 --> 00:25:40,809 data public and its problems, 692 00:25:40,810 --> 00:25:42,879 because, as I said, from a platform 693 00:25:42,880 --> 00:25:45,309 perspective, where longer watch times mean 694 00:25:45,310 --> 00:25:47,409 more shown ads, which leads to 695 00:25:47,410 --> 00:25:48,339 more money, 696 00:25:48,340 --> 00:25:50,139 it makes a lot of sense to target the 697 00:25:50,140 --> 00:25:51,140 majority.
698 00:25:53,010 --> 00:25:55,199 One important limitation of my approach 699 00:25:55,200 --> 00:25:57,299 is that we cannot rule out that a rabbit 700 00:25:57,300 --> 00:25:59,159 hole effect exists for a particular 701 00:25:59,160 --> 00:26:01,799 subset of users and topics, especially 702 00:26:01,800 --> 00:26:04,109 for those where users are actively looking 703 00:26:04,110 --> 00:26:05,939 for fringe content. 704 00:26:05,940 --> 00:26:07,679 And that's why it's important to 705 00:26:07,680 --> 00:26:09,299 understand the users and their 706 00:26:09,300 --> 00:26:11,549 understanding of the system, because 707 00:26:11,550 --> 00:26:13,859 YouTube is a complex socio-technical 708 00:26:13,860 --> 00:26:16,439 system with human and non-human 709 00:26:16,440 --> 00:26:18,509 actors who all influence how 710 00:26:18,510 --> 00:26:21,149 information is accessed and understood. 711 00:26:21,150 --> 00:26:23,309 And we also wrote a paper 712 00:26:23,310 --> 00:26:25,919 on this in more detail, where we examined 713 00:26:25,920 --> 00:26:28,349 how middle-aged users without a background 714 00:26:28,350 --> 00:26:31,049 in technology think YouTube works. 715 00:26:31,050 --> 00:26:33,209 That is, we asked them why they 716 00:26:33,210 --> 00:26:35,249 think they see the recommendations they 717 00:26:35,250 --> 00:26:36,269 see. 718 00:26:36,270 --> 00:26:39,149 So we found four big user beliefs. 719 00:26:39,150 --> 00:26:41,579 One is related to the current user's 720 00:26:41,580 --> 00:26:44,159 previous actions and how they influence 721 00:26:44,160 --> 00:26:45,689 recommendations. 722 00:26:45,690 --> 00:26:48,089 The second is related to social 723 00:26:48,090 --> 00:26:50,399 media, that is, how other users' 724 00:26:50,400 --> 00:26:53,249 actions influence the recommendations.
725 00:26:53,250 --> 00:26:56,189 The third is related to the algorithm 726 00:26:56,190 --> 00:26:58,289 and what the algorithm regards 727 00:26:58,290 --> 00:27:00,369 as similar: who is similar, what 728 00:27:00,370 --> 00:27:02,519 is similar, and also what kind 729 00:27:02,520 --> 00:27:04,709 of context the algorithm actually 730 00:27:04,710 --> 00:27:06,119 takes into account. 731 00:27:06,120 --> 00:27:08,249 And the fourth user belief relates 732 00:27:08,250 --> 00:27:10,049 to the organization and the company 733 00:27:10,050 --> 00:27:11,069 policy. 734 00:27:11,070 --> 00:27:13,439 And it's interesting because especially 735 00:27:13,440 --> 00:27:15,539 company policy is connected to 736 00:27:15,540 --> 00:27:17,789 a lot of negative beliefs, where 737 00:27:17,790 --> 00:27:19,979 people think that YouTube is 738 00:27:19,980 --> 00:27:22,469 actually selling the recommendations, 739 00:27:22,470 --> 00:27:24,419 and they think that YouTube has 740 00:27:24,420 --> 00:27:26,639 psychological experts that just try 741 00:27:26,640 --> 00:27:29,369 to keep them watching and watching. 742 00:27:29,370 --> 00:27:32,249 If you want more, read the paper. 743 00:27:32,250 --> 00:27:34,609 It's written by Oscar Alvarado, 744 00:27:34,610 --> 00:27:36,749 me, Vero Vanden Abeele, 745 00:27:36,750 --> 00:27:38,939 Andreas Breiter and Katrien Verbert, 746 00:27:38,940 --> 00:27:41,309 and it was published at the CSCW 747 00:27:41,310 --> 00:27:43,049 conference this year. 748 00:27:44,220 --> 00:27:46,409 Now, here's my call to action. 749 00:27:46,410 --> 00:27:48,719 I want you to collect YouTube search 750 00:27:48,720 --> 00:27:50,879 results, video recommendations 751 00:27:50,880 --> 00:27:53,549 and advertisements for different topics. 752 00:27:53,550 --> 00:27:55,589 And I want you to do this without user 753 00:27:55,590 --> 00:27:57,959 accounts and with user accounts.
754 00:27:57,960 --> 00:28:00,059 And my goal is to systematically 755 00:28:00,060 --> 00:28:02,309 analyze the recommendations by 756 00:28:02,310 --> 00:28:03,750 YouTube's machine learning system. 757 00:28:05,060 --> 00:28:07,069 And the next step after that would be to 758 00:28:07,070 --> 00:28:09,469 design, implement and evaluate 759 00:28:09,470 --> 00:28:11,569 algorithmic transparency tools 760 00:28:11,570 --> 00:28:14,179 that help users understand and influence 761 00:28:14,180 --> 00:28:15,739 their recommendations. 762 00:28:15,740 --> 00:28:17,389 And in the following, I will show you the 763 00:28:17,390 --> 00:28:19,609 script that I wrote, and I'm also 764 00:28:19,610 --> 00:28:21,529 going to point out where it can be 765 00:28:21,530 --> 00:28:24,469 adapted to not only study YouTube, 766 00:28:24,470 --> 00:28:26,299 and to not only study YouTube across 767 00:28:26,300 --> 00:28:28,549 countries, languages and topics, but also 768 00:28:28,550 --> 00:28:31,249 to study other platforms like Instagram, 769 00:28:31,250 --> 00:28:34,219 like TikTok and a variety of others. 770 00:28:34,220 --> 00:28:36,229 The audits, in my opinion, could be a 771 00:28:36,230 --> 00:28:38,689 powerful tool to surveil 772 00:28:38,690 --> 00:28:40,639 surveillance capitalism. 773 00:28:40,640 --> 00:28:42,469 With the audits that I described, it 774 00:28:42,470 --> 00:28:44,629 would be possible to investigate 775 00:28:44,630 --> 00:28:46,759 how the content is targeted to 776 00:28:46,760 --> 00:28:48,919 individual users. So the 777 00:28:48,920 --> 00:28:51,169 audits could be used to explore 778 00:28:51,170 --> 00:28:53,299 how advertisements are targeting 779 00:28:53,300 --> 00:28:55,579 specific users, and this directly 780 00:28:55,580 --> 00:28:57,499 relates to the dangers of so-called 781 00:28:57,500 --> 00:28:58,790 surveillance capitalism.
782 00:28:59,990 --> 00:29:02,059 Shoshana Zuboff describes surveillance 783 00:29:02,060 --> 00:29:04,819 capitalism as human experience, 784 00:29:04,820 --> 00:29:07,039 which is used as raw material for 785 00:29:07,040 --> 00:29:09,709 translation into behavioral data, 786 00:29:09,710 --> 00:29:11,479 which are declared as a proprietary 787 00:29:11,480 --> 00:29:13,759 behavioral surplus, fed into 788 00:29:13,760 --> 00:29:15,949 advanced manufacturing processes 789 00:29:15,950 --> 00:29:18,139 known as so-called machine intelligence 790 00:29:18,140 --> 00:29:20,569 and fabricated into prediction products 791 00:29:20,570 --> 00:29:23,599 that anticipate what you will do now. 792 00:29:23,600 --> 00:29:26,119 Considering these political and economic 793 00:29:26,120 --> 00:29:28,519 forces, it's vital to investigate 794 00:29:28,520 --> 00:29:30,649 how ads are targeted to 795 00:29:30,650 --> 00:29:32,809 users. So these audits that I presented 796 00:29:32,810 --> 00:29:34,699 could be a tool to investigate 797 00:29:34,700 --> 00:29:37,069 personalization as well as 798 00:29:37,070 --> 00:29:39,379 the user profiles of contemporary 799 00:29:39,380 --> 00:29:40,549 surveillance capitalism. 800 00:29:42,050 --> 00:29:44,869 I also believe that a foundation 801 00:29:44,870 --> 00:29:47,059 for machine learning-based systems 802 00:29:47,060 --> 00:29:48,049 is needed. 803 00:29:48,050 --> 00:29:50,539 And in the thesis I describe in detail 804 00:29:50,540 --> 00:29:52,969 two different models that could be used. 805 00:29:52,970 --> 00:29:55,759 One is following the German Association 806 00:29:55,760 --> 00:29:57,949 for Technical Inspection, the TÜV. 807 00:29:57,950 --> 00:29:59,839 The other one is following the German 808 00:29:59,840 --> 00:30:01,939 Foundation for product testing.
809 00:30:01,940 --> 00:30:04,099 That's the Stiftung Warentest, and 810 00:30:04,100 --> 00:30:06,229 both approaches could be used to make 811 00:30:06,230 --> 00:30:08,299 sure that machine learning systems act in 812 00:30:08,300 --> 00:30:10,369 the interest of society at 813 00:30:10,370 --> 00:30:11,519 large. 814 00:30:11,520 --> 00:30:13,189 TÜV institutions, for instance, 815 00:30:13,190 --> 00:30:15,439 evaluate each car in Germany every 816 00:30:15,440 --> 00:30:17,509 year to ensure that a car is street 817 00:30:17,510 --> 00:30:18,649 legal. 818 00:30:18,650 --> 00:30:20,659 The purpose of the German Foundation for 819 00:30:20,660 --> 00:30:22,849 product testing is to compare goods 820 00:30:22,850 --> 00:30:25,309 and services in an unbiased way. 821 00:30:25,310 --> 00:30:27,349 So the TÜV ensures that something 822 00:30:27,350 --> 00:30:29,719 complies with a certain norm, commonly 823 00:30:29,720 --> 00:30:31,579 making binary decisions whether 824 00:30:31,580 --> 00:30:33,589 something is permitted or not. 825 00:30:33,590 --> 00:30:35,989 The Stiftung Warentest usually develops 826 00:30:35,990 --> 00:30:38,089 a catalog of criteria used to 827 00:30:38,090 --> 00:30:40,219 compare different instances of a specific 828 00:30:40,220 --> 00:30:41,989 kind of product or service. 829 00:30:41,990 --> 00:30:43,769 So an expert consortium defines these 830 00:30:43,770 --> 00:30:45,349 criteria for specific products or 831 00:30:45,350 --> 00:30:47,869 services in a particular context. 832 00:30:47,870 --> 00:30:50,029 Now, a foundation for machine learning- 833 00:30:50,030 --> 00:30:52,759 based systems could adopt these schemas 834 00:30:52,760 --> 00:30:55,069 and iteratively develop criteria 835 00:30:55,070 --> 00:30:57,079 for the control of ML-based curation 836 00:30:57,080 --> 00:30:59,089 systems.
Audits could then be used to 837 00:30:59,090 --> 00:31:00,799 make sure that the system is not 838 00:31:00,800 --> 00:31:02,989 enacting a popularity bias, or 839 00:31:02,990 --> 00:31:05,149 that the system is not discriminating 840 00:31:05,150 --> 00:31:07,399 against ethnic minorities 841 00:31:07,400 --> 00:31:09,079 or certain gender identities. 842 00:31:10,530 --> 00:31:12,629 And I really hope that this talk will 843 00:31:12,630 --> 00:31:15,119 inspire other researchers to examine 844 00:31:15,120 --> 00:31:17,399 users' understanding of machine 845 00:31:17,400 --> 00:31:19,229 learning-based curation systems or other 846 00:31:19,230 --> 00:31:21,299 machine learning systems, and motivate 847 00:31:21,300 --> 00:31:23,669 them to design and develop novel ways 848 00:31:23,670 --> 00:31:25,769 of explaining and auditing such 849 00:31:25,770 --> 00:31:26,770 systems. 850 00:31:27,520 --> 00:31:29,709 But until these bigger 851 00:31:29,710 --> 00:31:31,809 things are established, it's kind of up 852 00:31:31,810 --> 00:31:33,039 to you and me. 853 00:31:33,040 --> 00:31:35,289 So here's my call to civic hackers. 854 00:31:35,290 --> 00:31:36,669 Use the script to investigate the 855 00:31:36,670 --> 00:31:38,769 recommendations and the ads on YouTube. 856 00:31:38,770 --> 00:31:40,779 And here are some ideas: you could look at 857 00:31:40,780 --> 00:31:43,299 fake news and pseudoscience related 858 00:31:43,300 --> 00:31:45,759 to climate change or 859 00:31:45,760 --> 00:31:48,489 the COVID-19 pandemic, vaccination 860 00:31:48,490 --> 00:31:49,479 in general, 861 00:31:49,480 --> 00:31:51,639 the moon landing conspiracy or 862 00:31:51,640 --> 00:31:53,499 the so-called flat earth theory. 863 00:31:53,500 --> 00:31:55,599 So in the repository, there are two 864 00:31:55,600 --> 00:31:57,489 scripts that I'm providing.
865 00:31:57,490 --> 00:31:59,739 One is called crawl YouTube, 866 00:31:59,740 --> 00:32:02,139 and the other one is called extract data 867 00:32:02,140 --> 00:32:03,349 from downloaded videos. 868 00:32:04,570 --> 00:32:06,339 They're both Python scripts. 869 00:32:06,340 --> 00:32:08,529 So let's consider the first one, called 870 00:32:08,530 --> 00:32:09,530 crawl YouTube. 871 00:32:12,790 --> 00:32:15,009 As I told you, the goal 872 00:32:15,010 --> 00:32:17,499 is to remote-control a web browser, 873 00:32:17,500 --> 00:32:19,869 and I'm using the web testing 874 00:32:19,870 --> 00:32:22,029 library Selenium for that. 875 00:32:22,030 --> 00:32:23,829 Selenium is also available in other 876 00:32:23,830 --> 00:32:25,869 programming languages, but I'm using it 877 00:32:25,870 --> 00:32:28,089 here via Python. 878 00:32:28,090 --> 00:32:30,369 And this is based on a Chrome browser. 879 00:32:30,370 --> 00:32:31,719 You can use different browsers. 880 00:32:31,720 --> 00:32:34,269 There's also an extension for Firefox 881 00:32:34,270 --> 00:32:35,499 and others. 882 00:32:35,500 --> 00:32:36,819 And in the script, you have different 883 00:32:36,820 --> 00:32:38,719 parameters that you can set. 884 00:32:38,720 --> 00:32:41,169 So here's the number of paths 885 00:32:41,170 --> 00:32:43,299 to collect per keyword, and I set that 886 00:32:43,300 --> 00:32:44,529 to 20. 887 00:32:44,530 --> 00:32:46,749 Then the number of search 888 00:32:46,750 --> 00:32:49,329 results to consider, which is set to 10, 889 00:32:49,330 --> 00:32:51,159 and the number of related videos to 890 00:32:51,160 --> 00:32:53,319 consider, which is set to 10, and the 891 00:32:53,320 --> 00:32:56,139 number of related videos to visit. 892 00:32:56,140 --> 00:32:57,939 And that's the number of recommendations 893 00:32:57,940 --> 00:33:00,849 we're collecting, and the name is a bit weird.
894 00:33:00,850 --> 00:33:03,249 Here are the different keywords 895 00:33:03,250 --> 00:33:05,769 that we're entering into YouTube 896 00:33:05,770 --> 00:33:08,229 to download the recommendations. 897 00:33:08,230 --> 00:33:10,239 And it's quite easy for you to add your 898 00:33:10,240 --> 00:33:12,369 own keywords, so you could just 899 00:33:12,370 --> 00:33:13,370 say: 900 00:33:17,410 --> 00:33:19,749 And then save the file. 901 00:33:19,750 --> 00:33:22,569 And that would be sufficient. 902 00:33:22,570 --> 00:33:24,999 I'm going to remove it for now. 903 00:33:25,000 --> 00:33:27,849 If you just want to replicate the same 904 00:33:27,850 --> 00:33:29,919 approach that I showed you in the 905 00:33:29,920 --> 00:33:31,839 paper, then 906 00:33:31,840 --> 00:33:33,519 that would be sufficient. 907 00:33:33,520 --> 00:33:36,279 So when running the script, we randomize 908 00:33:36,280 --> 00:33:38,529 the order of the keywords, 909 00:33:38,530 --> 00:33:40,959 and then we have the main loop here, 910 00:33:40,960 --> 00:33:43,599 where for each of the keywords 911 00:33:43,600 --> 00:33:46,569 we select the number of recommendations 912 00:33:46,570 --> 00:33:48,849 that we specified in the parameters. 913 00:33:48,850 --> 00:33:50,949 And for that, we start a new Chrome 914 00:33:50,950 --> 00:33:52,720 instance and clear all the cookies. 915 00:33:53,750 --> 00:33:55,819 That's what we're doing here, and 916 00:33:55,820 --> 00:33:58,699 then we're taking the keywords 917 00:33:58,700 --> 00:34:00,769 and we're entering them as a search 918 00:34:00,770 --> 00:34:02,390 query to YouTube. 919 00:34:03,730 --> 00:34:06,079 We are opening up the link, 920 00:34:06,080 --> 00:34:07,799 the URL. 921 00:34:07,800 --> 00:34:08,789 And we're waiting a bit. 922 00:34:08,790 --> 00:34:10,619 And the reason for that is that 923 00:34:10,620 --> 00:34:12,779 YouTube is dynamically loading a lot of 924 00:34:12,780 --> 00:34:14,638 the videos.
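The parameters, the keyword list and the outer loop described above might look roughly like the following sketch. All names are hypothetical and not taken from the released script; the keywords are placeholders, and the value for the number of related videos to visit is an assumption based on the ten steps per walk mentioned earlier in the talk. Selenium is imported lazily so the planning part runs without it.

```python
import random
from urllib.parse import urlencode

# Hypothetical parameter names; values as mentioned in the talk where stated.
PATHS_PER_KEYWORD = 20           # random walks to collect per keyword
SEARCH_RESULTS_TO_CONSIDER = 10  # pick the start video among the top 10
RELATED_VIDEOS_TO_CONSIDER = 10  # pick each next video among the top 10
RELATED_VIDEOS_TO_VISIT = 10     # assumed depth of each walk

KEYWORDS = ["Bildungspolitik", "Klimaschutz"]  # placeholder keywords


def search_url(keyword):
    """Build the YouTube search URL for a keyword."""
    return "https://www.youtube.com/results?" + urlencode({"search_query": keyword})


def plan_walks(keywords, paths_per_keyword, rng):
    """Yield (keyword, walk_index) pairs with the keywords in shuffled order."""
    order = list(keywords)
    rng.shuffle(order)
    for keyword in order:
        for walk_index in range(paths_per_keyword):
            yield keyword, walk_index


def new_browser():
    """Start a fresh Chrome instance and clear its cookies (needs Selenium)."""
    from selenium import webdriver  # lazy import: only needed for real crawls

    driver = webdriver.Chrome()
    driver.delete_all_cookies()
    return driver
```

A crawl would then loop over `plan_walks(KEYWORDS, PATHS_PER_KEYWORD, random.Random())`, call `new_browser()` once per walk, and open `search_url(keyword)` with `driver.get(...)`.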
925 00:34:14,639 --> 00:34:16,799 So when the web browser 926 00:34:16,800 --> 00:34:19,198 has finished loading, there's still loading 927 00:34:19,199 --> 00:34:21,359 going on in the background where a lot of 928 00:34:21,360 --> 00:34:23,158 data is retrieved, and that's what we're 929 00:34:23,159 --> 00:34:25,049 waiting for here. 930 00:34:25,050 --> 00:34:27,569 And I just basically wait 931 00:34:27,570 --> 00:34:29,789 until the browser knows 932 00:34:29,790 --> 00:34:32,069 there's an element called comments, 933 00:34:32,070 --> 00:34:34,379 and it knows that by the ID: it's an element 934 00:34:34,380 --> 00:34:36,388 with the ID comments. 935 00:34:36,389 --> 00:34:38,249 And then we're preparing a file name 936 00:34:38,250 --> 00:34:40,709 because we can't just save that to the 937 00:34:40,710 --> 00:34:42,569 file system. 938 00:34:42,570 --> 00:34:44,849 And 939 00:34:44,850 --> 00:34:46,678 if we haven't already visited that 940 00:34:46,679 --> 00:34:48,899 website, we're going to write that down. 941 00:34:48,900 --> 00:34:50,968 We're writing the entire source of 942 00:34:50,969 --> 00:34:52,669 the website. 943 00:34:52,670 --> 00:34:55,488 And then we're collecting the top N 944 00:34:55,489 --> 00:34:56,489 recommendations. 945 00:34:57,800 --> 00:34:59,599 And then we're selecting a random video 946 00:34:59,600 --> 00:35:01,669 from these recommendations, and then 947 00:35:01,670 --> 00:35:03,859 we're following the recommendations up to 948 00:35:03,860 --> 00:35:05,149 a certain depth. 949 00:35:05,150 --> 00:35:07,249 And it's always the same procedure: 950 00:35:07,250 --> 00:35:09,439 we open the website, we're waiting 951 00:35:09,440 --> 00:35:11,359 for a random amount of time.
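The waiting and saving steps described here could be sketched as follows. The helper names and the file naming scheme are assumptions for illustration; only the idea of waiting for the element with the ID `comments` and writing the full page source once per page comes from the talk. The Selenium calls used (`WebDriverWait`, `presence_of_element_located`, `By.ID`) are part of the Selenium Python bindings.

```python
import re
from pathlib import Path


def safe_filename(url):
    """Turn a URL into a file name by replacing characters unsafe in paths."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", url).strip("_") + ".html"


def save_page_once(html, url, out_dir):
    """Write the page source unless this URL was saved before.

    Returns True if a new file was written, False if it already existed.
    """
    target = Path(out_dir) / safe_filename(url)
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(html, encoding="utf-8")
    return True


def wait_for_comments(driver, timeout=30):
    """Block until an element with the ID "comments" is present (needs Selenium)."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.ID, "comments"))
    )
```

In a crawl, `wait_for_comments(driver)` would be followed by `save_page_once(driver.page_source, driver.current_url, "pages")`, where `"pages"` is a placeholder directory.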
952 00:35:11,360 --> 00:35:12,859 We're waiting until we can see the 953 00:35:12,860 --> 00:35:15,049 comments, and then we're saving 954 00:35:15,050 --> 00:35:17,119 the path, and we're finding 955 00:35:17,120 --> 00:35:19,339 the recommendations and selecting 956 00:35:19,340 --> 00:35:21,259 one of the recommendations. 957 00:35:21,260 --> 00:35:22,819 And it's quite nice because we can just 958 00:35:22,820 --> 00:35:25,189 use the CSS classes 959 00:35:25,190 --> 00:35:27,289 to find certain elements in the 960 00:35:27,290 --> 00:35:29,509 website based on the ID 961 00:35:29,510 --> 00:35:30,800 and based on the class. 962 00:35:31,870 --> 00:35:34,419 And after that, we're not only saving 963 00:35:34,420 --> 00:35:36,219 each of the individual videos that we're 964 00:35:36,220 --> 00:35:38,349 visiting, but also the 965 00:35:38,350 --> 00:35:39,249 path. 966 00:35:39,250 --> 00:35:41,379 So which video led to 967 00:35:41,380 --> 00:35:43,449 another, and we're saving that to a file 968 00:35:43,450 --> 00:35:46,179 called crawl_paths. 969 00:35:47,910 --> 00:35:50,609 So if you were to adapt this code, 970 00:35:50,610 --> 00:35:52,829 the easiest way would be 971 00:35:52,830 --> 00:35:54,300 to add your own keywords. 972 00:35:55,900 --> 00:35:58,029 And, of course, to change the parameters. 973 00:35:58,030 --> 00:36:00,639 But you can easily adapt this code also 974 00:36:00,640 --> 00:36:03,669 to visit other websites like Instagram, 975 00:36:03,670 --> 00:36:06,399 like Telegram, and 976 00:36:06,400 --> 00:36:08,589 then collect data through this 977 00:36:08,590 --> 00:36:09,590 mechanism. 978 00:36:11,150 --> 00:36:14,149 So the system selected education 979 00:36:14,150 --> 00:36:17,060 policies, and it's now downloading 980 00:36:18,850 --> 00:36:19,960 the different videos. 981 00:36:22,710 --> 00:36:25,019 And I'm stopping it here to show 982 00:36:25,020 --> 00:36:26,670 you the downloaded videos.
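Selecting a recommendation and recording which video led to which could be sketched like this. The function names, the CSV layout and the selector argument are hypothetical; the talk only states that elements are found via CSS IDs and classes and that the paths are written to a file. `find_elements` with `By.CSS_SELECTOR` and `get_attribute` are existing Selenium calls.

```python
import csv
import random


def pick_recommendation(links, top_n, rng):
    """Choose one link at random among the first `top_n` recommendations."""
    return rng.choice(links[:top_n])


def append_path(path_file, keyword, video_urls):
    """Append one walk (keyword plus visited URLs in order) as a CSV row."""
    with open(path_file, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([keyword, *video_urls])


def recommendation_links(driver, selector):
    """Collect href values of elements matching a CSS selector (needs Selenium).

    The selector is passed in by the caller because YouTube's markup changes
    over time; inspect the page in the browser to find a current one.
    """
    from selenium.webdriver.common.by import By

    elements = driver.find_elements(By.CSS_SELECTOR, selector)
    hrefs = [element.get_attribute("href") for element in elements]
    return [href for href in hrefs if href]
```

Keeping the selector as an argument also makes it easier to point the same helper at other platforms.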
983 00:36:28,990 --> 00:36:31,989 And I do that by typing Control- 984 00:36:31,990 --> 00:36:32,990 C. 985 00:36:33,980 --> 00:36:36,219 Let's have a look at the source code 986 00:36:36,220 --> 00:36:38,050 of one of the videos that we downloaded. 987 00:36:47,120 --> 00:36:49,309 And you can see that this is really the 988 00:36:49,310 --> 00:36:51,499 whole HTML document, 989 00:36:51,500 --> 00:36:53,569 including all the CSS 990 00:36:53,570 --> 00:36:56,749 and the JavaScript, and you can 991 00:36:56,750 --> 00:37:00,049 find a variety of things 992 00:37:00,050 --> 00:37:01,609 in the data. 993 00:37:01,610 --> 00:37:03,049 For instance, if we look for the video 994 00:37:03,050 --> 00:37:05,299 title, we find the CSS for that 995 00:37:05,300 --> 00:37:06,300 video title. 996 00:37:12,390 --> 00:37:14,879 But we also find the actual video title. 997 00:37:14,880 --> 00:37:17,789 And that's "How our education system 998 00:37:17,790 --> 00:37:20,669 is embarrassing itself". 999 00:37:20,670 --> 00:37:21,839 And that's really what we're doing 1000 00:37:21,840 --> 00:37:23,399 programmatically, right? We have 1001 00:37:23,400 --> 00:37:25,469 the HTML and then we use 1002 00:37:25,470 --> 00:37:27,779 that to extract 1003 00:37:27,780 --> 00:37:30,029 certain information, and I also provide 1004 00:37:30,030 --> 00:37:31,829 a script to help you with that. 1005 00:37:33,420 --> 00:37:35,729 And that's the script extract data 1006 00:37:35,730 --> 00:37:36,730 from downloaded 1007 00:37:37,910 --> 00:37:38,910 videos. 1008 00:37:39,750 --> 00:37:41,549 So here we're using a Python library 1009 00:37:41,550 --> 00:37:43,110 called Beautiful Soup 1010 00:37:44,200 --> 00:37:46,629 that allows you to parse HTML 1011 00:37:46,630 --> 00:37:49,539 and to be able to search the content 1012 00:37:49,540 --> 00:37:50,589 efficiently.
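A hedged sketch of how saved pages could be parsed with Beautiful Soup follows. The function names and the dictionary of selectors are illustrative assumptions, not the released script; `BeautifulSoup(..., "html.parser")`, `select` and `get_text` are existing Beautiful Soup calls. The library is imported lazily so the file listing helper works without it.

```python
from pathlib import Path


def saved_pages(directory):
    """Return the downloaded HTML files in a directory, sorted by name."""
    return sorted(Path(directory).glob("*.html"))


def first_text(soup, selector):
    """Return the stripped text of the first element matching a CSS selector."""
    matches = soup.select(selector)
    return matches[0].get_text(strip=True) if matches else None


def extract_all(directory, selectors):
    """Parse every saved page and pull one value per named CSS selector."""
    from bs4 import BeautifulSoup  # lazy import: needs the bs4 package

    rows = []
    for page in saved_pages(directory):
        soup = BeautifulSoup(page.read_text(encoding="utf-8"), "html.parser")
        row = {"file": page.name}
        for name, selector in selectors.items():
            row[name] = first_text(soup, selector)
        rows.append(row)
    return rows
```

For example, `extract_all("pages", {"title": "title"})` would return one dictionary per saved file containing the text of its `<title>` element; `"pages"` is a placeholder directory.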
1013 00:37:50,590 --> 00:37:52,449 So what we're doing here is we're looping 1014 00:37:52,450 --> 00:37:55,239 over all the videos that we downloaded 1015 00:37:55,240 --> 00:37:57,759 and then we're parsing the HTML 1016 00:37:57,760 --> 00:37:59,589 that we downloaded. 1017 00:37:59,590 --> 00:38:01,059 So what you can see here is we're 1018 00:38:01,060 --> 00:38:03,339 selecting the number of views 1019 00:38:03,340 --> 00:38:05,979 of a video based on the 1020 00:38:05,980 --> 00:38:08,169 ID info-text and the class 1021 00:38:08,170 --> 00:38:09,129 view-count. 1022 00:38:09,130 --> 00:38:11,589 So how do I know where the views are? 1023 00:38:11,590 --> 00:38:13,749 Well, it's quite simple because I just 1024 00:38:13,750 --> 00:38:15,159 looked at the source code. 1025 00:38:15,160 --> 00:38:17,139 So if we find a video that we're 1026 00:38:17,140 --> 00:38:18,140 interested in, 1027 00:38:20,090 --> 00:38:22,969 like this wonderful talk by Dave Keyser, 1028 00:38:22,970 --> 00:38:25,249 we just look at the CSS 1029 00:38:25,250 --> 00:38:27,409 selectors by right clicking, 1030 00:38:27,410 --> 00:38:29,569 clicking Inspect in Chrome, but it's 1031 00:38:29,570 --> 00:38:31,519 the same in all the browsers. 1032 00:38:31,520 --> 00:38:33,589 So we have a span with 1033 00:38:33,590 --> 00:38:35,779 the class view-count and 1034 00:38:35,780 --> 00:38:38,119 that's within a div that's 1035 00:38:38,120 --> 00:38:40,099 called info-text. 1036 00:38:40,100 --> 00:38:41,149 And based on that, 1037 00:38:41,150 --> 00:38:42,439 now going back to the source, 1038 00:38:44,440 --> 00:38:45,729 we're selecting the count, and we're 1039 00:38:45,730 --> 00:38:47,499 taking the first one because there's 1040 00:38:47,500 --> 00:38:49,599 usually just one, and we do 1041 00:38:49,600 --> 00:38:51,279 this for all the different things, for 1042 00:38:51,280 --> 00:38:52,839 instance, the date on which the video was 1043 00:38:52,840 --> 00:38:54,909 posted.
The name of the channel. 1044 00:38:54,910 --> 00:38:56,829 The number of subscribers. 1045 00:38:56,830 --> 00:38:59,349 And as you can also see here, 1046 00:38:59,350 --> 00:39:01,509 it's a bit more tricky to get the 1047 00:39:01,510 --> 00:39:04,449 likes and dislikes. 1048 00:39:04,450 --> 00:39:06,249 But you can have a look at the code on 1049 00:39:06,250 --> 00:39:08,469 your own to figure it out; it's not really 1050 00:39:08,470 --> 00:39:09,999 rocket science, either. 1051 00:39:10,000 --> 00:39:12,159 So here are the references of the 1052 00:39:12,160 --> 00:39:13,669 papers. 1053 00:39:13,670 --> 00:39:15,769 I mentioned quite a large number 1054 00:39:15,770 --> 00:39:16,770 of papers. 1055 00:39:17,570 --> 00:39:19,699 So I'm just quickly going to scroll through 1056 00:39:19,700 --> 00:39:22,579 them, giving you a chance to 1057 00:39:22,580 --> 00:39:24,260 stop the video to look at them. 1058 00:39:29,530 --> 00:39:31,299 And I again invite you to have a look at 1059 00:39:31,300 --> 00:39:33,419 my doctoral thesis, Users and 1060 00:39:33,420 --> 00:39:35,739 Machine Learning-based Curation Systems. 1061 00:39:35,740 --> 00:39:37,089 Thank you very much for your attention. 1062 00:39:41,710 --> 00:39:44,649 OK. Thank you for your interesting 1063 00:39:44,650 --> 00:39:46,329 talk, Hendrik. 1064 00:39:46,330 --> 00:39:48,519 And now 1065 00:39:48,520 --> 00:39:51,039 you have the option 1066 00:39:51,040 --> 00:39:53,320 to ask us some questions. 1067 00:39:55,030 --> 00:39:57,309 Use the FC 1068 00:39:57,310 --> 00:39:59,439 three, CW 1069 00:39:59,440 --> 00:40:02,079 hashtag on social media 1070 00:40:02,080 --> 00:40:04,689 or to your featured 1071 00:40:04,690 --> 00:40:06,730 to do so, and 1072 00:40:08,140 --> 00:40:10,899 there already are some questions 1073 00:40:10,900 --> 00:40:12,419 and a wish.
1074 00:40:12,420 --> 00:40:14,559 And the wish is, can 1075 00:40:14,560 --> 00:40:16,749 you please provide the link to 1076 00:40:16,750 --> 00:40:17,750 the slides? 1077 00:40:18,650 --> 00:40:20,479 Sure, yeah, I can upload them, I'll just put 1078 00:40:20,480 --> 00:40:22,039 them in the repository. I think there's a 1079 00:40:22,040 --> 00:40:24,769 link to the repository in 1080 00:40:24,770 --> 00:40:26,899 the video, and I can definitely provide 1081 00:40:26,900 --> 00:40:27,799 the slides. 1082 00:40:27,800 --> 00:40:29,119 Excellent. 1083 00:40:29,120 --> 00:40:31,699 OK, then the first question: 1084 00:40:31,700 --> 00:40:33,829 how would you 1085 00:40:33,830 --> 00:40:35,959 want a platform like YouTube to 1086 00:40:35,960 --> 00:40:38,179 make minority views 1087 00:40:38,180 --> 00:40:40,369 more attractive without 1088 00:40:40,370 --> 00:40:42,649 also 1089 00:40:42,650 --> 00:40:45,199 advertising more 1090 00:40:45,200 --> 00:40:47,749 extremist views? 1091 00:40:47,750 --> 00:40:49,289 Very, very good question. 1092 00:40:49,290 --> 00:40:51,379 I think the main idea 1093 00:40:51,380 --> 00:40:53,479 behind the talk and also behind a 1094 00:40:53,480 --> 00:40:55,669 lot of the other research that I'm doing 1095 00:40:55,670 --> 00:40:58,339 is to give people more control. 1096 00:40:58,340 --> 00:41:00,139 And it kind of starts even just with the 1097 00:41:00,140 --> 00:41:02,239 knowledge that these recommendations 1098 00:41:02,240 --> 00:41:03,619 are selected by a machine learning 1099 00:41:03,620 --> 00:41:06,079 system, and they're selected 1100 00:41:06,080 --> 00:41:08,419 with a particular rule, right? 1101 00:41:08,420 --> 00:41:09,769 And that's the first step.
1102 00:41:09,770 --> 00:41:11,239 So everybody knows I'm seeing the 1103 00:41:11,240 --> 00:41:13,519 recommendations because there's a system 1104 00:41:13,520 --> 00:41:15,919 that's actually, like Amazon, trying 1105 00:41:15,920 --> 00:41:17,959 to find things that are similar to what 1106 00:41:17,960 --> 00:41:20,119 I've done in the past and just trying to 1107 00:41:20,120 --> 00:41:22,009 show me stuff that's similar to 1108 00:41:22,010 --> 00:41:23,449 what I've done in the past. 1109 00:41:23,450 --> 00:41:25,459 And I think that understanding is the 1110 00:41:25,460 --> 00:41:27,829 first and maybe the most important 1111 00:41:27,830 --> 00:41:29,689 step. But the second step would then be 1112 00:41:29,690 --> 00:41:31,969 also to give people control over 1113 00:41:31,970 --> 00:41:33,349 what they're seeing. 1114 00:41:33,350 --> 00:41:35,659 And that would be 1115 00:41:35,660 --> 00:41:38,269 to just give them more tools 1116 00:41:38,270 --> 00:41:40,609 and more configuration settings 1117 00:41:40,610 --> 00:41:42,889 to decide what recommendations 1118 00:41:42,890 --> 00:41:44,959 they want to see and in what context they 1119 00:41:44,960 --> 00:41:46,789 want to see them. 1120 00:41:46,790 --> 00:41:49,129 And I think the second question 1121 00:41:49,130 --> 00:41:50,780 of the more extremist, as in, like, 1122 00:41:52,070 --> 00:41:54,139 far right extremist, I think that's 1123 00:41:54,140 --> 00:41:55,729 a different issue in a way, 1124 00:41:55,730 --> 00:41:57,919 that's kind of policing what's uploaded. 1125 00:41:57,920 --> 00:42:00,109 And that's a bit like, 1126 00:42:00,110 --> 00:42:02,149 it's orthogonal to what I'm talking 1127 00:42:02,150 --> 00:42:04,279 about, but I think this is more related 1128 00:42:04,280 --> 00:42:06,349 to actually making sure that people 1129 00:42:06,350 --> 00:42:08,179 like them conscious.
Apple thinks this is 1130 00:42:08,180 --> 00:42:09,769 not it shouldn't be on the platform in 1131 00:42:09,770 --> 00:42:10,859 the first place. 1132 00:42:10,860 --> 00:42:12,499 Right? So that's not a recommendation 1133 00:42:12,500 --> 00:42:14,629 system issue but a policy issue, but very 1134 00:42:14,630 --> 00:42:15,630 good question. Thank you. 1135 00:42:16,590 --> 00:42:17,590 OK. Thanks. 1136 00:42:19,300 --> 00:42:20,880 So next question: 1137 00:42:21,900 --> 00:42:24,029 what do you think about the recent 1138 00:42:24,030 --> 00:42:26,279 cat made in Germany 1139 00:42:26,280 --> 00:42:28,679 initiative that aim 1140 00:42:28,680 --> 00:42:30,690 to track the little responsible? 1141 00:42:34,050 --> 00:42:35,789 To be honest, I don't know much about it, 1142 00:42:35,790 --> 00:42:37,559 so I really can't comment on it. 1143 00:42:37,560 --> 00:42:39,699 I think the idea of having 1144 00:42:39,700 --> 00:42:42,209 responsibility, I am all for that. 1145 00:42:42,210 --> 00:42:44,099 But it's something that's really, really 1146 00:42:44,100 --> 00:42:45,119 hard to do. 1147 00:42:45,120 --> 00:42:46,619 And I think a lot of people are working 1148 00:42:46,620 --> 00:42:47,879 on this actively. 1149 00:42:47,880 --> 00:42:49,979 But yeah, so the more the merrier. 1150 00:42:49,980 --> 00:42:51,119 So I very much welcome that. 1151 00:42:51,120 --> 00:42:53,369 But I can't comment on that particular one 1152 00:42:53,370 --> 00:42:54,610 because I don't know that one. 1153 00:42:57,140 --> 00:42:58,140 OK. 1154 00:42:58,550 --> 00:43:00,889 And do you 1155 00:43:00,890 --> 00:43:03,259 also run audits 1156 00:43:03,260 --> 00:43:05,929 in which you choose two videos 1157 00:43:05,930 --> 00:43:08,209 of the same topic before 1158 00:43:08,210 --> 00:43:10,519 picking the rest 1159 00:43:10,520 --> 00:43:11,839 randomly? 1160 00:43:11,840 --> 00:43:13,009 I didn't get that fully. 1161 00:43:13,010 --> 00:43:13,699 What do you mean?
1162 00:43:13,700 --> 00:43:16,249 Like, uh, you 1163 00:43:16,250 --> 00:43:18,919 talked about 1164 00:43:18,920 --> 00:43:21,259 how you do the audits. 1165 00:43:21,260 --> 00:43:23,509 And the question is, did you 1166 00:43:23,510 --> 00:43:25,669 also run audits in which 1167 00:43:25,670 --> 00:43:27,769 you choose two videos of the same 1168 00:43:27,770 --> 00:43:30,019 topic before 1169 00:43:30,020 --> 00:43:33,379 picking the rest of them randomly? 1170 00:43:33,380 --> 00:43:33,799 No, I haven't 1171 00:43:33,800 --> 00:43:35,719 done that yet, but I think it's something 1172 00:43:35,720 --> 00:43:36,859 that's worth doing. 1173 00:43:36,860 --> 00:43:38,449 I think the most important thing that I 1174 00:43:38,450 --> 00:43:40,429 really want to do with 1175 00:43:40,430 --> 00:43:41,989 the audits is understanding 1176 00:43:41,990 --> 00:43:43,249 personalization. 1177 00:43:43,250 --> 00:43:44,239 So it fun. 1178 00:43:44,240 --> 00:43:46,339 I'm not sure when I created 1179 00:43:46,340 --> 00:43:48,499 my YouTube or Google account, but it's 1180 00:43:48,500 --> 00:43:50,269 been years, ten years or something, 1181 00:43:50,270 --> 00:43:52,249 right? So they really know a lot about 1182 00:43:52,250 --> 00:43:54,349 me, and I think 1183 00:43:54,350 --> 00:43:56,629 nobody really understands yet 1184 00:43:56,630 --> 00:43:59,479 how that influences the recommendations.
1185 00:43:59,480 --> 00:44:00,949 And I think it would be really 1186 00:44:00,950 --> 00:44:03,199 interesting if people with their very 1187 00:44:03,200 --> 00:44:04,939 old accounts, not just accounts they 1188 00:44:04,940 --> 00:44:06,859 created a week ago, but ones they've 1189 00:44:06,860 --> 00:44:08,869 used for years and years, start to do 1190 00:44:08,870 --> 00:44:11,059 these audits relating to the topics that 1191 00:44:11,060 --> 00:44:13,129 I presented here, but also related to 1192 00:44:13,130 --> 00:44:14,509 more urgent topics. 1193 00:44:14,510 --> 00:44:16,309 For instance, you can just go to the ARD 1194 00:44:16,310 --> 00:44:18,409 DeutschlandTrend, which every 1195 00:44:18,410 --> 00:44:21,049 now and then is asking people in Germany 1196 00:44:21,050 --> 00:44:22,879 what's interesting. It's a representative 1197 00:44:22,880 --> 00:44:25,549 poll by the ARD, 1198 00:44:25,550 --> 00:44:28,009 and you can then use these topics to see 1199 00:44:28,010 --> 00:44:30,109 what people are or might be 1200 00:44:30,110 --> 00:44:32,059 googling or might be looking for on 1201 00:44:32,060 --> 00:44:33,060 YouTube. 1202 00:44:34,280 --> 00:44:35,280 OK. Yeah. 1203 00:44:38,590 --> 00:44:40,659 While doing that, aren't there any 1204 00:44:40,660 --> 00:44:42,039 limits 1205 00:44:42,040 --> 00:44:44,139 YouTube imposes on your 1206 00:44:44,140 --> 00:44:45,159 crawling? 1207 00:44:46,480 --> 00:44:47,739 Yeah, I mean, that's the thing. 1208 00:44:47,740 --> 00:44:49,899 I mean, there are different ways of 1209 00:44:49,900 --> 00:44:50,829 doing this.
1210 00:44:50,830 --> 00:44:53,109 And I think I commented 1211 00:44:53,110 --> 00:44:55,269 on this, on other works which 1212 00:44:55,270 --> 00:44:57,639 were using the YouTube API, 1213 00:44:57,640 --> 00:44:59,169 and the YouTube API has very clear 1214 00:44:59,170 --> 00:45:00,909 limitations on what you can do and what 1215 00:45:00,910 --> 00:45:03,549 you can't do. But I am 1216 00:45:03,550 --> 00:45:05,469 remote controlling the browser, 1217 00:45:05,470 --> 00:45:07,599 so there are no natural limits. 1218 00:45:07,600 --> 00:45:09,489 Of course, you will be blocked if you're 1219 00:45:09,490 --> 00:45:10,929 too eager, let's say. 1220 00:45:10,930 --> 00:45:12,609 So you should be responsible and you 1221 00:45:12,610 --> 00:45:15,479 should have delays every now and then 1222 00:45:15,480 --> 00:45:16,689 and take some time. 1223 00:45:16,690 --> 00:45:18,789 But there's no technical 1224 00:45:18,790 --> 00:45:21,009 limit, right? But be nice 1225 00:45:21,010 --> 00:45:23,439 and don't overload 1226 00:45:23,440 --> 00:45:24,440 the service. 1227 00:45:26,260 --> 00:45:29,529 OK, now we have a question 1228 00:45:29,530 --> 00:45:30,769 in German. 1229 00:45:30,770 --> 00:45:33,069 And Ithink a cookies lesson 1230 00:45:33,070 --> 00:45:35,439 that tarnished Krishnan unfun 1231 00:45:35,440 --> 00:45:37,359 column fund regulation appear excessive. 1232 00:45:37,360 --> 00:45:39,219 It seems like an awesome place, an IP 1233 00:45:39,220 --> 00:45:42,909 range. Even so, is Dennis 1234 00:45:42,910 --> 00:45:44,469 even suggesting its request, 1235 00:45:46,210 --> 00:45:48,309 even though is that in his request 1236 00:45:48,310 --> 00:45:50,169 relevant? What are these folks? 1237 00:45:52,300 --> 00:45:53,679 Let's just repeat it for the other 1238 00:45:53,680 --> 00:45:55,839 people, the non-German speakers. 1239 00:45:55,840 --> 00:45:57,549 So the question is what about cookies?
1240 00:45:57,550 --> 00:45:59,709 And the point is 1241 00:45:59,710 --> 00:46:01,509 that just deleting the cookies is not 1242 00:46:01,510 --> 00:46:03,279 sufficient because you're coming from the 1243 00:46:03,280 --> 00:46:05,169 same IP address and there are a lot of 1244 00:46:05,170 --> 00:46:07,299 things that are really 1245 00:46:07,300 --> 00:46:08,679 limiting you here. 1246 00:46:08,680 --> 00:46:10,809 And it's just one limitation I have to 1247 00:46:10,810 --> 00:46:12,519 live with. In this particular audit 1248 00:46:12,520 --> 00:46:14,679 I did not do that, and I did that in 1249 00:46:14,680 --> 00:46:16,899 May 2019, quite 1250 00:46:16,900 --> 00:46:19,039 recently after the cabinets 1251 00:46:19,040 --> 00:46:21,129 incident. But that's also kind 1252 00:46:21,130 --> 00:46:22,839 of my idea of releasing the script, 1253 00:46:22,840 --> 00:46:25,119 because there are so many things that 1254 00:46:25,120 --> 00:46:27,129 have an influence on the recommendations, 1255 00:46:27,130 --> 00:46:29,649 or at least have a potential influence, that 1256 00:46:29,650 --> 00:46:31,749 we really need a lot of people to 1257 00:46:31,750 --> 00:46:34,089 do audit studies to really get an idea. 1258 00:46:34,090 --> 00:46:35,529 And in a way, that's why I want other 1259 00:46:35,530 --> 00:46:37,569 people to be able to do these kinds of 1260 00:46:37,570 --> 00:46:40,089 things, because I wholeheartedly agree 1261 00:46:40,090 --> 00:46:41,859 with the comment and I think that the IP 1262 00:46:41,860 --> 00:46:43,679 makes a difference. I mean, I kind 1263 00:46:43,680 --> 00:46:45,849 of scientifically said, OK, these are 1264 00:46:45,850 --> 00:46:47,409 the limitations of our approach. 1265 00:46:47,410 --> 00:46:48,309 This is what we know. 1266 00:46:48,310 --> 00:46:50,919 This is what we controlled for.
1267 00:46:50,920 --> 00:46:53,379 But yeah, probably it has an 1268 00:46:53,380 --> 00:46:55,479 influence, but we don't know. 1269 00:46:55,480 --> 00:46:57,459 So I think we just need a lot of 1270 00:46:57,460 --> 00:46:59,529 people, kind of Wikipedia style, 1271 00:46:59,530 --> 00:47:01,689 to go about 1272 00:47:01,690 --> 00:47:02,690 this problem. 1273 00:47:04,110 --> 00:47:05,549 OK, thanks. 1274 00:47:05,550 --> 00:47:06,550 Then 1275 00:47:08,280 --> 00:47:10,559 there are a lot of questions on 1276 00:47:10,560 --> 00:47:12,779 how you did this, 1277 00:47:12,780 --> 00:47:15,299 like 1278 00:47:15,300 --> 00:47:17,609 how YouTube acts in doing 1279 00:47:17,610 --> 00:47:19,739 this, and maybe did 1280 00:47:19,740 --> 00:47:22,259 you run into any countermeasures 1281 00:47:22,260 --> 00:47:24,989 against automation 1282 00:47:24,990 --> 00:47:25,990 on YouTube? 1283 00:47:27,270 --> 00:47:28,769 No, actually not. 1284 00:47:28,770 --> 00:47:30,989 I think that's in a way why this is such 1285 00:47:30,990 --> 00:47:33,149 a nice hack, in 1286 00:47:33,150 --> 00:47:35,339 that we just use Selenium, and a lot 1287 00:47:35,340 --> 00:47:37,469 of people use Selenium as part of 1288 00:47:37,470 --> 00:47:38,729 their web testing. 1289 00:47:38,730 --> 00:47:40,979 It's a very, very widely 1290 00:47:40,980 --> 00:47:41,939 used tool, right? 1291 00:47:41,940 --> 00:47:43,439 It can be part of your continuous 1292 00:47:43,440 --> 00:47:46,019 integration workflow, just making sure 1293 00:47:46,020 --> 00:47:47,939 that certain things about your 1294 00:47:47,940 --> 00:47:49,109 website work. 1295 00:47:49,110 --> 00:47:50,819 And it's quite hard to block in a way 1296 00:47:50,820 --> 00:47:53,639 because it's an actual Firefox. 1297 00:47:53,640 --> 00:47:56,099 It's not just trying to be Firefox, 1298 00:47:56,100 --> 00:47:57,100 it is Firefox.
1299 00:47:58,080 --> 00:48:00,149 And of course, if you 1300 00:48:00,150 --> 00:48:02,369 look at the behavior of the user, 1301 00:48:02,370 --> 00:48:03,809 it's quite artificial in that way and you 1302 00:48:03,810 --> 00:48:05,099 probably can detect it. 1303 00:48:05,100 --> 00:48:07,019 But when we did this, we didn't see 1304 00:48:07,020 --> 00:48:08,969 anything. I mean, you definitely need to 1305 00:48:08,970 --> 00:48:11,069 look at the legal situation and make sure 1306 00:48:11,070 --> 00:48:13,169 that you obey all the 1307 00:48:13,170 --> 00:48:14,310 laws. But 1308 00:48:15,710 --> 00:48:17,249 for the academic purposes for which we've 1309 00:48:17,250 --> 00:48:18,719 been doing this, we didn't see any 1310 00:48:18,720 --> 00:48:19,720 problems. 1311 00:48:24,530 --> 00:48:26,719 Next question: YouTube might be 1312 00:48:26,720 --> 00:48:29,029 using several different 1313 00:48:29,030 --> 00:48:31,099 algorithms and keep 1314 00:48:31,100 --> 00:48:32,479 changing their tech. 1315 00:48:33,860 --> 00:48:36,109 How would your research 1316 00:48:36,110 --> 00:48:37,110 address this? 1317 00:48:38,570 --> 00:48:40,139 Again, this is limited in a way. 1318 00:48:40,140 --> 00:48:41,599 I mean, we did this at this particular 1319 00:48:41,600 --> 00:48:43,969 time and with this particular 1320 00:48:43,970 --> 00:48:46,329 purpose. But that's also one reason 1321 00:48:46,330 --> 00:48:48,079 for open sourcing it, right? 1322 00:48:48,080 --> 00:48:49,789 I want other people to look at this, and 1323 00:48:49,790 --> 00:48:51,739 it's known that there is a lot of A/B testing 1324 00:48:51,740 --> 00:48:53,899 and there are probably dozens 1325 00:48:53,900 --> 00:48:55,669 of versions of YouTube running at the 1326 00:48:55,670 --> 00:48:57,859 same time targeting different people.
1327 00:48:57,860 --> 00:48:59,599 But in a way that's just showing how 1328 00:48:59,600 --> 00:49:01,489 important this kind of research is, 1329 00:49:01,490 --> 00:49:02,809 because again, it's billions of 1330 00:49:02,810 --> 00:49:04,879 people using this, 70 percent 1331 00:49:04,880 --> 00:49:07,009 of the videos watched are recommended by the 1332 00:49:07,010 --> 00:49:09,109 algorithms, and we don't know shit about 1333 00:49:09,110 --> 00:49:11,239 it, to be blunt, anyway. 1334 00:49:11,240 --> 00:49:13,609 So, yeah, I think just take the script 1335 00:49:13,610 --> 00:49:15,530 and then have many people do this. 1336 00:49:16,710 --> 00:49:17,939 Yes. Yeah. 1337 00:49:17,940 --> 00:49:19,119 Thank you. And. 1338 00:49:20,900 --> 00:49:22,309 Yeah, there are. 1339 00:49:24,710 --> 00:49:26,570 Oh, that's a new question: 1340 00:49:27,620 --> 00:49:29,869 why is crawling with Chrome 1341 00:49:29,870 --> 00:49:32,119 no problem, there's 1342 00:49:32,120 --> 00:49:33,799 no money. 1343 00:49:33,800 --> 00:49:36,199 Surfing with a Tor browser is 1344 00:49:36,200 --> 00:49:37,279 a constant. 1345 00:49:37,280 --> 00:49:39,709 It's constantly 1346 00:49:39,710 --> 00:49:40,710 blocked. 1347 00:49:41,150 --> 00:49:42,229 It's Tobruk. 1348 00:49:42,230 --> 00:49:43,249 What's the question? 1349 00:49:46,790 --> 00:49:48,829 Surfing with a Chrome browser is no 1350 00:49:48,830 --> 00:49:51,709 problem, but using Tor browser 1351 00:49:51,710 --> 00:49:53,479 is most of the time 1352 00:49:54,500 --> 00:49:56,929 a problem because there are CAPTCHAs 1353 00:49:56,930 --> 00:49:59,449 and verification, and 1354 00:49:59,450 --> 00:50:01,639 it's constantly blocked. 1355 00:50:01,640 --> 00:50:03,109 I have no idea, to be honest.
1356 00:50:03,110 --> 00:50:05,389 I know that you can use Selenium over Tor 1357 00:50:05,390 --> 00:50:07,309 and I also know that it can be interesting 1358 00:50:07,310 --> 00:50:09,139 because you can get different endpoints, 1359 00:50:09,140 --> 00:50:11,449 of course, so that can be quite useful. 1360 00:50:11,450 --> 00:50:13,639 But what YouTube is actually 1361 00:50:13,640 --> 00:50:16,279 doing to block people coming from Tor, and why, 1362 00:50:16,280 --> 00:50:17,280 I don't know. 1363 00:50:19,580 --> 00:50:20,509 Yeah. 1364 00:50:20,510 --> 00:50:22,579 Then there are some people in 1365 00:50:22,580 --> 00:50:24,709 the chat 1366 00:50:24,710 --> 00:50:26,839 telling you that it's a 1367 00:50:26,840 --> 00:50:28,339 really nice talk. 1368 00:50:28,340 --> 00:50:30,949 Thank you very much for that. 1369 00:50:30,950 --> 00:50:31,950 You're welcome. 1370 00:50:33,950 --> 00:50:36,089 Um, yeah, 1371 00:50:36,090 --> 00:50:38,210 are there any questions 1372 00:50:39,380 --> 00:50:41,899 for this Q&A session 1373 00:50:41,900 --> 00:50:42,900 left? 1374 00:50:43,480 --> 00:50:46,089 You can go to the sea 1375 00:50:46,090 --> 00:50:47,090 chat. 1376 00:50:48,250 --> 00:50:50,889 It's linked on the 1377 00:50:50,890 --> 00:50:53,109 streaming.media.ccc.de 1378 00:50:53,110 --> 00:50:55,359 page, or do it on 1379 00:50:55,360 --> 00:50:56,360 social media. 1380 00:50:59,340 --> 00:51:02,339 And I put the slides in the repository, 1381 00:51:02,340 --> 00:51:04,169 I can just link it and people can find 1382 00:51:04,170 --> 00:51:06,299 it.
And yeah, especially if you're 1383 00:51:06,300 --> 00:51:08,609 media science students and the like, 1384 00:51:08,610 --> 00:51:10,679 use the scripts and then do exciting 1385 00:51:10,680 --> 00:51:11,669 stuff, because there's a lot of 1386 00:51:11,670 --> 00:51:14,069 interesting research on Twitter, 1387 00:51:14,070 --> 00:51:15,599 for instance, because it's quite easy to 1388 00:51:15,600 --> 00:51:16,919 do this kind of research. 1389 00:51:16,920 --> 00:51:19,019 And I really hope in a way with releasing 1390 00:51:19,020 --> 00:51:21,629 the scripts that 1391 00:51:21,630 --> 00:51:23,909 I make researching YouTube a bit easier 1392 00:51:23,910 --> 00:51:26,099 and hopefully also Instagram, because 1393 00:51:26,100 --> 00:51:28,019 then in principle, it's really the same 1394 00:51:28,020 --> 00:51:30,449 in a way, you just toss in a while 1395 00:51:30,450 --> 00:51:32,579 and look for the HTML 1396 00:51:32,580 --> 00:51:33,719 elements like I showed. 1397 00:51:33,720 --> 00:51:35,999 You need to know manner between Python 1398 00:51:36,000 --> 00:51:37,960 a bit, but you can only do so much. 1399 00:51:39,750 --> 00:51:41,939 OK, a new question 1400 00:51:41,940 --> 00:51:44,279 popped up: doesn't 1401 00:51:44,280 --> 00:51:46,649 randomly choosing a recommended 1402 00:51:46,650 --> 00:51:49,709 video have its own bias, 1403 00:51:49,710 --> 00:51:52,199 in that this may prevent 1404 00:51:52,200 --> 00:51:54,719 the algorithm 1405 00:51:54,720 --> 00:51:57,509 from learning a user preference 1406 00:51:57,510 --> 00:51:59,679 and going down a rabbit hole? 1407 00:51:59,680 --> 00:52:00,839 Yeah, very good question. 1408 00:52:00,840 --> 00:52:02,999 And definitely it has an effect. 1409 00:52:04,530 --> 00:52:07,229 I reflect on that in the thesis, and 1410 00:52:07,230 --> 00:52:09,149 in a way it's a conscious decision, 1411 00:52:09,150 --> 00:52:11,279 right?
I mean, you can do a lot of 1412 00:52:11,280 --> 00:52:13,709 different things, and 1413 00:52:13,710 --> 00:52:16,289 that's just the one way I tried, which 1414 00:52:16,290 --> 00:52:18,389 I thought is the most interesting way 1415 00:52:18,390 --> 00:52:20,129 that I can do now. 1416 00:52:20,130 --> 00:52:22,949 But it definitely has an effect and 1417 00:52:22,950 --> 00:52:25,259 definitely is also quite different from 1418 00:52:25,260 --> 00:52:27,779 a human being on the website, 1419 00:52:27,780 --> 00:52:28,919 right? 1420 00:52:28,920 --> 00:52:30,989 But again, that's I think why this is 1421 00:52:30,990 --> 00:52:32,429 only one small 1422 00:52:33,690 --> 00:52:36,149 piece of the puzzle, and we definitely 1423 00:52:36,150 --> 00:52:37,150 need more than that. 1424 00:52:39,920 --> 00:52:42,199 Another question. 1425 00:52:42,200 --> 00:52:44,359 Did you investigate the effect 1426 00:52:44,360 --> 00:52:46,519 of having a German IP 1427 00:52:46,520 --> 00:52:48,589 address or the 1428 00:52:48,590 --> 00:52:51,559 browser language set to German? 1429 00:52:51,560 --> 00:52:52,560 Mm hmm. 1430 00:52:53,630 --> 00:52:55,789 I had a university 1431 00:52:55,790 --> 00:52:57,139 IP address. 1432 00:52:57,140 --> 00:52:59,389 The browser was actually set to English. 1433 00:52:59,390 --> 00:53:00,979 My whole system was actually English at 1434 00:53:00,980 --> 00:53:01,980 the time. 1435 00:53:02,660 --> 00:53:04,699 I'm acknowledging it again. 1436 00:53:04,700 --> 00:53:07,039 And in a way, this is like this puzzle 1437 00:53:07,040 --> 00:53:09,559 piece which has 1438 00:53:09,560 --> 00:53:11,389 these different settings, but we don't 1439 00:53:11,390 --> 00:53:13,039 know what difference that would make, 1440 00:53:13,040 --> 00:53:14,719 right? Do we get different 1441 00:53:14,720 --> 00:53:16,459 recommendations? 1442 00:53:16,460 --> 00:53:18,349 And that's about it. 1443 00:53:21,110 --> 00:53:22,110 OK.
1444 00:53:23,050 --> 00:53:25,119 Then thank you 1445 00:53:25,120 --> 00:53:27,069 very much for your awesome talk. 1446 00:53:27,070 --> 00:53:28,070 And here. 1447 00:53:29,150 --> 00:53:30,150 Thank you. 1448 00:53:30,740 --> 00:53:33,169 Yeah, and tough, beautiful 1449 00:53:33,170 --> 00:53:34,609 as we speak. 1450 00:53:34,610 --> 00:53:36,259 Same to you and happy hacking to 1451 00:53:36,260 --> 00:53:38,359 everybody, and let's hope we 1452 00:53:38,360 --> 00:53:39,979 can do a bit of surveillance of surveillance 1453 00:53:39,980 --> 00:53:40,980 capitalism. 1454 00:53:44,240 --> 00:53:45,689 OK, bye bye. 1455 00:53:45,690 --> 00:53:46,690 Bye.