0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/32 Thanks! 1 00:00:12,580 --> 00:00:14,949 OK, so as announced, 2 00:00:14,950 --> 00:00:16,899 I'm talking about genocide. 3 00:00:16,900 --> 00:00:18,999 I've been looking into 4 00:00:19,000 --> 00:00:21,009 security vulnerabilities for quite some 5 00:00:21,010 --> 00:00:23,379 time now. I started exploiting them and 6 00:00:23,380 --> 00:00:25,509 then, in the late 90s, it got kind 7 00:00:25,510 --> 00:00:27,969 of boring. So I started to look into 8 00:00:27,970 --> 00:00:30,249 how to prevent people from exploiting 9 00:00:30,250 --> 00:00:31,509 your computer. 10 00:00:31,510 --> 00:00:33,879 And that turned 11 00:00:33,880 --> 00:00:36,159 out to be actually very hard in practice. 12 00:00:36,160 --> 00:00:38,769 We still have a situation where everybody 13 00:00:38,770 --> 00:00:39,770 hacks everybody. 14 00:00:41,020 --> 00:00:42,609 The NSA hacks the Chinese, the Chinese 15 00:00:42,610 --> 00:00:44,679 hack the NSA, the NSA hacks Israel and 16 00:00:44,680 --> 00:00:46,359 the Israelis hack the Russians. 17 00:00:46,360 --> 00:00:48,459 And of course, some of our 18 00:00:48,460 --> 00:00:50,349 people hack other people, too. 19 00:00:50,350 --> 00:00:52,419 But nobody really understands the art 20 00:00:52,420 --> 00:00:54,579 of defense, 21 00:00:54,580 --> 00:00:56,769 not letting people into 22 00:00:56,770 --> 00:00:57,939 your computers. 23 00:00:57,940 --> 00:01:00,809 And I'm 24 00:01:00,810 --> 00:01:03,249 making a living on security problems and 25 00:01:03,250 --> 00:01:05,649 on consulting on how to prevent security 26 00:01:05,650 --> 00:01:08,109 issues. 
And there's 27 00:01:08,110 --> 00:01:09,640 a common pattern that 28 00:01:10,810 --> 00:01:13,449 you see: there are kinds of bugs 29 00:01:13,450 --> 00:01:15,799 that are very, very hard to get right. 30 00:01:15,800 --> 00:01:17,409 Like you have to walk through your source 31 00:01:17,410 --> 00:01:19,449 code and look at every single line. 32 00:01:19,450 --> 00:01:21,219 Every single line has to fulfill a 33 00:01:21,220 --> 00:01:23,409 certain property for your software 34 00:01:23,410 --> 00:01:25,269 to be 35 00:01:27,130 --> 00:01:28,689 good and defensible against certain 36 00:01:28,690 --> 00:01:29,690 classes of attacks. 37 00:01:31,480 --> 00:01:33,849 Interestingly, you'll 38 00:01:33,850 --> 00:01:36,369 find that for certain classes 39 00:01:36,370 --> 00:01:37,929 of attacks, for certain classes of 40 00:01:37,930 --> 00:01:40,119 vulnerabilities, you can find 41 00:01:40,120 --> 00:01:41,199 generic solutions. 42 00:01:41,200 --> 00:01:43,599 For instance, for SQL injections, 43 00:01:43,600 --> 00:01:45,669 you use prepared statements. I do 44 00:01:45,670 --> 00:01:47,889 a code audit, I'm grepping for all the 45 00:01:47,890 --> 00:01:49,749 statements, I'm seeing everything is 46 00:01:49,750 --> 00:01:50,919 a prepared statement. 47 00:01:50,920 --> 00:01:52,539 I don't have to look for any SQL 48 00:01:52,540 --> 00:01:53,889 injections anymore. 49 00:01:53,890 --> 00:01:56,289 If it's not, if somebody is assembling 50 00:01:56,290 --> 00:01:58,659 a statement by hand, 51 00:01:58,660 --> 00:01:59,889 that's a problem. 52 00:01:59,890 --> 00:02:02,589 The same can be said for 53 00:02:02,590 --> 00:02:04,959 quite a number of other classes 54 00:02:04,960 --> 00:02:06,369 of vulnerabilities. 55 00:02:06,370 --> 00:02:09,279 And the one I want to talk about 56 00:02:09,280 --> 00:02:11,559 today is buffer overflows, 57 00:02:11,560 --> 00:02:13,779 which I assume you to be 58 00:02:13,780 --> 00:02:15,819 quite familiar with. 
So I'm making a 59 00:02:17,650 --> 00:02:18,909 short process of it. 60 00:02:18,910 --> 00:02:20,049 That's essentially the problem. 61 00:02:20,050 --> 00:02:22,389 You see, you have some array 62 00:02:22,390 --> 00:02:24,609 called sum and you 63 00:02:24,610 --> 00:02:26,679 write into that array and the input 64 00:02:26,680 --> 00:02:28,989 data you get happens to be longer 65 00:02:28,990 --> 00:02:31,209 than the 16 characters you see there. 66 00:02:31,210 --> 00:02:33,400 And boom, you're overwriting some memory. 67 00:02:34,510 --> 00:02:36,879 And now, if you've been paying 68 00:02:36,880 --> 00:02:39,009 attention for the last 69 00:02:39,010 --> 00:02:41,289 20 or so Congresses and other hacker 70 00:02:41,290 --> 00:02:43,449 events, there goes a lot of 71 00:02:43,450 --> 00:02:45,759 effort into how to actually 72 00:02:45,760 --> 00:02:47,859 exploit things, because as 73 00:02:47,860 --> 00:02:49,719 soon as you write outside that buffer, 74 00:02:49,720 --> 00:02:51,879 which is called sum in that case, 75 00:02:53,110 --> 00:02:55,599 you tend to overwrite some other data. 76 00:02:55,600 --> 00:02:57,549 And from there you can get into the 77 00:02:57,550 --> 00:02:59,649 computer by, for instance, overwriting 78 00:02:59,650 --> 00:03:01,989 the return address of a function, the very 79 00:03:01,990 --> 00:03:03,459 classical stack overflow like in the 80 00:03:03,460 --> 00:03:06,399 classical Aleph One paper. 81 00:03:06,400 --> 00:03:08,589 Or you could overwrite some structure on 82 00:03:08,590 --> 00:03:10,779 the heap, like the linked list that's 83 00:03:10,780 --> 00:03:12,789 responsible for memory allocation. That 84 00:03:12,790 --> 00:03:13,790 gets 85 00:03:15,130 --> 00:03:17,229 the allocator to read a pointer and then 86 00:03:17,230 --> 00:03:19,489 write a pointer and then write 87 00:03:19,490 --> 00:03:21,529 back. You could write into the structured 88 00:03:21,530 --> 00:03:22,959 exception handler on Windows. 
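The overflow described above can be sketched in a few lines of C. This is only an illustration, not the code from the slide: the buffer name buf, the constant BUF_SIZE and both function names are hypothetical, and only the 16-character size is taken from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16  /* the 16-character buffer mentioned in the talk */

/* Unsafe pattern: copies without looking at the length of the input.
 * If input needs more than BUF_SIZE bytes including the terminator,
 * the write runs past the end of buf and hits neighbouring memory. */
void copy_unchecked(char *buf, const char *input)
{
    strcpy(buf, input);  /* no bounds check: this is the bug */
}

/* Checked variant (hypothetical helper): refuses input that does not fit.
 * Returns 0 on success, -1 if the copy would overflow the buffer. */
int copy_checked(char *buf, size_t buf_size, const char *input)
{
    assert(buf != NULL && input != NULL);
    size_t needed = strlen(input) + 1;  /* include the terminating NUL */
    if (needed > buf_size)
        return -1;
    memcpy(buf, input, needed);
    return 0;
}
```

With a 16-byte buffer, a 15-character string still fits because of the terminating NUL, while a 16-character string is already one byte too long.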
89 00:03:22,960 --> 00:03:25,119 You can write into quite 90 00:03:25,120 --> 00:03:27,219 a number of arbitrary data structures, 91 00:03:27,220 --> 00:03:28,599 such as longjmp buffers. 92 00:03:28,600 --> 00:03:30,399 A lot of the things that are under the 93 00:03:30,400 --> 00:03:31,900 hood of the C implementation 94 00:03:33,640 --> 00:03:36,399 are potentially dangerous, potentially 95 00:03:36,400 --> 00:03:38,379 breakable just by writing past your 96 00:03:38,380 --> 00:03:39,429 buffer. 97 00:03:39,430 --> 00:03:41,680 And so far, 98 00:03:43,280 --> 00:03:45,159 well, let's get into more detail. 99 00:03:45,160 --> 00:03:47,319 So that kind of problem you see here 100 00:03:47,320 --> 00:03:49,809 in the C code, who understands 101 00:03:49,810 --> 00:03:50,709 what the problem is here? 102 00:03:50,710 --> 00:03:51,710 Raise your hand, please. 103 00:03:53,680 --> 00:03:54,680 Good. 104 00:03:55,900 --> 00:03:57,639 I'm having a test on that later to see 105 00:03:57,640 --> 00:03:59,349 how many of you paid attention. 106 00:04:00,910 --> 00:04:03,429 So that's something 107 00:04:04,600 --> 00:04:06,909 we call a buffer overflow or 108 00:04:06,910 --> 00:04:09,069 overrun, and we have certain 109 00:04:09,070 --> 00:04:10,689 kinds of names for exploits. 110 00:04:10,690 --> 00:04:12,849 But it turned 111 00:04:12,850 --> 00:04:14,620 out, and that's what I did, that 112 00:04:15,910 --> 00:04:17,409 science, actually computer science, 113 00:04:17,410 --> 00:04:19,689 you know, the people at university, 114 00:04:19,690 --> 00:04:21,819 tend to look at the same problems that 115 00:04:21,820 --> 00:04:23,979 the hackers do and 116 00:04:23,980 --> 00:04:26,049 sometimes have their own terminology, but 117 00:04:26,050 --> 00:04:27,489 they sometimes also come up with 118 00:04:27,490 --> 00:04:28,479 interesting solutions. 
119 00:04:28,480 --> 00:04:29,859 And one of those interesting solutions 120 00:04:29,860 --> 00:04:31,539 I'm going to show to you today. 121 00:04:31,540 --> 00:04:33,430 So to get used to the lingo, 122 00:04:34,570 --> 00:04:36,789 what we're seeing here is a so-called 123 00:04:36,790 --> 00:04:39,189 spatial memory safety problem, 124 00:04:39,190 --> 00:04:41,439 spatial because you're 125 00:04:41,440 --> 00:04:44,229 in the wrong part of the address space. 126 00:04:44,230 --> 00:04:46,029 You write to something you're not supposed 127 00:04:46,030 --> 00:04:48,129 to write to: spatial 128 00:04:48,130 --> 00:04:49,130 memory safety. 129 00:04:50,200 --> 00:04:52,269 And there's also temporal memory 130 00:04:52,270 --> 00:04:54,429 safety, and 131 00:04:54,430 --> 00:04:56,649 that is, essentially, 132 00:04:56,650 --> 00:04:58,749 you access memory that has been freed, and 133 00:04:58,750 --> 00:05:00,159 then you free it again, et cetera, et 134 00:05:00,160 --> 00:05:02,259 cetera. Or you look at another object 135 00:05:02,260 --> 00:05:03,219 and write into that. 136 00:05:03,220 --> 00:05:05,739 Essentially what happens is that whatever 137 00:05:05,740 --> 00:05:07,929 you write to is no longer what you expect 138 00:05:07,930 --> 00:05:09,729 it to be. So usually in the case of an 139 00:05:09,730 --> 00:05:11,389 allocator, if you 
140 00:05:11,390 --> 00:05:13,439 call free, then, instead of the object, 141 00:05:13,440 --> 00:05:15,319 you've got the set of pointers that 142 00:05:15,320 --> 00:05:17,329 link the free memory buffers, so 143 00:05:17,330 --> 00:05:19,369 freeing them again or mallocing something 144 00:05:19,370 --> 00:05:21,259 will chase those pointers and start 145 00:05:21,260 --> 00:05:23,239 writing things into your address space, 146 00:05:23,240 --> 00:05:25,939 and that usually leads to all sorts of 147 00:05:25,940 --> 00:05:28,129 unpleasant situations of overwriting 148 00:05:28,130 --> 00:05:30,559 the instruction pointer and 149 00:05:30,560 --> 00:05:32,299 giving you code execution one way or 150 00:05:32,300 --> 00:05:33,300 another. 151 00:05:34,070 --> 00:05:36,139 So that's 152 00:05:36,140 --> 00:05:38,569 called temporal memory safety 153 00:05:38,570 --> 00:05:40,579 in the lingo of scientists. 154 00:05:40,580 --> 00:05:42,649 And the reason I'm holding that 155 00:05:42,650 --> 00:05:43,670 talk is that 156 00:05:44,840 --> 00:05:46,999 scientists now have found 157 00:05:47,000 --> 00:05:49,129 a solution to the problem 158 00:05:49,130 --> 00:05:51,349 that looks like it's viable to use for 159 00:05:51,350 --> 00:05:52,729 real world code. 160 00:05:52,730 --> 00:05:55,399 So, as I said, I've 161 00:05:55,400 --> 00:05:56,839 worried about that problem for quite a 162 00:05:56,840 --> 00:05:57,840 long time. 163 00:05:58,880 --> 00:06:00,800 There are a number of existing approaches 164 00:06:02,150 --> 00:06:04,099 towards the problem. 165 00:06:04,100 --> 00:06:06,439 One that I, if you remember 166 00:06:06,440 --> 00:06:08,269 my talks, if you remember what I'm doing, 167 00:06:08,270 --> 00:06:10,519 looked very much into is the first point: 168 00:06:10,520 --> 00:06:11,659 use a safe language. 169 00:06:11,660 --> 00:06:13,999 Don't write in C if 170 00:06:14,000 --> 00:06:15,000 you. 
171 00:06:19,840 --> 00:06:22,029 Unless you have a very good excuse, C 172 00:06:22,030 --> 00:06:24,219 is probably not the right language, 173 00:06:24,220 --> 00:06:26,409 and unless you're one of, you know, three 174 00:06:26,410 --> 00:06:27,410 or four gurus, 175 00:06:29,110 --> 00:06:31,299 one of the room to room seven just 176 00:06:31,300 --> 00:06:31,889 got a baby. 177 00:06:31,890 --> 00:06:33,729 So not many of you I trust with writing 178 00:06:33,730 --> 00:06:35,289 C code perfectly correctly. 179 00:06:35,290 --> 00:06:37,449 Sorry about that. No offense intended, 180 00:06:37,450 --> 00:06:39,519 but that's just the way it is. 181 00:06:39,520 --> 00:06:41,619 Most people 182 00:06:41,620 --> 00:06:42,819 don't get it right. 183 00:06:42,820 --> 00:06:44,439 And then there's the whole topic of 184 00:06:44,440 --> 00:06:46,209 mitigations. Also something that has been 185 00:06:46,210 --> 00:06:47,649 discussed at conferences for quite a 186 00:06:47,650 --> 00:06:49,719 while. Use address space layout 187 00:06:49,720 --> 00:06:51,789 randomization, use data execution 188 00:06:51,790 --> 00:06:54,009 prevention, use stack canaries. 189 00:06:54,010 --> 00:06:56,949 And of course, 190 00:06:56,950 --> 00:06:59,109 there are ways around that as well. 191 00:06:59,110 --> 00:07:00,909 So in case of address space layout 192 00:07:00,910 --> 00:07:02,979 randomization, the idea is 193 00:07:02,980 --> 00:07:05,039 you might control the instruction pointer 194 00:07:05,040 --> 00:07:06,459 by your buffer overflow, but you no 195 00:07:06,460 --> 00:07:08,529 longer know where to jump to execute 196 00:07:08,530 --> 00:07:11,319 code, and 197 00:07:11,320 --> 00:07:14,199 modern exploits use techniques 198 00:07:14,200 --> 00:07:16,599 to circumvent that, for instance, 199 00:07:16,600 --> 00:07:19,279 finding another vulnerability. 
200 00:07:19,280 --> 00:07:21,369 So in the old days, you had 201 00:07:21,370 --> 00:07:23,529 to find an overflow, a write 202 00:07:23,530 --> 00:07:25,659 overrun in memory space. 203 00:07:25,660 --> 00:07:27,849 These days you usually have to find a 204 00:07:27,850 --> 00:07:30,099 vulnerability that gives you read access 205 00:07:30,100 --> 00:07:32,169 into the address space of the program you want to 206 00:07:32,170 --> 00:07:34,389 attack, because as soon as you can start 207 00:07:34,390 --> 00:07:36,459 reading out parts of memory, you 208 00:07:36,460 --> 00:07:37,989 can figure out where things are. 209 00:07:37,990 --> 00:07:40,209 So you can circumvent ASLR 210 00:07:40,210 --> 00:07:43,269 by reading out the right address. 211 00:07:43,270 --> 00:07:45,669 Or you can do what many people do: 212 00:07:45,670 --> 00:07:46,719 heap spraying. 213 00:07:46,720 --> 00:07:49,209 It's very popular. If you take a browser, 214 00:07:49,210 --> 00:07:50,799 you just have a piece of JavaScript that 215 00:07:50,800 --> 00:07:52,569 generates a gazillion copies of the 216 00:07:52,570 --> 00:07:54,639 exploit code all over the address space. 217 00:07:54,640 --> 00:07:56,439 And essentially you no longer care where 218 00:07:56,440 --> 00:07:58,749 you jump to because your exploit 219 00:07:58,750 --> 00:08:00,430 is there with a very big likelihood. 220 00:08:01,480 --> 00:08:03,519 DEP, data execution prevention. 221 00:08:03,520 --> 00:08:05,769 Also, in the old days, if you wrote 222 00:08:05,770 --> 00:08:08,169 an exploit, you would have that section 223 00:08:08,170 --> 00:08:10,299 of memory on the stack and you would 224 00:08:10,300 --> 00:08:12,429 just put the code you execute right 225 00:08:12,430 --> 00:08:14,169 into the very same buffer that you 226 00:08:14,170 --> 00:08:15,159 overflow. 
227 00:08:15,160 --> 00:08:17,649 And you would then, controlling 228 00:08:17,650 --> 00:08:19,659 the EIP, the instruction pointer in the 229 00:08:19,660 --> 00:08:21,939 stack frame, you would point 230 00:08:21,940 --> 00:08:23,589 it back to the beginning of the buffer. 231 00:08:23,590 --> 00:08:25,179 And in that buffer there are your 232 00:08:25,180 --> 00:08:27,339 instructions and you just 233 00:08:27,340 --> 00:08:29,469 execute them. And so 234 00:08:29,470 --> 00:08:31,539 some smart people thought, well, we could 235 00:08:31,540 --> 00:08:33,459 make the stack non-executable. 236 00:08:33,460 --> 00:08:34,690 That will certainly help, 237 00:08:36,520 --> 00:08:39,219 until at that very conference here, 238 00:08:39,220 --> 00:08:41,649 people started playing around 239 00:08:41,650 --> 00:08:43,928 with what later became 240 00:08:43,929 --> 00:08:46,629 known as return-oriented programming. 241 00:08:46,630 --> 00:08:48,979 So instead of jumping to your own buffer 242 00:08:48,980 --> 00:08:51,689 of code, what you have in 243 00:08:51,690 --> 00:08:54,279 your buffer is just 244 00:08:54,280 --> 00:08:56,859 a lot of stack frames 245 00:08:56,860 --> 00:08:59,259 that actually jump to preexisting code 246 00:08:59,260 --> 00:09:00,279 in the executable. 247 00:09:00,280 --> 00:09:02,169 Instead of jumping to your code, you're 248 00:09:02,170 --> 00:09:04,839 just looking for a couple of bytes 249 00:09:04,840 --> 00:09:06,579 in the program you are attacking that are 250 00:09:06,580 --> 00:09:07,959 executable already. 251 00:09:07,960 --> 00:09:10,449 You jump there, it executes the code and 252 00:09:10,450 --> 00:09:12,789 you're done. You can chain that by 253 00:09:12,790 --> 00:09:15,099 chaining stack frames in the buffer 254 00:09:15,100 --> 00:09:16,809 you write, so data execution prevention 255 00:09:16,810 --> 00:09:18,729 can be worked around. 
256 00:09:18,730 --> 00:09:21,369 Stack canaries: people thought, 257 00:09:21,370 --> 00:09:24,429 you know, I have that buffer 258 00:09:24,430 --> 00:09:26,589 and next to the buffer is my 259 00:09:26,590 --> 00:09:28,569 stack frame, the instruction pointer 260 00:09:28,570 --> 00:09:30,729 address that I want to control. Well, we 261 00:09:30,730 --> 00:09:32,859 put a magic value in here in between 262 00:09:32,860 --> 00:09:34,719 the buffer and my stack frame. 263 00:09:34,720 --> 00:09:37,209 And if somebody overwrites 264 00:09:37,210 --> 00:09:39,459 the buffer, we'll check whether a certain 265 00:09:39,460 --> 00:09:41,829 magic value is still at the right place 266 00:09:41,830 --> 00:09:42,789 there. 267 00:09:42,790 --> 00:09:44,859 And then if it isn't, we say, 268 00:09:44,860 --> 00:09:46,629 oh, we have been attacked. 269 00:09:46,630 --> 00:09:48,729 Unfortunately, that's also not 100 270 00:09:48,730 --> 00:09:50,829 percent secure, and there are two 271 00:09:50,830 --> 00:09:51,789 ways to get around it. 272 00:09:51,790 --> 00:09:53,599 First one already mentioned: if you get 273 00:09:53,600 --> 00:09:56,319 read access into the address space, 274 00:09:56,320 --> 00:09:58,179 you can find out what the value of the 275 00:09:58,180 --> 00:10:00,309 stack canary is and use 276 00:10:00,310 --> 00:10:01,310 it. 277 00:10:01,750 --> 00:10:03,649 There was a very nice example that was 278 00:10:03,650 --> 00:10:05,769 shown on a Cisco, where you 279 00:10:05,770 --> 00:10:07,929 would send an ICMP 280 00:10:07,930 --> 00:10:10,329 echo request to a box, 281 00:10:10,330 --> 00:10:12,039 which was 20 bytes long. 282 00:10:12,040 --> 00:10:13,989 But in the header it says I'm one 283 00:10:13,990 --> 00:10:16,269 thousand five hundred bytes long, and 284 00:10:16,270 --> 00:10:18,519 the Cisco takes the packet, receives it, sees 285 00:10:18,520 --> 00:10:19,779 in the header one thousand five hundred 286 00:10:19,780 --> 00:10:20,169 bytes. 
287 00:10:20,170 --> 00:10:22,509 I'm going to send back 1500 288 00:10:22,510 --> 00:10:23,799 bytes of my memory, boom. 289 00:10:23,800 --> 00:10:24,800 There you go. 290 00:10:28,190 --> 00:10:30,309 So stack canaries, then, nice 291 00:10:30,310 --> 00:10:32,379 and everything, but you can do what 292 00:10:32,380 --> 00:10:34,569 Linux did and fuck it up. 293 00:10:34,570 --> 00:10:36,639 So for something like the last 294 00:10:36,640 --> 00:10:38,439 six or seven years, until half a year 295 00:10:38,440 --> 00:10:40,599 ago, if you statically 296 00:10:40,600 --> 00:10:41,769 linked an executable, 297 00:10:42,910 --> 00:10:45,159 Linux would choose the magic value of 298 00:10:45,160 --> 00:10:47,269 zero for the stack canary. 299 00:10:47,270 --> 00:10:48,669 Lots of things that can go wrong there. 300 00:10:50,710 --> 00:10:51,710 So. 301 00:10:53,050 --> 00:10:54,050 Well, 302 00:10:57,880 --> 00:10:59,979 Fefe complains that it's just glibc 303 00:10:59,980 --> 00:11:01,959 on Linux that does it wrong, dietlibc gets 304 00:11:01,960 --> 00:11:02,960 it right. 305 00:11:09,660 --> 00:11:11,849 So a lot of effort has 306 00:11:11,850 --> 00:11:14,009 been put in to at least give you some 307 00:11:14,010 --> 00:11:16,109 decent debugging tools in order to 308 00:11:16,110 --> 00:11:18,299 detect memory overruns during production. 309 00:11:19,860 --> 00:11:22,229 Most of you probably know of 310 00:11:22,230 --> 00:11:23,219 Valgrind. 311 00:11:23,220 --> 00:11:24,809 It's one of the more popular tools. 312 00:11:24,810 --> 00:11:26,849 Essentially, you link a debug version of 313 00:11:26,850 --> 00:11:29,069 your program with Valgrind and then 314 00:11:29,070 --> 00:11:30,089 run all of your tests. 315 00:11:30,090 --> 00:11:31,469 And Valgrind does a lot of things 316 00:11:31,470 --> 00:11:33,089 behind the scenes to find out whether you 317 00:11:33,090 --> 00:11:34,679 write to memory 
you're not supposed to 318 00:11:34,680 --> 00:11:35,680 write to or not. 319 00:11:37,110 --> 00:11:39,269 There have been other ones. The most 320 00:11:39,270 --> 00:11:41,369 recent versions of GCC and 321 00:11:41,370 --> 00:11:43,469 LLVM ship with something 322 00:11:43,470 --> 00:11:45,389 that's called a memory sanitizer 323 00:11:47,310 --> 00:11:49,379 that tries 324 00:11:49,380 --> 00:11:50,380 to do the same thing. 325 00:11:51,930 --> 00:11:54,389 There's the SAFECode project, CCured, 326 00:11:54,390 --> 00:11:56,219 et cetera, et cetera, et 327 00:11:56,220 --> 00:11:58,619 cetera. Some of them are, you know, after 328 00:11:58,620 --> 00:11:59,699 the fact debugging tools. 329 00:11:59,700 --> 00:12:01,379 Some of them actually hook up into your 330 00:12:01,380 --> 00:12:03,689 compiler to change things, to 331 00:12:03,690 --> 00:12:04,829 detect memory flaws. 332 00:12:06,000 --> 00:12:07,000 There are, 333 00:12:07,620 --> 00:12:10,259 in those approaches, two 334 00:12:10,260 --> 00:12:12,449 principal ideas 335 00:12:12,450 --> 00:12:14,939 how you detect a buffer overflow, 336 00:12:14,940 --> 00:12:17,129 how to detect an invalid memory 337 00:12:17,130 --> 00:12:18,130 access. 338 00:12:18,750 --> 00:12:20,879 And the first one is 339 00:12:20,880 --> 00:12:23,459 the so-called object-based approach. 340 00:12:23,460 --> 00:12:25,649 So just for reference, the 341 00:12:25,650 --> 00:12:27,719 notation I'm using here is: black 342 00:12:27,720 --> 00:12:29,969 is the code that the user actually 343 00:12:29,970 --> 00:12:32,579 wrote and red is 344 00:12:32,580 --> 00:12:35,279 the code that was inserted by 345 00:12:35,280 --> 00:12:37,289 the tool you're looking at to find your 346 00:12:37,290 --> 00:12:38,759 buffer overflow problems. 347 00:12:38,760 --> 00:12:40,799 So red is tool-injected, black is what 348 00:12:40,800 --> 00:12:42,989 the programmer wrote. 
349 00:12:42,990 --> 00:12:45,299 So the general idea 350 00:12:45,300 --> 00:12:47,519 in the object-based approach 351 00:12:47,520 --> 00:12:49,679 is that for every object in memory, 352 00:12:49,680 --> 00:12:51,419 you know whether that's a valid object 353 00:12:51,420 --> 00:12:53,579 or not, which translates into: 354 00:12:53,580 --> 00:12:55,499 for every address in your address space, 355 00:12:55,500 --> 00:12:57,569 you know whether there's a valid object 356 00:12:57,570 --> 00:12:58,570 there or not. 357 00:12:59,460 --> 00:13:02,039 And all you do is look up whether 358 00:13:02,040 --> 00:13:03,779 the address is good or not. 359 00:13:03,780 --> 00:13:05,099 So if you look, for instance, at 360 00:13:05,100 --> 00:13:07,139 Valgrind, the Valgrind memory 361 00:13:07,140 --> 00:13:09,199 profiler, what it does is 362 00:13:09,200 --> 00:13:11,489 it keeps a so-called shadow memory. 363 00:13:11,490 --> 00:13:12,629 So for every 364 00:13:13,680 --> 00:13:15,779 word in memory, 365 00:13:15,780 --> 00:13:17,489 they have a data structure somewhere that 366 00:13:17,490 --> 00:13:18,839 stores a couple of bits. 367 00:13:18,840 --> 00:13:21,029 And those bits say: that's 368 00:13:21,030 --> 00:13:23,099 a correctly allocated address 369 00:13:23,100 --> 00:13:24,899 and it has not been freed yet. 370 00:13:24,900 --> 00:13:27,299 So they track whether that's memory 371 00:13:27,300 --> 00:13:29,399 that is user-visible 372 00:13:29,400 --> 00:13:31,589 as opposed to memory that shouldn't 373 00:13:31,590 --> 00:13:33,749 be user-visible, like your stack frame or 374 00:13:33,750 --> 00:13:36,200 your longjmp buffers or whatever. 375 00:13:37,230 --> 00:13:39,299 And so they 376 00:13:39,300 --> 00:13:42,179 check that and then return 377 00:13:42,180 --> 00:13:43,239 whether that's correct or not. 
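A minimal sketch of the shadow memory idea, under heavy simplifying assumptions: the "address space" is a 64-byte arena, there is one shadow flag per byte, and all names (arena, shadow, mark_allocated, mark_freed, checked_store) are hypothetical. It is not how any real tool lays out its shadow memory, only the lookup the talk describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64

static unsigned char arena[ARENA_SIZE];   /* stands in for real memory */
static unsigned char shadow[ARENA_SIZE];  /* 1 = allocated and not yet freed */

/* Called by the instrumented allocator when a block is handed out. */
void mark_allocated(size_t off, size_t len)
{
    assert(off <= ARENA_SIZE && len <= ARENA_SIZE - off);
    memset(shadow + off, 1, len);
}

/* Called by the instrumented allocator when a block is released. */
void mark_freed(size_t off, size_t len)
{
    assert(off <= ARENA_SIZE && len <= ARENA_SIZE - off);
    memset(shadow + off, 0, len);
}

/* The check a tool would inject before every store.
 * Returns 0 if the store was performed, -1 if the address is invalid. */
int checked_store(size_t off, unsigned char value)
{
    if (off >= ARENA_SIZE || !shadow[off])
        return -1;
    arena[off] = value;
    return 0;
}
```

After marking bytes 8 to 23 as allocated, stores to offsets 8 and 23 succeed, a store to offset 24 is rejected, and once the block is freed a store to offset 8 is rejected as well.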
378 00:13:43,240 --> 00:13:45,419 So if the byte is broken because 379 00:13:45,420 --> 00:13:47,099 it's not user-allocated memory that you're 380 00:13:47,100 --> 00:13:49,259 looking at, that's bad 381 00:13:49,260 --> 00:13:50,280 and an exception is raised. 382 00:13:52,740 --> 00:13:55,049 That's all nice and everything until 383 00:13:55,050 --> 00:13:56,820 you start looking at examples like that. 384 00:13:57,870 --> 00:13:59,999 Because imagine I would 385 00:14:00,000 --> 00:14:02,339 allocate a structure like that, 386 00:14:02,340 --> 00:14:04,439 then my shadow memory 387 00:14:04,440 --> 00:14:06,689 would say, yeah, perfectly 388 00:14:06,690 --> 00:14:09,479 fine, everything's allocated memory 389 00:14:09,480 --> 00:14:11,579 and I'm pointing inside 390 00:14:11,580 --> 00:14:13,709 the structure here and I'm 391 00:14:13,710 --> 00:14:15,989 not accessing memory that has been freed 392 00:14:15,990 --> 00:14:17,069 in between. 393 00:14:17,070 --> 00:14:19,230 But still, if I overflow the ID 394 00:14:20,580 --> 00:14:22,979 field, I run into the account balance 395 00:14:22,980 --> 00:14:24,059 field here. 396 00:14:24,060 --> 00:14:26,939 And that's something that in general 397 00:14:26,940 --> 00:14:30,059 object-based tools do not detect. 398 00:14:30,060 --> 00:14:31,949 There are other flaws like that, 399 00:14:31,950 --> 00:14:33,629 comparable flaws that it wouldn't detect. 
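The blind spot with the ID and account balance fields can be illustrated as follows. The struct layout and the names account, in_object and in_id_field are assumptions made for this sketch; the slide's actual structure is not reproduced in the transcript.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with an ID buffer followed by a balance. */
struct account {
    char id[8];
    long balance;
};

/* Object-granularity check: is p anywhere inside the whole struct? */
int in_object(const struct account *a, const char *p)
{
    assert(a != NULL);
    uintptr_t lo = (uintptr_t)a, hi = lo + sizeof *a, x = (uintptr_t)p;
    return x >= lo && x < hi;
}

/* Field-granularity check: is p inside the id field only? */
int in_id_field(const struct account *a, const char *p)
{
    assert(a != NULL);
    uintptr_t lo = (uintptr_t)a->id, hi = lo + sizeof a->id, x = (uintptr_t)p;
    return x >= lo && x < hi;
}
```

An address one byte past the end of id is still inside the allocated object, so the object-granularity check accepts it, while a check that knows the bounds of the field rejects it.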
400 00:14:33,630 --> 00:14:36,089 Like, for instance, 401 00:14:36,090 --> 00:14:38,909 you free a piece of memory, 402 00:14:38,910 --> 00:14:41,219 you allocate a different structure, 403 00:14:41,220 --> 00:14:43,889 it gets the same address and 404 00:14:43,890 --> 00:14:46,139 suddenly that's valid memory again and 405 00:14:46,140 --> 00:14:48,089 you start writing into memory at an address 406 00:14:48,090 --> 00:14:50,249 that's considered valid, even though the 407 00:14:50,250 --> 00:14:52,199 actual type of the structure at that place 408 00:14:52,200 --> 00:14:53,369 has changed. 409 00:14:53,370 --> 00:14:55,529 So Valgrind has a flag for that, 410 00:14:55,530 --> 00:14:58,109 so it will not reuse addresses. 411 00:14:58,110 --> 00:14:59,790 But in general, 412 00:15:00,930 --> 00:15:03,149 that kind of problem makes 413 00:15:03,150 --> 00:15:05,489 100 percent detection 414 00:15:05,490 --> 00:15:08,349 of buffer overflows using 415 00:15:08,350 --> 00:15:09,970 the object-based approach impossible. 416 00:15:11,070 --> 00:15:14,249 So there's an alternative 417 00:15:14,250 --> 00:15:16,319 approach, and stuff like CCured 418 00:15:16,320 --> 00:15:17,320 does that, for instance. 419 00:15:18,870 --> 00:15:21,029 And that is, instead of representing 420 00:15:21,030 --> 00:15:23,819 a pointer just as a word, 421 00:15:23,820 --> 00:15:26,429 you represent the pointer as a word, 422 00:15:26,430 --> 00:15:28,919 plus you remember the base, 423 00:15:28,920 --> 00:15:30,339 plus you remember the bound. 424 00:15:30,340 --> 00:15:32,399 So the base address of the 425 00:15:32,400 --> 00:15:34,079 allocated object and the bound of the 426 00:15:34,080 --> 00:15:35,080 allocated object. 
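A fat pointer of the kind just described might look like the sketch below. The names fat_ptr, fat_from_array, fat_add and fat_store are hypothetical, and the three values are kept as integers so that an out-of-range intermediate value can be represented without relying on out-of-range pointer arithmetic in the example itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A pointer represented as the address plus base and bound. */
struct fat_ptr {
    uintptr_t ptr;    /* the value the program computes with */
    uintptr_t base;   /* first byte of the allocated object */
    uintptr_t bound;  /* one past the last byte of the object */
};

struct fat_ptr fat_from_array(char *array, size_t size)
{
    assert(array != NULL);
    struct fat_ptr fp = { (uintptr_t)array, (uintptr_t)array,
                          (uintptr_t)array + size };
    return fp;
}

/* Pointer arithmetic only moves ptr; base and bound travel along. */
struct fat_ptr fat_add(struct fat_ptr fp, ptrdiff_t delta)
{
    fp.ptr += (uintptr_t)delta;  /* unsigned arithmetic, wraps safely */
    return fp;
}

/* The bounds check happens on access, not on arithmetic.
 * Returns 0 if the byte was written, -1 if ptr is out of bounds. */
int fat_store(struct fat_ptr fp, char value)
{
    if (fp.ptr < fp.base || fp.ptr >= fp.bound)
        return -1;
    *(char *)fp.ptr = value;
    return 0;
}
```

Note that sizeof(struct fat_ptr) is three machine words instead of one, which is exactly the layout change that causes compatibility trouble for this approach.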
427 00:15:36,030 --> 00:15:38,159 Why you have to do that has to do 428 00:15:38,160 --> 00:15:40,169 with the C standard, because you can do 429 00:15:40,170 --> 00:15:42,869 pointer arithmetic in C, 430 00:15:42,870 --> 00:15:44,879 so you can take a pointer to the 431 00:15:44,880 --> 00:15:47,639 beginning of an array and add 20 and 432 00:15:47,640 --> 00:15:49,799 suddenly you have a pointer inside that 433 00:15:49,800 --> 00:15:51,959 character array and keep 434 00:15:51,960 --> 00:15:52,919 on calculating with that. 435 00:15:52,920 --> 00:15:55,469 It's even legal to add one hundred, 436 00:15:55,470 --> 00:15:57,839 point outside and then subtract 437 00:15:57,840 --> 00:16:00,599 one hundred again and then access 438 00:16:00,600 --> 00:16:02,159 your memory, because 439 00:16:04,020 --> 00:16:05,909 you might have pointer values that are 440 00:16:05,910 --> 00:16:07,299 pointing outside the valid 441 00:16:07,300 --> 00:16:09,429 range in intermediate computations 442 00:16:09,430 --> 00:16:10,989 and still have a valid C program, 443 00:16:10,990 --> 00:16:13,089 and C programs, unfortunately, actually 444 00:16:13,090 --> 00:16:14,739 do that. 445 00:16:14,740 --> 00:16:17,469 So the fat pointer approach 446 00:16:17,470 --> 00:16:19,509 usually involves changing the compiler 447 00:16:20,590 --> 00:16:22,989 and it has the very negative effect 448 00:16:22,990 --> 00:16:25,209 of changing the size of your pointer. 449 00:16:25,210 --> 00:16:26,829 So your pointer suddenly has all the 450 00:16:26,830 --> 00:16:28,849 information it needs to find out whether 451 00:16:28,850 --> 00:16:30,969 a memory access is spatially valid, 452 00:16:30,970 --> 00:16:33,699 if you use the base and bound address. 453 00:16:33,700 --> 00:16:35,349 I'm getting to temporal later. 
454 00:16:35,350 --> 00:16:36,880 Just looking at spatial for the moment. 455 00:16:38,720 --> 00:16:40,869 Unfortunately, it also means that the 456 00:16:40,870 --> 00:16:42,969 layout of your structures changes, 457 00:16:42,970 --> 00:16:44,499 so it breaks compatibility with 458 00:16:44,500 --> 00:16:46,149 existing C programs. 459 00:16:46,150 --> 00:16:47,769 So if you look at the existing solutions 460 00:16:47,770 --> 00:16:49,839 that use fat pointers, they usually 461 00:16:49,840 --> 00:16:51,669 suffer from a number of problems. 462 00:16:53,110 --> 00:16:56,079 One is incompatibility by breaking 463 00:16:56,080 --> 00:16:57,759 your structure offsets and everything. 464 00:16:57,760 --> 00:16:59,829 So memory layout changes, 465 00:16:59,830 --> 00:17:01,719 calling conventions change, so you cannot 466 00:17:01,720 --> 00:17:03,969 link against an 467 00:17:03,970 --> 00:17:05,439 uninstrumented library, you cannot do 468 00:17:05,440 --> 00:17:06,789 separate compilation. That usually 469 00:17:06,790 --> 00:17:09,108 involves things like a whole program 470 00:17:09,109 --> 00:17:11,588 analysis pass, where instead of compiling 471 00:17:11,589 --> 00:17:13,539 a list of object files, you take all your 472 00:17:13,540 --> 00:17:15,459 source code, all your millions of lines of 473 00:17:15,460 --> 00:17:18,098 C, and compile them all at the same time. 474 00:17:18,099 --> 00:17:19,659 You can do interesting things then, but 475 00:17:19,660 --> 00:17:22,029 it turns out to be not very practical 476 00:17:22,030 --> 00:17:24,309 if you actually try to compile real 477 00:17:24,310 --> 00:17:25,838 world software with it. 478 00:17:25,839 --> 00:17:28,059 So both approaches have 479 00:17:28,060 --> 00:17:30,129 their drawbacks and 480 00:17:30,130 --> 00:17:31,130 advantages. 481 00:17:31,990 --> 00:17:34,179 So enter 482 00:17:34,180 --> 00:17:35,180 SoftBound+CETS. 
483 00:17:35,860 --> 00:17:38,109 So one noticeable thing about 484 00:17:38,110 --> 00:17:40,749 academia: they get naming of software 485 00:17:40,750 --> 00:17:42,420 even worse than hackers do. 486 00:17:44,170 --> 00:17:46,239 It's called SoftBound+CETS. 487 00:17:46,240 --> 00:17:48,789 SoftBound, because they had a HardBound 488 00:17:48,790 --> 00:17:51,339 project before, where they researched 489 00:17:51,340 --> 00:17:53,529 hardware support for checking 490 00:17:53,530 --> 00:17:55,659 every pointer access for 491 00:17:55,660 --> 00:17:57,939 bounds violations, and SoftBound is 492 00:17:57,940 --> 00:17:59,709 the implementation of the same thing 493 00:17:59,710 --> 00:18:02,109 in software, and CETS 494 00:18:02,110 --> 00:18:04,629 talks about compiler-enforced temporal 495 00:18:04,630 --> 00:18:06,219 safety. 496 00:18:06,220 --> 00:18:08,799 So we got that abbreviation. 497 00:18:08,800 --> 00:18:10,509 That is research that has been 498 00:18:10,510 --> 00:18:12,190 done at University of Pennsylvania. 499 00:18:13,300 --> 00:18:15,369 And the SoftBound part does the spatial 500 00:18:15,370 --> 00:18:17,619 safety, whether we overrun some 501 00:18:17,620 --> 00:18:19,749 piece of memory in 502 00:18:19,750 --> 00:18:21,939 the spatial domain, wrong address, 503 00:18:21,940 --> 00:18:24,009 and CETS is the part that 504 00:18:24,010 --> 00:18:26,109 does the temporal safety, meaning 505 00:18:26,110 --> 00:18:27,880 accessing memory that has been freed, 506 00:18:28,930 --> 00:18:31,389 essentially, or not yet allocated. 507 00:18:31,390 --> 00:18:33,699 And the thing 508 00:18:33,700 --> 00:18:36,129 that makes it interesting here is 509 00:18:36,130 --> 00:18:38,379 that they managed to get something 510 00:18:38,380 --> 00:18:40,269 that's reasonable for actual use out 511 00:18:40,270 --> 00:18:42,429 there. 
If you have ever used Valgrind 512 00:18:42,430 --> 00:18:44,469 and it slows down your program by a 513 00:18:44,470 --> 00:18:46,599 factor of 20, that's nice and everything 514 00:18:46,600 --> 00:18:48,609 for testing, you can do that during 515 00:18:48,610 --> 00:18:50,049 testing and debugging. 516 00:18:50,050 --> 00:18:51,549 You wouldn't want to ship something 517 00:18:51,550 --> 00:18:52,990 linked with Valgrind to the user. 518 00:18:54,010 --> 00:18:56,349 And it 519 00:18:56,350 --> 00:18:58,719 uses disjoint fat pointers. 520 00:18:58,720 --> 00:19:00,999 You remember object-based versus pointer 521 00:19:01,000 --> 00:19:02,349 based. 522 00:19:02,350 --> 00:19:04,359 So what it does to 523 00:19:05,560 --> 00:19:07,479 remove many of the incompatibility 524 00:19:07,480 --> 00:19:09,669 problems that exist with the 525 00:19:09,670 --> 00:19:12,399 pointer-based approach is 526 00:19:12,400 --> 00:19:14,649 that they have shadow memory as they 527 00:19:14,650 --> 00:19:16,300 would have in an object-based approach, 528 00:19:17,440 --> 00:19:20,019 but they're only using that to store 529 00:19:20,020 --> 00:19:21,639 the additional pointer information. 530 00:19:21,640 --> 00:19:23,859 So the memory 531 00:19:23,860 --> 00:19:25,869 structures stay the same, the stack frames stay 532 00:19:25,870 --> 00:19:27,939 the same. But the extra information 533 00:19:27,940 --> 00:19:30,009 you need is propagated on a 534 00:19:30,010 --> 00:19:31,179 different path. I'm getting into the 535 00:19:31,180 --> 00:19:33,309 details of how they're doing that later. 536 00:19:33,310 --> 00:19:34,310 And 537 00:19:35,530 --> 00:19:37,749 they have a proof of correctness 538 00:19:37,750 --> 00:19:38,979 of what they're doing. And that's very 539 00:19:38,980 --> 00:19:41,199 interesting.
So they have a formal 540 00:19:41,200 --> 00:19:42,969 representation of the semantics of a 541 00:19:42,970 --> 00:19:45,099 subset of C, and they 542 00:19:45,100 --> 00:19:47,889 prove that with that transformation, 543 00:19:47,890 --> 00:19:50,199 no invalid 544 00:19:50,200 --> 00:19:52,869 access of memory 545 00:19:52,870 --> 00:19:54,699 can be written down without being 546 00:19:54,700 --> 00:19:57,000 detected. 547 00:20:00,320 --> 00:20:02,239 That proof has a certain problem, I'm coming 548 00:20:02,240 --> 00:20:04,399 back to that later, but that's already 549 00:20:04,400 --> 00:20:06,629 very remarkable. And 550 00:20:06,630 --> 00:20:08,839 it's implemented, and that's another point 551 00:20:08,840 --> 00:20:10,669 of view: it generates very efficient 552 00:20:10,670 --> 00:20:13,189 code. It's implemented as an LLVM 553 00:20:13,190 --> 00:20:15,829 optimizer pass. So it operates on 554 00:20:15,830 --> 00:20:17,719 LLVM intermediate representation, 555 00:20:19,310 --> 00:20:21,139 which is so-called static single 556 00:20:21,140 --> 00:20:22,129 assignment. 557 00:20:22,130 --> 00:20:24,409 So no idea how many of you have 558 00:20:24,410 --> 00:20:26,779 ever looked into compiler intermediates. 559 00:20:26,780 --> 00:20:29,269 It's sort of above 560 00:20:29,270 --> 00:20:31,669 assembler level, but below C level. 561 00:20:31,670 --> 00:20:33,799 And you can get that essentially 562 00:20:33,800 --> 00:20:36,079 by breaking down your code and taking all 563 00:20:36,080 --> 00:20:37,519 the big expressions. 564 00:20:37,520 --> 00:20:39,529 And for every subexpression that you 565 00:20:39,530 --> 00:20:41,659 write, if you write A plus B plus C, 566 00:20:41,660 --> 00:20:43,519 so D equals A plus B plus
567 00:20:43,520 --> 00:20:45,829 C, you say 568 00:20:45,830 --> 00:20:48,019 my temporary T1 is A plus 569 00:20:48,020 --> 00:20:50,779 B and my temporary T2 570 00:20:50,780 --> 00:20:52,939 is T1 plus C. 571 00:20:52,940 --> 00:20:55,039 So you break compound expressions into 572 00:20:55,040 --> 00:20:56,399 simple expressions. 573 00:20:56,400 --> 00:20:58,699 And you also do not reuse variables. 574 00:20:58,700 --> 00:21:00,739 You only assign variables once: you assign to 575 00:21:00,740 --> 00:21:02,060 them and then never change them. 576 00:21:03,230 --> 00:21:05,029 And the intermediate representation in 577 00:21:05,030 --> 00:21:06,379 LLVM is typed. 578 00:21:06,380 --> 00:21:08,659 So unlike, for instance, 579 00:21:08,660 --> 00:21:10,459 Valgrind, which just looks at the 580 00:21:10,460 --> 00:21:12,199 assembler level at memory loads and 581 00:21:12,200 --> 00:21:13,189 stores, 582 00:21:13,190 --> 00:21:15,469 if we compile a program 583 00:21:15,470 --> 00:21:16,939 and we look at an intermediate stage, 584 00:21:16,940 --> 00:21:19,219 we still know which 585 00:21:19,220 --> 00:21:21,409 variables we access are pointers, 586 00:21:21,410 --> 00:21:22,399 which are integers. 587 00:21:22,400 --> 00:21:24,419 So we can drop a lot of the load and 588 00:21:24,420 --> 00:21:26,329 store checks and only look at those checks 589 00:21:26,330 --> 00:21:28,189 which are actually relevant to the 590 00:21:28,190 --> 00:21:29,190 problem. 591 00:21:29,930 --> 00:21:32,239 So that sounded 592 00:21:32,240 --> 00:21:34,399 on paper like a very promising project 593 00:21:34,400 --> 00:21:35,900 and I started looking into it.
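The breakdown into single-assignment temporaries can be pictured in C itself. This is only an illustration of the idea, not actual LLVM IR; the function name `sum_ssa_style` and the temporaries are made up for the example.

```c
#include <assert.h>

/* d = a + b + c, written so that every name is assigned exactly once. */
int sum_ssa_style(int a, int b, int c)
{
    const int t1 = a + b;   /* first subexpression */
    const int t2 = t1 + c;  /* second subexpression reads t1, never reassigns it */
    const int d  = t2;
    return d;
}
```

Because the types of `t1`, `t2` and `d` are known at this level, a pass can tell integer temporaries from pointer temporaries and skip the former.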
594 00:21:38,780 --> 00:21:41,149 And yep, 595 00:21:41,150 --> 00:21:43,699 so the advantages: 596 00:21:43,700 --> 00:21:46,189 you get source compatibility. 597 00:21:46,190 --> 00:21:48,649 With some of the competing projects 598 00:21:48,650 --> 00:21:50,809 you had to start modifying your C code in 599 00:21:50,810 --> 00:21:53,299 order to get it to compile with the tool. 600 00:21:53,300 --> 00:21:55,669 It gets complete coverage, it actually 601 00:21:55,670 --> 00:21:57,859 gets 100 percent of your 602 00:21:57,860 --> 00:22:00,109 memory safety problems and catches 603 00:22:00,110 --> 00:22:01,189 them. 604 00:22:01,190 --> 00:22:03,289 It supports separate compilation, so you 605 00:22:03,290 --> 00:22:05,569 can compile a library 606 00:22:05,570 --> 00:22:07,939 and link it against your main executable 607 00:22:07,940 --> 00:22:09,949 like you're used to instead of having to 608 00:22:09,950 --> 00:22:12,049 compile all your C code at once. 609 00:22:12,050 --> 00:22:14,239 And it's got a low overhead. 610 00:22:14,240 --> 00:22:16,399 So in the end, something like a 611 00:22:16,400 --> 00:22:18,949 hundred percent, so it makes your code 612 00:22:18,950 --> 00:22:21,049 half as fast as it used 613 00:22:21,050 --> 00:22:22,039 to be. 614 00:22:22,040 --> 00:22:24,349 But, you know, that sounds 615 00:22:24,350 --> 00:22:25,879 like much if you're a C programmer. 616 00:22:25,880 --> 00:22:27,679 And somebody says, like, I have that 617 00:22:27,680 --> 00:22:29,359 optimization that makes the program three 618 00:22:29,360 --> 00:22:31,699 percent faster, wow, and then you 619 00:22:31,700 --> 00:22:34,369 say, well, and I'll make it half as 620 00:22:34,370 --> 00:22:35,809 fast. 621 00:22:35,810 --> 00:22:37,969 But then you remember 622 00:22:37,970 --> 00:22:40,099 that people actually use Ruby to serve 623 00:22:40,100 --> 00:22:41,239 the pages of their.
624 00:22:47,210 --> 00:22:49,309 So I can imagine quite a 625 00:22:49,310 --> 00:22:51,619 lot of applications where a 100 626 00:22:51,620 --> 00:22:53,839 percent performance overhead 627 00:22:53,840 --> 00:22:56,389 is really, really cheap compared 628 00:22:56,390 --> 00:22:59,029 to getting pwned by the NSA or somebody. 629 00:22:59,030 --> 00:23:01,189 So I thought 630 00:23:01,190 --> 00:23:03,469 it was a worthwhile project. 631 00:23:03,470 --> 00:23:06,199 So what are they doing? 632 00:23:06,200 --> 00:23:08,329 Essentially, again, black is 633 00:23:08,330 --> 00:23:10,849 what the user writes, red is 634 00:23:10,850 --> 00:23:12,679 what the tool is inserting. 635 00:23:12,680 --> 00:23:14,659 You're looking at C code here, which 636 00:23:14,660 --> 00:23:16,999 you have to think about as a 637 00:23:17,000 --> 00:23:18,559 high-level representation of the 638 00:23:18,560 --> 00:23:20,059 intermediate representation it's actually 639 00:23:20,060 --> 00:23:21,169 working on. 640 00:23:21,170 --> 00:23:23,329 And you see there, that's 641 00:23:23,330 --> 00:23:25,519 a load from a pointer there; the check 642 00:23:25,520 --> 00:23:28,219 for a store would be totally equivalent. 643 00:23:28,220 --> 00:23:30,169 And all it does is go through the 644 00:23:30,170 --> 00:23:32,539 intermediate representation of your code 645 00:23:32,540 --> 00:23:34,609 and for every pointer access 646 00:23:34,610 --> 00:23:36,709 it inserts a check whether the 647 00:23:36,710 --> 00:23:38,779 pointer is between the base and the 648 00:23:38,780 --> 00:23:41,399 bound, in the right range. 649 00:23:41,400 --> 00:23:43,639 It involves the 650 00:23:43,640 --> 00:23:45,799 size of the access type, because 651 00:23:45,800 --> 00:23:47,959 it might be the case that your pointer is, 652 00:23:47,960 --> 00:23:50,089 you know, just pointing to the end 653 00:23:50,090 --> 00:23:52,099 of the structure, but still inside.
654 00:23:52,100 --> 00:23:53,959 But if you read the whole word, then a 655 00:23:53,960 --> 00:23:56,059 few bytes will spill over. 656 00:23:56,060 --> 00:23:58,219 And yeah, one byte 657 00:23:58,220 --> 00:23:59,989 of buffer overflow is sufficient. 658 00:23:59,990 --> 00:24:02,749 I've seen a presentation, 659 00:24:02,750 --> 00:24:05,629 also at a CCC Congress, where 660 00:24:05,630 --> 00:24:07,939 an exploit was written 661 00:24:07,940 --> 00:24:10,159 for a one-byte overflow because the one 662 00:24:10,160 --> 00:24:12,469 byte overflowed right into the base 663 00:24:12,470 --> 00:24:14,989 pointer, so one could create a 664 00:24:14,990 --> 00:24:17,059 copy of the stack frame a few bytes 665 00:24:17,060 --> 00:24:19,219 off inside the buffer and then 666 00:24:19,220 --> 00:24:21,199 part of that. So the size of things 667 00:24:21,200 --> 00:24:22,200 still plays a role. 668 00:24:23,390 --> 00:24:26,089 And the implementation 669 00:24:26,090 --> 00:24:28,010 of that check is pretty straightforward. 670 00:24:29,980 --> 00:24:30,980 OK. 671 00:24:32,680 --> 00:24:33,680 Fefe has been laughing. 672 00:24:35,930 --> 00:24:38,989 Do I hear more laughs? Who's laughing? 673 00:24:38,990 --> 00:24:39,889 If you're not laughing, 674 00:24:39,890 --> 00:24:41,899 I'm not trusting your C code. 675 00:24:41,900 --> 00:24:44,119 If you don't see that, you're not 676 00:24:44,120 --> 00:24:47,149 supposed to write security-critical C code, 677 00:24:47,150 --> 00:24:48,150 because, 678 00:24:49,820 --> 00:24:51,889 you know, they have that proof of 679 00:24:51,890 --> 00:24:53,569 correctness of their algorithm and 680 00:24:53,570 --> 00:24:55,519 everything, but then, you know. 681 00:24:55,520 --> 00:24:57,649 So they insert all the right checks to 682 00:24:57,650 --> 00:24:59,779 find 100 percent of the buffer overflows. 683 00:24:59,780 --> 00:25:01,459 But then they get the check wrong.
684 00:25:01,460 --> 00:25:02,480 What's wrong with that check? 685 00:25:03,650 --> 00:25:05,729 You know, the addition of 686 00:25:05,730 --> 00:25:08,089 pointer plus size might overflow. 687 00:25:10,410 --> 00:25:11,410 So you have to check for that. 688 00:25:12,450 --> 00:25:14,099 I wrote to the authors and they said, 689 00:25:14,100 --> 00:25:16,319 yeah, yeah, you know, that's academic 690 00:25:16,320 --> 00:25:17,339 research we're doing here. 691 00:25:17,340 --> 00:25:18,749 And it's nice that somebody looks at 692 00:25:18,750 --> 00:25:19,750 actual. 693 00:25:22,630 --> 00:25:24,879 But other than that, it's a fine 694 00:25:24,880 --> 00:25:26,589 piece of software, so the research part 695 00:25:26,590 --> 00:25:28,269 they're doing is quite good, and I think 696 00:25:28,270 --> 00:25:30,399 we should do more projects where 697 00:25:30,400 --> 00:25:32,589 hackers look into research out there 698 00:25:32,590 --> 00:25:34,879 and try to liberate it 699 00:25:34,880 --> 00:25:36,969 into real-world, existing open source 700 00:25:36,970 --> 00:25:37,990 software, for instance. 701 00:25:39,400 --> 00:25:40,400 And 702 00:25:41,500 --> 00:25:42,999 the base and the bound values, 703 00:25:43,000 --> 00:25:44,889 they have to come from somewhere. 704 00:25:44,890 --> 00:25:46,959 That's the point I talked about. 705 00:25:46,960 --> 00:25:49,329 So we need to calculate them on 706 00:25:49,330 --> 00:25:51,189 memory allocation. 707 00:25:51,190 --> 00:25:53,259 So in the case of malloc, the 708 00:25:53,260 --> 00:25:55,359 returned pointer address is the base 709 00:25:55,360 --> 00:25:57,489 and the bound is the pointer address 710 00:25:57,490 --> 00:25:59,319 plus the size that was requested from 711 00:25:59,320 --> 00:26:00,489 malloc.
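As a rough sketch of the check and of the allocation metadata, the two ideas can be written in C as follows. The struct `checked_ptr` and all function names are made up for illustration; the tool described in the talk keeps this metadata outside the pointer and emits its checks in the compiler rather than through library calls. The naive variant shows the wrap-around problem with pointer plus size; the second variant avoids adding anything to the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fat pointer, for illustration only. */
typedef struct {
    void     *ptr;
    uintptr_t base;   /* first valid address */
    uintptr_t bound;  /* one past the last valid address */
} checked_ptr;

/* At allocation time: base is the returned address,
 * bound is base plus the requested size. */
checked_ptr checked_malloc(size_t size)
{
    checked_ptr p;
    p.ptr   = malloc(size);
    p.base  = (uintptr_t)p.ptr;
    p.bound = (p.ptr != NULL) ? p.base + size : p.base; /* failure: empty range */
    return p;
}

/* Naive check: addr + size can wrap around for a wild pointer. */
bool in_bounds_naive(uintptr_t addr, uintptr_t base, uintptr_t bound, size_t size)
{
    return addr >= base && addr + size <= bound;
}

/* Wrap-safe check: compare the access size against the room that is left. */
bool in_bounds_safe(uintptr_t addr, uintptr_t base, uintptr_t bound, size_t size)
{
    if (addr < base || addr > bound)
        return false;
    return size <= bound - addr;
}
```

With a pointer close to the top of the address space, `in_bounds_naive` accepts the access because the sum wraps to a small number, while `in_bounds_safe` rejects it.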
712 00:26:00,490 --> 00:26:02,569 And you know, if malloc 713 00:26:02,570 --> 00:26:04,629 returns null, then of course the bound is 714 00:26:04,630 --> 00:26:06,749 also null. If you have 715 00:26:06,750 --> 00:26:08,559 a null pointer, it gets a zero size in 716 00:26:08,560 --> 00:26:09,739 the internal representation. 717 00:26:09,740 --> 00:26:11,859 So the check fails because 718 00:26:11,860 --> 00:26:13,929 any memory access will 719 00:26:13,930 --> 00:26:16,029 always overrun the zero-size 720 00:26:16,030 --> 00:26:17,019 buffer. 721 00:26:17,020 --> 00:26:18,480 So very easy here. 722 00:26:20,590 --> 00:26:22,779 And stack allocation 723 00:26:23,860 --> 00:26:26,199 pretty much works the same: the base address 724 00:26:26,200 --> 00:26:28,749 is the address of the array, 725 00:26:28,750 --> 00:26:30,819 the bound is the base plus the size of 726 00:26:30,820 --> 00:26:32,980 that array, very, very straightforward. 727 00:26:36,270 --> 00:26:37,889 This is where it gets tricky, and this 728 00:26:37,890 --> 00:26:40,499 is why it's hard to come up with a good 729 00:26:40,500 --> 00:26:43,019 memory safety solution for C: 730 00:26:43,020 --> 00:26:45,629 you have pointer arithmetic. 731 00:26:45,630 --> 00:26:48,089 And essentially 732 00:26:48,090 --> 00:26:50,159 what you do, if your new pointer 733 00:26:50,160 --> 00:26:52,289 is a pointer plus an index, 734 00:26:52,290 --> 00:26:54,029 then your new pointer copies the base 735 00:26:54,030 --> 00:26:56,129 address of the original pointer and it 736 00:26:56,130 --> 00:26:57,959 copies the bound address of the original 737 00:26:57,960 --> 00:27:00,059 pointer. And you will notice that 738 00:27:00,060 --> 00:27:02,189 inside that representation 739 00:27:02,190 --> 00:27:04,559 the pointer 740 00:27:04,560 --> 00:27:06,509 might point outside the object 741 00:27:06,510 --> 00:27:07,510 temporarily.
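A small sketch of metadata for an array and of propagation through pointer arithmetic, under the assumption that the pointer value is modelled as an integer so that a temporarily out-of-range value stays well defined in C. `checked_ptr`, `from_array`, `checked_add` and `can_access` are hypothetical names for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t value;  /* the pointer itself, kept as an integer here */
    uintptr_t base;
    uintptr_t bound;
} checked_ptr;

/* Array (for example on the stack): base is its address, bound is base + size.
 * A null pointer with size 0 yields an empty range, so every access fails. */
checked_ptr from_array(const void *start, size_t bytes)
{
    checked_ptr p;
    p.value = (uintptr_t)start;
    p.base  = p.value;
    p.bound = p.value + bytes;
    return p;
}

/* Pointer arithmetic: only the value changes, base and bound are copied. */
checked_ptr checked_add(checked_ptr p, ptrdiff_t index, size_t elem_size)
{
    checked_ptr q = p;
    q.value = p.value + (uintptr_t)index * (uintptr_t)elem_size;
    return q;
}

/* The check happens at access time, not at arithmetic time. */
bool can_access(checked_ptr p, size_t size)
{
    if (p.value < p.base || p.value > p.bound)
        return false;
    return size <= p.bound - p.value;
}
```

Stepping six elements into a four-element array produces a pointer that fails `can_access`, and stepping back three elements produces one that passes again, with the same base and bound throughout.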
742 00:27:08,340 --> 00:27:09,599 But every time you do pointer 743 00:27:09,600 --> 00:27:11,379 arithmetic, it's recomputed. 744 00:27:11,380 --> 00:27:13,649 So after the next addition it might be back 745 00:27:13,650 --> 00:27:15,749 in the area and everything's 746 00:27:15,750 --> 00:27:17,970 fine. 747 00:27:21,640 --> 00:27:24,649 Here is a special case, and 748 00:27:24,650 --> 00:27:26,569 that's the case I've talked about, that 749 00:27:26,570 --> 00:27:28,819 you get the address of a member 750 00:27:28,820 --> 00:27:30,889 of a structure and access 751 00:27:30,890 --> 00:27:33,049 that. And essentially what you do is 752 00:27:33,050 --> 00:27:34,699 you narrow your bounds to that member of 753 00:27:34,700 --> 00:27:37,069 that structure, and 754 00:27:37,070 --> 00:27:39,169 you might end up with a couple of 755 00:27:39,170 --> 00:27:41,239 false positives there 756 00:27:41,240 --> 00:27:43,399 because, you know, somebody might think 757 00:27:43,400 --> 00:27:45,709 it's smart to, you know, 758 00:27:45,710 --> 00:27:47,689 I'm writing a program that deals with 759 00:27:47,690 --> 00:27:49,789 3D graphics, I have a struct that has 760 00:27:49,790 --> 00:27:51,859 an X and Y and Z as 761 00:27:51,860 --> 00:27:54,169 members, and then the whole structure 762 00:27:54,170 --> 00:27:56,509 is passed to some other function 763 00:27:56,510 --> 00:27:58,489 that interprets the very same structure 764 00:27:58,490 --> 00:28:00,499 as an array of floats instead of a 765 00:28:00,500 --> 00:28:02,209 structure with three float members, 766 00:28:02,210 --> 00:28:03,739 because the memory representation is the 767 00:28:03,740 --> 00:28:06,279 same. That's something we cannot tell apart. 768 00:28:06,280 --> 00:28:08,689 So this will prevent overruns 769 00:28:08,690 --> 00:28:09,689 inside a structure.
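The narrowing step and the resulting false positive can be sketched like this. The `vec3` struct stands in for the 3D-graphics example; `checked_ptr`, `narrow_to_y` and `can_access` are hypothetical names, and the pointer value is again modelled as an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float x, y, z; } vec3;

typedef struct {
    uintptr_t value;
    uintptr_t base;
    uintptr_t bound;
} checked_ptr;

/* Taking the address of member y: bounds shrink to exactly that member. */
checked_ptr narrow_to_y(const vec3 *v)
{
    checked_ptr p;
    p.value = (uintptr_t)v + offsetof(vec3, y);
    p.base  = p.value;
    p.bound = p.value + sizeof(float);
    return p;
}

bool can_access(checked_ptr p, size_t size)
{
    if (p.value < p.base || p.value > p.bound)
        return false;
    return size <= p.bound - p.value;
}
```

Code that takes the address of `y` and then steps one float further to reach `z` stays inside the struct, yet the narrowed bounds reject the access. That is the kind of array-style trick on a struct that gets flagged.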
770 00:28:09,690 --> 00:28:11,899 It will prevent certain 771 00:28:11,900 --> 00:28:14,269 kinds of dirty hacks in your C program 772 00:28:14,270 --> 00:28:16,219 if you decide to do them. 773 00:28:16,220 --> 00:28:18,050 So it's something you can live with. 774 00:28:20,390 --> 00:28:22,099 Now, the other case of narrowing is when you 775 00:28:22,100 --> 00:28:24,139 access an array inside the structure 776 00:28:24,140 --> 00:28:25,759 instead of just an int, but it 777 00:28:27,170 --> 00:28:28,790 boils down to the same principle. 778 00:28:32,020 --> 00:28:33,459 This is where it gets interesting. 779 00:28:35,920 --> 00:28:38,019 We know that our pointers now 780 00:28:38,020 --> 00:28:39,969 have three values: the pointer value, 781 00:28:39,970 --> 00:28:41,439 the base and the bound. 782 00:28:41,440 --> 00:28:43,539 And we also know that we do not want 783 00:28:43,540 --> 00:28:45,699 to change the representation of structures 784 00:28:45,700 --> 00:28:46,929 in memory. 785 00:28:46,930 --> 00:28:49,179 So if you have an object in memory 786 00:28:49,180 --> 00:28:51,969 and we load a pointer from that address, 787 00:28:51,970 --> 00:28:53,979 we have to get our base and our bound from 788 00:28:53,980 --> 00:28:55,990 somewhere. So as opposed to the 789 00:28:57,040 --> 00:28:59,559 other fat pointer implementations, 790 00:28:59,560 --> 00:29:02,469 SoftBound CETS uses a 791 00:29:02,470 --> 00:29:04,449 shadow space, a data structure that keeps 792 00:29:04,450 --> 00:29:05,919 the copies around. 793 00:29:05,920 --> 00:29:08,199 So essentially what that does is 794 00:29:08,200 --> 00:29:10,929 a table lookup based on the pointer, 795 00:29:10,930 --> 00:29:13,030 returning the base and bound address. 796 00:29:14,110 --> 00:29:15,169 There's something tricky here. 797 00:29:15,170 --> 00:29:17,379 You might not see that immediately.
798 00:29:17,380 --> 00:29:19,509 We do not use the 799 00:29:19,510 --> 00:29:21,579 pointer value for the lookup 800 00:29:21,580 --> 00:29:23,169 of the base and the bound. 801 00:29:23,170 --> 00:29:25,809 We use the address of the pointer 802 00:29:25,810 --> 00:29:27,909 in memory. Note the two stars up 803 00:29:27,910 --> 00:29:29,829 there: we're using the address of the pointer 804 00:29:29,830 --> 00:29:32,409 in memory to get an index 805 00:29:32,410 --> 00:29:34,419 into our table to get the additional 806 00:29:34,420 --> 00:29:36,189 information. So instead of coupling 807 00:29:36,190 --> 00:29:38,259 something to the pointer value, they 808 00:29:38,260 --> 00:29:40,449 couple some additional information 809 00:29:40,450 --> 00:29:42,579 to the variable location at the end 810 00:29:42,580 --> 00:29:45,879 of the day, and that we use to 811 00:29:45,880 --> 00:29:47,529 keep the base and bound around. 812 00:29:48,700 --> 00:29:51,729 So, of course, inside the function 813 00:29:51,730 --> 00:29:53,259 that's just passed in additional 814 00:29:53,260 --> 00:29:54,249 registers. 815 00:29:54,250 --> 00:29:56,379 So there are local variables that 816 00:29:56,380 --> 00:29:58,419 are generated during the translation of a 817 00:29:58,420 --> 00:29:59,979 function for the base and the bound 818 00:29:59,980 --> 00:30:01,329 if there's a pointer. 819 00:30:01,330 --> 00:30:03,129 And this only applies when actually 820 00:30:03,130 --> 00:30:05,079 loading something from memory as opposed 821 00:30:05,080 --> 00:30:06,759 to getting a value that was an 822 00:30:06,760 --> 00:30:08,349 intermediate result of a computation inside 823 00:30:08,350 --> 00:30:10,449 such a function or 824 00:30:10,450 --> 00:30:12,160 something that was passed as a parameter.
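To make the "address of the pointer, not its value" point concrete, here is a deliberately tiny metadata table keyed by the location of a pointer variable. The linear search, the table size and the names `meta_store` and `meta_load` are assumptions made for this sketch and say nothing about how the real tool organises its table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define META_SLOTS 64   /* tiny fixed table, illustration only */

typedef struct {
    void    **where;   /* address of the pointer variable in memory */
    uintptr_t base;
    uintptr_t bound;
} meta_entry;

static meta_entry meta_table[META_SLOTS];
static size_t     meta_used;

/* Would accompany every store of a pointer to memory. */
bool meta_store(void **where, uintptr_t base, uintptr_t bound)
{
    size_t i;
    for (i = 0; i < meta_used; i++) {
        if (meta_table[i].where == where)
            break;
    }
    if (i == meta_used) {
        if (meta_used == META_SLOTS)
            return false;           /* table full */
        meta_used++;
    }
    meta_table[i].where = where;
    meta_table[i].base  = base;
    meta_table[i].bound = bound;
    return true;
}

/* Would accompany every load of a pointer from memory. */
bool meta_load(void **where, uintptr_t *base, uintptr_t *bound)
{
    for (size_t i = 0; i < meta_used; i++) {
        if (meta_table[i].where == where) {
            *base  = meta_table[i].base;
            *bound = meta_table[i].bound;
            return true;
        }
    }
    return false;
}
```

Two pointer variables holding the same value still have separate entries, because the key is where each variable lives.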
825 00:30:15,340 --> 00:30:17,109 Storing metadata works the same: whenever 826 00:30:17,110 --> 00:30:19,299 we write to a memory location 827 00:30:19,300 --> 00:30:20,769 that is a pointer, and we know it's a 828 00:30:20,770 --> 00:30:22,239 pointer because we're working on a typed 829 00:30:22,240 --> 00:30:23,529 intermediate representation in the 830 00:30:23,530 --> 00:30:25,719 compiler, then we have 831 00:30:25,720 --> 00:30:26,900 to do that additional store. 832 00:30:28,120 --> 00:30:30,309 There are certain 833 00:30:30,310 --> 00:30:32,409 alternatives for the implementation 834 00:30:32,410 --> 00:30:34,479 of that store. So the thing 835 00:30:34,480 --> 00:30:35,919 abstracted behind the table lookup 836 00:30:35,920 --> 00:30:37,989 there, we could implement it 837 00:30:37,990 --> 00:30:38,990 as a hash table. 838 00:30:40,420 --> 00:30:42,099 As you can imagine, that's a very 839 00:30:42,100 --> 00:30:44,079 expensive operation, to look into a hash 840 00:30:44,080 --> 00:30:45,969 table or add items to a hash table every 841 00:30:45,970 --> 00:30:48,129 time you write or load a pointer from 842 00:30:48,130 --> 00:30:50,979 or to memory. And 843 00:30:50,980 --> 00:30:53,149 you could use an actual shadow 844 00:30:53,150 --> 00:30:54,400 space. So 845 00:30:55,680 --> 00:30:58,059 then you have a heap of, say, a size of 846 00:30:58,060 --> 00:30:59,379 16 megabytes. 847 00:30:59,380 --> 00:31:01,449 You allocate another 16 megabytes 848 00:31:01,450 --> 00:31:03,879 for your base addresses, another 16 849 00:31:03,880 --> 00:31:06,219 megabytes for your bound addresses. 850 00:31:06,220 --> 00:31:08,259 And then it's a very simple operation, 851 00:31:08,260 --> 00:31:10,389 right? You do arithmetic on the pointer 852 00:31:10,390 --> 00:31:12,129 address and you suddenly get the address 853 00:31:12,130 --> 00:31:13,969 of the base and the bound value.
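The direct shadow-space variant can be pictured with a toy heap and two parallel arrays of the same size. Everything here (`heap`, `shadow_base`, `shadow_bound`, the slot count) is a made-up miniature of the 16-megabyte example, not the tool's real layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_SLOTS 1024   /* pointer-sized words in the toy heap */

static void     *heap[HEAP_SLOTS];          /* program data */
static uintptr_t shadow_base[HEAP_SLOTS];   /* same size again for bases */
static uintptr_t shadow_bound[HEAP_SLOTS];  /* and once more for bounds */

/* Plain arithmetic on the slot address yields the shadow index. */
size_t shadow_index(void **slot)
{
    assert(slot >= heap && slot < heap + HEAP_SLOTS);
    return (size_t)(slot - heap);
}

void shadow_store(void **slot, uintptr_t base, uintptr_t bound)
{
    size_t i = shadow_index(slot);
    shadow_base[i]  = base;
    shadow_bound[i] = bound;
}

void shadow_load(void **slot, uintptr_t *base, uintptr_t *bound)
{
    size_t i = shadow_index(slot);
    *base  = shadow_base[i];
    *bound = shadow_bound[i];
}
```

The lookup is a subtraction and an index, which is why it is fast, and the two extra arrays are why it triples the memory for the covered region.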
854 00:31:13,970 --> 00:31:16,179 That's very fast, but it needs 855 00:31:16,180 --> 00:31:17,180 a lot of memory. 856 00:31:18,310 --> 00:31:20,559 And it turned out, 857 00:31:20,560 --> 00:31:22,629 after quite a lot of experimentation, 858 00:31:22,630 --> 00:31:24,999 that the optimum data structure for that 859 00:31:25,000 --> 00:31:27,489 is a trie, 860 00:31:27,490 --> 00:31:29,769 that's written T-R-I-E. 861 00:31:29,770 --> 00:31:31,839 Some people pronounce it "tree" or 862 00:31:31,840 --> 00:31:33,909 "try". Some people call it a radix 863 00:31:33,910 --> 00:31:36,039 tree. At the 864 00:31:36,040 --> 00:31:38,139 end of the day, it works 865 00:31:38,140 --> 00:31:39,639 like a page table system. 866 00:31:39,640 --> 00:31:41,169 So you take a certain number of bits from 867 00:31:41,170 --> 00:31:43,389 a pointer address and that points 868 00:31:43,390 --> 00:31:45,579 to a secondary table that is just, 869 00:31:45,580 --> 00:31:47,679 you know, a page of values, 870 00:31:47,680 --> 00:31:49,809 and that turns out to be sufficiently 871 00:31:49,810 --> 00:31:52,059 memory efficient to not be a problem, 872 00:31:52,060 --> 00:31:53,679 but also fast to access, 873 00:31:53,680 --> 00:31:55,899 because all you have to do is, you know, 874 00:31:55,900 --> 00:31:58,059 AND the right things, shift things 875 00:31:58,060 --> 00:32:00,189 around to do a pointer lookup and to 876 00:32:00,190 --> 00:32:01,640 get your table value. 877 00:32:03,400 --> 00:32:04,929 What it costs, however, in 878 00:32:04,930 --> 00:32:07,239 performance is these 879 00:32:07,240 --> 00:32:09,279 additional loads and stores. 880 00:32:09,280 --> 00:32:11,829 So from a performance point of view, 881 00:32:11,830 --> 00:32:13,239 you might think that the actual 882 00:32:13,240 --> 00:32:15,279 computation for checking the bounds is 883 00:32:15,280 --> 00:32:16,719 what makes the program slow.
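A two-level lookup in the style of a page table might look roughly like the sketch below. The bit widths, the names and the on-demand `calloc` of secondary tables are assumptions for illustration. Note the loud simplification: only 20 bits of the slot address are used as the key and slots are assumed to be 8 bytes wide, so addresses far apart can alias; a real implementation would have to cover the whole address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_BITS 10
#define L2_BITS 10
#define L1_SIZE ((size_t)1 << L1_BITS)
#define L2_SIZE ((size_t)1 << L2_BITS)

typedef struct { uintptr_t base; uintptr_t bound; } meta;

/* Primary table: one pointer per secondary table, filled in on demand. */
static meta *primary_table[L1_SIZE];

/* Shift and mask the slot address to find its entry. */
static meta *trie_entry(const void *where, bool create)
{
    uintptr_t key = (uintptr_t)where >> 3;              /* assumes 8-byte slots */
    size_t i1 = (size_t)(key >> L2_BITS) & (L1_SIZE - 1);
    size_t i2 = (size_t)key & (L2_SIZE - 1);

    if (primary_table[i1] == NULL) {
        if (!create)
            return NULL;
        primary_table[i1] = calloc(L2_SIZE, sizeof(meta));
        if (primary_table[i1] == NULL)
            return NULL;
    }
    return &primary_table[i1][i2];
}

bool trie_store(const void *where, uintptr_t base, uintptr_t bound)
{
    meta *m = trie_entry(where, true);
    if (m == NULL)
        return false;
    m->base  = base;
    m->bound = bound;
    return true;
}

bool trie_load(const void *where, uintptr_t *base, uintptr_t *bound)
{
    meta *m = trie_entry(where, false);
    if (m == NULL)
        return false;          /* no secondary table for this region yet */
    *base  = m->base;
    *bound = m->bound;
    return true;
}
```

Secondary tables exist only for regions that actually hold pointers, which is where the memory saving over a flat shadow space comes from.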
884 00:32:16,720 --> 00:32:18,429 And that's absolutely, totally not the 885 00:32:18,430 --> 00:32:19,689 case. If you think that, you haven't 886 00:32:19,690 --> 00:32:22,239 looked into modern CPU architectures. 887 00:32:22,240 --> 00:32:24,309 In a modern 888 00:32:24,310 --> 00:32:26,679 computer, the access time to memory 889 00:32:26,680 --> 00:32:28,389 dominates everything that you do on the 890 00:32:28,390 --> 00:32:30,639 CPU. So it's not unusual to have 891 00:32:30,640 --> 00:32:32,979 a 180-cycle round trip 892 00:32:32,980 --> 00:32:35,019 for just loading a word from RAM if it's 893 00:32:35,020 --> 00:32:37,149 not in cache, and all the while you twiddle your 894 00:32:37,150 --> 00:32:39,339 thumbs. Also, you have 895 00:32:39,340 --> 00:32:42,099 superscalar execution on modern CPUs, 896 00:32:42,100 --> 00:32:44,139 and the kind of check that's introduced 897 00:32:44,140 --> 00:32:46,209 for the bounds check is something that's 898 00:32:46,210 --> 00:32:48,249 very, very easily 899 00:32:48,250 --> 00:32:50,559 parallelizable. So it 900 00:32:50,560 --> 00:32:53,049 increases the parallelism 901 00:32:53,050 --> 00:32:53,649 of your software. 902 00:32:53,650 --> 00:32:55,389 It's more instructions that can be run at 903 00:32:55,390 --> 00:32:57,489 the same time, so the actual check 904 00:32:57,490 --> 00:32:59,829 can be distributed onto different 905 00:32:59,830 --> 00:33:02,229 instruction units inside your CPU. 906 00:33:02,230 --> 00:33:04,449 So that tends to take 907 00:33:04,450 --> 00:33:06,609 almost no time at all. The actual 908 00:33:06,610 --> 00:33:07,779 overhead you're paying
909 00:33:07,780 --> 00:33:10,089 is for the extra loads and stores 910 00:33:10,090 --> 00:33:11,739 that are done to memory here for the 911 00:33:11,740 --> 00:33:13,959 data structure access, and 912 00:33:13,960 --> 00:33:16,239 that's what the overhead comes from, the hundred 913 00:33:16,240 --> 00:33:17,240 percent overhead. 914 00:33:18,490 --> 00:33:20,979 And in order to prevent 915 00:33:20,980 --> 00:33:23,649 loading and storing from those, 916 00:33:23,650 --> 00:33:26,289 whatever it is, hash table, radix tree, 917 00:33:26,290 --> 00:33:27,730 shadow memory, all of the time, 918 00:33:28,990 --> 00:33:30,769 an additional modification 919 00:33:30,770 --> 00:33:33,379 that SoftBound does 920 00:33:33,380 --> 00:33:35,729 is modifying the calling convention. 921 00:33:35,730 --> 00:33:37,849 So this here is 922 00:33:37,850 --> 00:33:39,379 an illustration. 923 00:33:39,380 --> 00:33:40,380 Think about it as 924 00:33:41,660 --> 00:33:43,109 a simplified picture. 925 00:33:43,110 --> 00:33:45,199 It doesn't work like that internally. 926 00:33:45,200 --> 00:33:47,299 What it internally does is it keeps 927 00:33:47,300 --> 00:33:48,589 a so-called shadow stack 928 00:33:50,240 --> 00:33:52,509 for the additional base and bound values. 929 00:33:52,510 --> 00:33:54,649 If you pass in a pointer like the 930 00:33:54,650 --> 00:33:56,359 S in the function there, then 931 00:33:56,360 --> 00:33:59,089 additionally S base and S bound are passed 932 00:33:59,090 --> 00:34:00,859 into the function body. 933 00:34:00,860 --> 00:34:02,989 Actually, that resides on a different 934 00:34:02,990 --> 00:34:04,220 stack from the main stack, 935 00:34:05,360 --> 00:34:07,249 so in order not to mess up stack layout 936 00:34:07,250 --> 00:34:08,839 and everything that's involved in there 937 00:34:08,840 --> 00:34:11,178 and to improve compatibility. 938 00:34:11,179 --> 00:34:12,769 But yeah, so we augment that.
939 00:34:12,770 --> 00:34:15,049 So usually 940 00:34:15,050 --> 00:34:16,519 the additional information, the base and 941 00:34:16,520 --> 00:34:18,649 bound information, are passed around 942 00:34:18,650 --> 00:34:20,959 like additional function 943 00:34:20,960 --> 00:34:23,029 parameters, passed around 944 00:34:23,030 --> 00:34:25,549 inside registers inside the functions, 945 00:34:25,550 --> 00:34:27,709 and loads and stores from 946 00:34:27,710 --> 00:34:29,599 and to memory only happen if you update 947 00:34:29,600 --> 00:34:30,600 objects. 948 00:34:32,790 --> 00:34:34,919 Yeah, and there are a couple 949 00:34:34,920 --> 00:34:36,718 of loose ends, things that need to be 950 00:34:36,719 --> 00:34:38,279 treated here. 951 00:34:38,280 --> 00:34:40,019 Global variables: for global variables, 952 00:34:40,020 --> 00:34:41,579 there's a pass that 953 00:34:43,679 --> 00:34:45,928 hooks into the program load 954 00:34:45,929 --> 00:34:47,988 using 955 00:34:47,989 --> 00:34:50,579 the init section in ELF, and 956 00:34:50,580 --> 00:34:52,698 that generates the base information for 957 00:34:52,699 --> 00:34:54,629 all the globals so they can be 958 00:34:54,630 --> 00:34:56,600 accessed. 959 00:34:58,510 --> 00:35:01,079 Separate compilation 960 00:35:01,080 --> 00:35:03,599 is supported by defining an API 961 00:35:03,600 --> 00:35:05,909 to call through. 962 00:35:05,910 --> 00:35:06,509 memcpy is 963 00:35:06,510 --> 00:35:08,789 interesting, because if you do a copy 964 00:35:08,790 --> 00:35:11,309 of memory A to memory B, 965 00:35:11,310 --> 00:35:12,959 inside that memory there might be 966 00:35:12,960 --> 00:35:14,969 pointers. And the pointer information, 967 00:35:14,970 --> 00:35:16,889 the base and bound, also needs to be 968 00:35:16,890 --> 00:35:19,019 copied.
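The point about copying memory that contains pointers can be sketched with a toy arena and a disjoint metadata array. `arena`, `shadow` and `copy_with_metadata` are hypothetical names; the sketch copies whole pointer-sized slots and uses `memmove` so overlapping ranges are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SLOTS 256

typedef struct { uintptr_t base; uintptr_t bound; } meta;

static void *arena[ARENA_SLOTS];    /* memory that may contain pointers */
static meta  shadow[ARENA_SLOTS];   /* disjoint metadata, one entry per slot */

/* Copy `count` slots from index src to index dst, data and metadata alike. */
bool copy_with_metadata(size_t dst, size_t src, size_t count)
{
    if (count > ARENA_SLOTS ||
        dst > ARENA_SLOTS - count ||
        src > ARENA_SLOTS - count)
        return false;               /* range does not fit into the arena */

    memmove(&arena[dst],  &arena[src],  count * sizeof arena[0]);
    memmove(&shadow[dst], &shadow[src], count * sizeof shadow[0]);
    return true;
}
```

A plain `memmove` of `arena` alone would leave the copied pointer without valid bounds at its new location, so its first use there would fail the check.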
So you need to have a special 969 00:35:19,020 --> 00:35:21,149 memcpy implementation that 970 00:35:21,150 --> 00:35:23,279 makes sure that the additional 971 00:35:23,280 --> 00:35:26,189 base and bound information is also copied around. 972 00:35:26,190 --> 00:35:27,839 Function pointers are interesting: function 973 00:35:27,840 --> 00:35:29,849 pointers just get a bound of zero so you 974 00:35:29,850 --> 00:35:31,949 cannot write onto them, and 975 00:35:31,950 --> 00:35:32,950 then they're safe. 976 00:35:34,680 --> 00:35:36,989 Casting: 977 00:35:36,990 --> 00:35:38,669 casting is a problem if you have an 978 00:35:38,670 --> 00:35:40,559 integer and you create a pointer from it. 979 00:35:42,300 --> 00:35:44,699 At that point in time, the compiler 980 00:35:44,700 --> 00:35:46,559 has completely lost track of the 981 00:35:46,560 --> 00:35:48,959 original type of 982 00:35:48,960 --> 00:35:51,479 the data that went into there. 983 00:35:51,480 --> 00:35:53,609 So essentially there's nothing 984 00:35:53,610 --> 00:35:55,769 sensible you can do there to prevent 985 00:35:55,770 --> 00:35:57,039 buffer overflows. 986 00:35:57,040 --> 00:35:59,099 So what SoftBound does is, if 987 00:35:59,100 --> 00:36:01,289 you cast an integer to a pointer, it 988 00:36:01,290 --> 00:36:02,429 gets a bound of zero. 989 00:36:02,430 --> 00:36:04,119 So you cannot write to that memory. 990 00:36:04,120 --> 00:36:06,989 That completely prevents you from 991 00:36:06,990 --> 00:36:09,359 using integers as pointers, and 992 00:36:09,360 --> 00:36:10,619 that kills a couple of hacks. 993 00:36:10,620 --> 00:36:12,869 Some of you might know the trick 994 00:36:12,870 --> 00:36:15,159 of using XOR to XOR two pointers 995 00:36:15,160 --> 00:36:17,249 onto each other to generate a 996 00:36:17,250 --> 00:36:19,299 doubly linked list with only one pointer 997 00:36:19,300 --> 00:36:20,300 field.
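The zero-bound rule for pointers made from integers, and why it breaks the XOR-linked-list trick, can be sketched as follows. `checked_ptr`, `from_integer`, `can_access` and `xor_link` are made-up names, and the pointer value is modelled as an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t value;
    uintptr_t base;
    uintptr_t bound;
} checked_ptr;

/* A pointer created from an integer has no known origin: bound of zero. */
checked_ptr from_integer(uintptr_t raw)
{
    checked_ptr p = { raw, 0, 0 };
    return p;
}

bool can_access(checked_ptr p, size_t size)
{
    if (p.value < p.base || p.value > p.bound)
        return false;
    return size <= p.bound - p.value;
}

/* XOR-linked list: a single field stores prev XOR next. */
uintptr_t xor_link(uintptr_t prev, uintptr_t next)
{
    return prev ^ next;
}
```

Recovering `next` as `link ^ prev` yields the right address, but the result is an integer, so the pointer rebuilt from it carries a zero bound and every access through it is rejected.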
998 00:36:20,790 --> 00:36:22,889 That doesn't work if you have that 999 00:36:22,890 --> 00:36:25,109 limitation. But such code 1000 00:36:25,110 --> 00:36:26,999 is rare, fortunately, and you can patch 1001 00:36:27,000 --> 00:36:29,279 the three lines that do that. 1002 00:36:29,280 --> 00:36:31,289 And casts and unions, 1003 00:36:32,430 --> 00:36:34,769 they just need to be treated right: 1004 00:36:34,770 --> 00:36:36,149 at the moment where you access that 1005 00:36:36,150 --> 00:36:38,249 memory, the conversion 1006 00:36:38,250 --> 00:36:40,439 has to be applied. That works. 1007 00:36:40,440 --> 00:36:42,539 And the one thing that's still a bit 1008 00:36:42,540 --> 00:36:44,939 unsolved here is varargs. 1009 00:36:44,940 --> 00:36:47,069 So that's where 1010 00:36:47,070 --> 00:36:48,719 they are not reaching 100 percent yet. 1011 00:36:48,720 --> 00:36:51,149 Varargs are not treated yet 1012 00:36:51,150 --> 00:36:53,819 because they're special. 1013 00:36:53,820 --> 00:36:55,919 You would need to add base and 1014 00:36:55,920 --> 00:36:58,169 bound information to the area containing 1015 00:36:58,170 --> 00:37:00,599 your varargs, and 1016 00:37:00,600 --> 00:37:01,169 that's not there. 1017 00:37:01,170 --> 00:37:03,239 It needs to be implemented. 1018 00:37:03,240 --> 00:37:05,309 So 97 percent from 1019 00:37:05,310 --> 00:37:06,989 its results are not covered yet. 1020 00:37:08,760 --> 00:37:10,939 OK, on to temporal 1021 00:37:10,940 --> 00:37:11,940 safety. 1022 00:37:12,600 --> 00:37:14,039 That's a bit quicker now because you 1023 00:37:14,040 --> 00:37:15,899 already know the principle of passing fat 1024 00:37:15,900 --> 00:37:16,900 pointers around. 1025 00:37:17,820 --> 00:37:20,069 What we do for temporal is that we have 1026 00:37:20,070 --> 00:37:21,569 two additional fields in our fat 1027 00:37:21,570 --> 00:37:23,759 pointer. One is a 1028 00:37:23,760 --> 00:37:26,069 key and one is a lock.
1029 00:37:26,070 --> 00:37:28,529 The lock is an address in memory 1030 00:37:28,530 --> 00:37:29,769 and the key is a value 1031 00:37:29,770 --> 00:37:32,309 we increment every time you say malloc. 1032 00:37:32,310 --> 00:37:33,869 So we know it's memory allocation 1033 00:37:33,870 --> 00:37:35,099 twenty-three. 1034 00:37:35,100 --> 00:37:36,869 And then you have an association 1035 00:37:36,870 --> 00:37:40,199 between a pointer and 1036 00:37:40,200 --> 00:37:42,359 the lock. At the moment where free 1037 00:37:42,360 --> 00:37:43,589 is called, 1038 00:37:43,590 --> 00:37:45,749 the memory that holds 1039 00:37:45,750 --> 00:37:47,340 the lock is reset to zero. 1040 00:37:49,760 --> 00:37:51,979 I hope that will get obvious in the next 1041 00:37:51,980 --> 00:37:54,829 slide. So that's the check 1042 00:37:54,830 --> 00:37:56,779 that's introduced there. 1043 00:37:56,780 --> 00:37:59,089 I get the key and the lock address as part 1044 00:37:59,090 --> 00:38:00,709 of my fat pointer implementation in 1045 00:38:00,710 --> 00:38:03,949 shadow space, and I 1046 00:38:03,950 --> 00:38:05,809 load from the lock address. 1047 00:38:05,810 --> 00:38:08,269 And if what's at that address matches 1048 00:38:08,270 --> 00:38:10,459 my key value, then 1049 00:38:10,460 --> 00:38:12,469 I didn't call free on that pointer 1050 00:38:12,470 --> 00:38:14,689 yet. At the moment where I 1051 00:38:14,690 --> 00:38:15,690 call free... 1052 00:38:17,270 --> 00:38:19,459 At the moment 1053 00:38:19,460 --> 00:38:20,389 where I call free, 1054 00:38:20,390 --> 00:38:22,759 all I do is go to my lock address 1055 00:38:22,760 --> 00:38:23,899 and write an invalid 1056 00:38:23,900 --> 00:38:25,939 lock at that address. 1057 00:38:25,940 --> 00:38:28,159 So yeah, 1058 00:38:28,160 --> 00:38:30,349 as long as the key fits the lock, my pointer 1059 00:38:30,350 --> 00:38:31,429 is valid.
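The lock-and-key check described here can be sketched in plain C. This is a simplified illustration, not the SoftBound CETS implementation: the names `temporal_meta`, `checked_malloc`, `temporal_ok` and `checked_free` are hypothetical, the metadata sits in an ordinary struct rather than in shadow space, and lock cells are never recycled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define INVALID_KEY 0u  /* value written into a lock when memory is freed */

/* Hypothetical temporal metadata that travels with a pointer. */
struct temporal_meta {
    uint64_t key;    /* allocation number remembered by the pointer */
    uint64_t *lock;  /* address of the lock cell for this allocation */
};

static uint64_t next_key = 1;  /* incremented on every allocation */

/* Allocate memory plus a lock cell; the lock cell holds the current key. */
static void *checked_malloc(size_t n, struct temporal_meta *meta) {
    void *p = malloc(n);
    uint64_t *lock = malloc(sizeof *lock);
    if (p == NULL || lock == NULL) {
        free(p);
        free(lock);
        meta->key = INVALID_KEY;
        meta->lock = NULL;
        return NULL;
    }
    *lock = next_key;
    meta->key = next_key++;
    meta->lock = lock;
    return p;
}

/* The check placed before a dereference: does the key still fit the lock? */
static bool temporal_ok(const struct temporal_meta *meta) {
    return meta->lock != NULL && *meta->lock == meta->key;
}

/* Free the memory and invalidate the lock. The lock cell itself is kept
   alive in this sketch so that stale copies of the pointer can still
   read it and see the mismatch. */
static void checked_free(void *p, struct temporal_meta *meta) {
    assert(temporal_ok(meta));  /* a double free would trip here */
    free(p);
    *meta->lock = INVALID_KEY;
}
```

Because every copy of a pointer carries the same key and lock address, invalidating the single lock cell invalidates all copies at once, which is the point of the indirection.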
1060 00:38:31,430 --> 00:38:33,529 So for every allocation I remember 1061 00:38:33,530 --> 00:38:34,819 the correct 1062 00:38:36,200 --> 00:38:37,200 lock value. 1063 00:38:38,870 --> 00:38:40,969 And then propagation of metadata 1064 00:38:40,970 --> 00:38:43,309 pretty much works again like the spatial 1065 00:38:43,310 --> 00:38:45,409 checks, you've seen that 1066 00:38:45,410 --> 00:38:46,410 all. 1067 00:38:48,530 --> 00:38:49,530 Loads and stores. 1068 00:38:51,230 --> 00:38:53,389 Globals cannot be freed, so we 1069 00:38:53,390 --> 00:38:55,489 just introduce a key, which is a 1070 00:38:55,490 --> 00:38:57,649 global key, and a global lock address that 1071 00:38:57,650 --> 00:38:58,979 always match each other. 1072 00:38:58,980 --> 00:39:00,559 So when a global is accessed, we have a key 1073 00:39:00,560 --> 00:39:01,560 and lock that work. 1074 00:39:02,950 --> 00:39:05,109 Yeah, and that would be all 1075 00:39:05,110 --> 00:39:07,179 nice and fine if we didn't have 1076 00:39:07,180 --> 00:39:09,739 threads. Unfortunately, we do. 1077 00:39:09,740 --> 00:39:11,889 So that's again where 1078 00:39:11,890 --> 00:39:13,959 we don't reach the 100 percent yet. 1079 00:39:15,040 --> 00:39:16,629 If you have shared state between your threads, 1080 00:39:16,630 --> 00:39:18,969 if you have, you know, 1081 00:39:18,970 --> 00:39:21,639 it might happen that, you know, one 1082 00:39:21,640 --> 00:39:23,799 thread calls free 1083 00:39:23,800 --> 00:39:25,929 and the other one checks whether the lock 1084 00:39:25,930 --> 00:39:28,359 is still valid, and you have the 1085 00:39:28,360 --> 00:39:30,099 process switch in between.
1086 00:39:30,100 --> 00:39:31,899 So one process says, yeah, the lock is 1087 00:39:31,900 --> 00:39:33,699 still valid, then the other one gets 1088 00:39:33,700 --> 00:39:36,199 scheduled, does the free, kills the lock, 1089 00:39:36,200 --> 00:39:37,929 and then you're getting back to the first 1090 00:39:37,930 --> 00:39:40,029 thread, which then works on uninitialized 1091 00:39:40,030 --> 00:39:41,019 memory. 1092 00:39:41,020 --> 00:39:42,340 And we do not want to have that. 1093 00:39:44,100 --> 00:39:46,709 So, own contribution: 1094 00:39:46,710 --> 00:39:48,179 I'm not just telling you what people out 1095 00:39:48,180 --> 00:39:49,889 there did, I actually did my own hacking 1096 00:39:49,890 --> 00:39:50,890 on that. 1097 00:39:51,300 --> 00:39:53,369 And at the moment, the 1098 00:39:53,370 --> 00:39:55,439 form of SoftBound CETS 1099 00:39:55,440 --> 00:39:57,509 you're getting is a collection 1100 00:39:57,510 --> 00:39:59,759 of patches to LLVM with their own 1101 00:39:59,760 --> 00:40:01,259 top-level executable. 1102 00:40:01,260 --> 00:40:03,389 And all that does is process LLVM 1103 00:40:03,390 --> 00:40:04,849 intermediate representation. 1104 00:40:05,910 --> 00:40:08,279 And it also has a number of other nasty 1105 00:40:08,280 --> 00:40:10,529 hacks, like in the compiler module 1106 00:40:10,530 --> 00:40:12,749 it keeps a list of libc 1107 00:40:12,750 --> 00:40:14,999 functions. And then for every function in 1108 00:40:15,000 --> 00:40:17,249 libc, it comes with its own wrapper 1109 00:40:17,250 --> 00:40:19,319 function that needs to be written 1110 00:40:19,320 --> 00:40:21,539 when you want to call in there. 1111 00:40:21,540 --> 00:40:23,639 And I started 1112 00:40:23,640 --> 00:40:26,169 hacking around on that in Ubuntu, and that 1113 00:40:26,170 --> 00:40:27,629 turned out to be intractable.
1114 00:40:27,630 --> 00:40:29,129 You discover interesting things when you do 1115 00:40:29,130 --> 00:40:31,229 that, like, for instance, that in the 1116 00:40:31,230 --> 00:40:33,429 Linux headers they call 1117 00:40:33,430 --> 00:40:35,339 different functions depending on whether 1118 00:40:35,340 --> 00:40:37,349 you enable the optimizer or not. 1119 00:40:37,350 --> 00:40:39,539 They actually have an #ifdef: did the 1120 00:40:39,540 --> 00:40:41,639 user use optimization inside 1121 00:40:41,640 --> 00:40:43,619 that function or not, then use one 1122 00:40:43,620 --> 00:40:45,329 function or else use the other function. 1123 00:40:45,330 --> 00:40:47,399 And you end up with a ton of 1124 00:40:47,400 --> 00:40:49,499 libc-internal functions that you need to 1125 00:40:49,500 --> 00:40:51,329 wrap and then recompile the compiler and 1126 00:40:51,330 --> 00:40:53,039 work on your code. And that all led to 1127 00:40:53,040 --> 00:40:54,119 nowhere. 1128 00:40:54,120 --> 00:40:56,339 So I 1129 00:40:56,340 --> 00:40:58,649 found out that FreeBSD 10 1130 00:40:58,650 --> 00:41:01,889 actually is compilable using LLVM 1131 00:41:01,890 --> 00:41:04,349 and chose 1132 00:41:04,350 --> 00:41:06,629 that as a target for further hacking. 1133 00:41:06,630 --> 00:41:08,849 And what 1134 00:41:08,850 --> 00:41:10,979 I did, with a lot of support 1135 00:41:10,980 --> 00:41:12,599 from Hannes Mehnert, who worked on that 1136 00:41:12,600 --> 00:41:14,789 with me. He's a FreeBSD domain expert 1137 00:41:14,790 --> 00:41:15,869 and also coffee guru.
1138 00:41:15,870 --> 00:41:17,579 Check out his coffee on the fruitfly, it's 1139 00:41:17,580 --> 00:41:18,870 good. And 1140 00:41:21,870 --> 00:41:23,849 I can compile the whole userland 1141 00:41:23,850 --> 00:41:25,919 using LLVM from scratch, with every 1142 00:41:25,920 --> 00:41:27,809 single line of code, and that's a nice 1143 00:41:27,810 --> 00:41:29,279 spot to actually start hacking on that 1144 00:41:29,280 --> 00:41:30,359 stuff. 1145 00:41:30,360 --> 00:41:32,369 So what I did was introduce function 1146 00:41:32,370 --> 00:41:34,799 attributes that you can specify that turn 1147 00:41:34,800 --> 00:41:37,229 on or turn off this SoftBound CETS 1148 00:41:37,230 --> 00:41:39,569 processing for a function. 1149 00:41:39,570 --> 00:41:41,669 And the other one is... so 1150 00:41:41,670 --> 00:41:43,199 one is for saying that's a native 1151 00:41:43,200 --> 00:41:44,819 function, like this is a syscall, don't do 1152 00:41:44,820 --> 00:41:47,099 anything. It's just a C function. 1153 00:41:47,100 --> 00:41:48,449 The other one is like, 1154 00:41:49,530 --> 00:41:50,999 that's a string copy function, 1155 00:41:51,000 --> 00:41:53,129 and you might want to give me your 1156 00:41:53,130 --> 00:41:55,169 base and bound for the strings you pass, 1157 00:41:55,170 --> 00:41:57,059 but I want to write the check myself in 1158 00:41:57,060 --> 00:41:58,889 order not to check every single byte in 1159 00:41:58,890 --> 00:42:00,599 the loop, for the performance. That's the 1160 00:42:00,600 --> 00:42:02,999 other attribute. And 1161 00:42:03,000 --> 00:42:05,159 the port of the SoftBound CETS module 1162 00:42:05,160 --> 00:42:07,319 to FreeBSD 10 1163 00:42:07,320 --> 00:42:09,559 essentially involved a lot of 1164 00:42:09,560 --> 00:42:11,999 fuckery with the build system 1165 00:42:12,000 --> 00:42:13,439 in order to get the additional modules 1166 00:42:13,440 --> 00:42:14,519 built.
1167 00:42:14,520 --> 00:42:16,709 And then we had to walk through 1168 00:42:16,710 --> 00:42:17,729 the C startup. 1169 00:42:17,730 --> 00:42:19,739 Everything that happens between underscore 1170 00:42:19,740 --> 00:42:22,229 start (_start), which is the symbol entry into 1171 00:42:22,230 --> 00:42:24,569 the executable binary, 1172 00:42:24,570 --> 00:42:26,909 up until main, which is your 1173 00:42:26,910 --> 00:42:28,899 main entry as a C program. 1174 00:42:28,900 --> 00:42:30,449 A lot of things happen there. 1175 00:42:30,450 --> 00:42:32,099 Like for instance, all the constructors 1176 00:42:32,100 --> 00:42:34,199 are called, thread-local 1177 00:42:34,200 --> 00:42:35,959 storage is initialized, 1178 00:42:37,020 --> 00:42:39,419 malloc is initialized, and 1179 00:42:39,420 --> 00:42:40,769 that's all low-level code. 1180 00:42:40,770 --> 00:42:42,899 So as you have seen, malloc and free 1181 00:42:42,900 --> 00:42:44,399 are essentially primitives and you have 1182 00:42:44,400 --> 00:42:46,589 wrappers around them that provide 1183 00:42:46,590 --> 00:42:49,379 the correct base, bound, lock and key values. 1184 00:42:49,380 --> 00:42:51,569 So we had to go 1185 00:42:51,570 --> 00:42:53,279 through that and find every single 1186 00:42:53,280 --> 00:42:55,049 function that's involved in program 1187 00:42:55,050 --> 00:42:57,509 startup, annotate 1188 00:42:57,510 --> 00:42:59,399 that with the right attribute, come up 1189 00:42:59,400 --> 00:43:01,049 with a malloc that does the right 1190 00:43:01,050 --> 00:43:03,269 thing, wrap the malloc to give 1191 00:43:03,270 --> 00:43:05,789 back the pointer values, 1192 00:43:05,790 --> 00:43:07,949 and then we could 1193 00:43:07,950 --> 00:43:09,959 delete about 2000 lines of code in 1194 00:43:09,960 --> 00:43:11,849 SoftBound CETS because it was no longer 1195 00:43:11,850 --> 00:43:12,850 needed here. 1196 00:43:13,550 --> 00:43:16,019 And yeah.
1197 00:43:16,020 --> 00:43:18,179 And actually we're at 1198 00:43:18,180 --> 00:43:20,369 the point at the moment where 1199 00:43:20,370 --> 00:43:22,469 a hello world executable can be 1200 00:43:22,470 --> 00:43:24,569 built and executed and will correctly 1201 00:43:24,570 --> 00:43:26,879 load and come up 1202 00:43:26,880 --> 00:43:28,889 to main. And then I have a little test 1203 00:43:28,890 --> 00:43:30,669 function that overwrites the buffer, and 1204 00:43:30,670 --> 00:43:32,729 it correctly detects the overflow if the 1205 00:43:32,730 --> 00:43:35,039 overflow happens, and then the executable 1206 00:43:35,040 --> 00:43:37,469 is shut down correctly again. 1207 00:43:37,470 --> 00:43:38,470 So 1208 00:43:39,570 --> 00:43:41,189 I would call that a proof of concept. 1209 00:43:41,190 --> 00:43:42,359 It's a little bit better than the 1210 00:43:42,360 --> 00:43:44,279 academic code from a usability 1211 00:43:44,280 --> 00:43:45,209 perspective. 1212 00:43:45,210 --> 00:43:46,919 Not much from an industrial strength 1213 00:43:46,920 --> 00:43:49,739 perspective, but I think it's a very 1214 00:43:49,740 --> 00:43:51,089 promising approach to that. 1215 00:43:51,090 --> 00:43:52,889 So the next steps will be 1216 00:43:54,420 --> 00:43:56,219 making that clean, making sure all the 1217 00:43:56,220 --> 00:43:58,439 functions are instrumented and work, 1218 00:43:58,440 --> 00:44:00,689 and then start doing 1219 00:44:00,690 --> 00:44:03,029 tests, fix some things like inlining, 1220 00:44:03,030 --> 00:44:05,239 for instance: the buffer 1221 00:44:05,240 --> 00:44:06,989 overflow check you've seen is not inlined due 1222 00:44:06,990 --> 00:44:08,339 to the way it works. 1223 00:44:08,340 --> 00:44:10,439 And I think we 1224 00:44:10,440 --> 00:44:12,479 can get a lot of performance from there 1225 00:44:12,480 --> 00:44:14,820 by inlining stuff.
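As a rough illustration of the kind of overflow test and inlinable check mentioned here, the following C sketch uses a hand-written spatial check. It is an assumption-laden stand-in, not the code from the talk: `spatial_meta`, `spatial_ok` and `checked_fill` are hypothetical names, and where an instrumented binary would shut the program down, this sketch just stops writing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spatial metadata: [base, bound) is the valid byte range. */
struct spatial_meta {
    uintptr_t base;
    uintptr_t bound;
};

/* Small enough for a compiler to inline at every load and store.
   Written so that none of the subtractions can wrap around. */
static inline bool spatial_ok(const struct spatial_meta *m,
                              const void *p, size_t access_size) {
    uintptr_t addr = (uintptr_t)p;
    return addr >= m->base
        && access_size <= m->bound - m->base
        && addr <= m->bound - access_size;
}

/* Try to write n bytes into an 8-byte buffer, checking each store.
   Returns how many bytes were written before the check refused. */
static size_t checked_fill(size_t n) {
    char buf[8];
    struct spatial_meta m = { (uintptr_t)buf, (uintptr_t)buf + sizeof buf };
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (!spatial_ok(&m, buf + i, 1)) {
            break;  /* an instrumented program would abort here */
        }
        buf[i] = 'A';
        written++;
    }
    assert(written <= sizeof buf);
    return written;
}
```

With metadata of base 0 and bound 0, `spatial_ok` rejects every access of one byte or more, which mirrors the "bound of zero" treatment of pointers made from integers.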
1226 00:44:15,960 --> 00:44:18,419 And the goal of the operation 1227 00:44:18,420 --> 00:44:20,369 is to have a complete FreeBSD world 1228 00:44:20,370 --> 00:44:22,169 where every single line of code, except 1229 00:44:22,170 --> 00:44:23,639 for a little bit of trusted code, which 1230 00:44:23,640 --> 00:44:25,299 is essentially malloc and the libc 1231 00:44:25,300 --> 00:44:28,139 startup, is correctly 1232 00:44:28,140 --> 00:44:30,179 spatially and temporally bounds checked, 1233 00:44:31,590 --> 00:44:33,839 because it would be really nice 1234 00:44:33,840 --> 00:44:35,969 to have a computer for a change that's 1235 00:44:35,970 --> 00:44:37,949 not vulnerable to buffer overflows, because I 1236 00:44:37,950 --> 00:44:39,839 didn't have any for the last twenty years. 1237 00:44:39,840 --> 00:44:40,739 Go through all the vendors: 1238 00:44:40,740 --> 00:44:41,740 nobody gets it right. 1239 00:44:54,540 --> 00:44:55,540 So 1240 00:44:57,030 --> 00:44:59,519 that's the link to the original source and 1241 00:44:59,520 --> 00:45:01,829 to GitHub with the modifications. 1242 00:45:01,830 --> 00:45:04,199 I would like to thank Hannes once again for 1243 00:45:04,200 --> 00:45:06,209 hacking the code with me, would like to 1244 00:45:06,210 --> 00:45:08,479 thank the original SoftBound 1245 00:45:08,480 --> 00:45:10,529 authors for being very generous with 1246 00:45:10,530 --> 00:45:11,519 information. 1247 00:45:11,520 --> 00:45:13,799 One tip they gave me on the spatial 1248 00:45:13,800 --> 00:45:16,019 memory checking: they talked to Intel, and Intel 1249 00:45:16,020 --> 00:45:18,239 is working on a new instruction 1250 00:45:18,240 --> 00:45:19,179 set extension. 1251 00:45:19,180 --> 00:45:21,299 So we will get native instructions that 1252 00:45:21,300 --> 00:45:23,129 do the bounds checking for us later. 1253 00:45:23,130 --> 00:45:24,539 It's called the MPX extension.
1254 00:45:24,540 --> 00:45:27,179 Watch out for that. And 1255 00:45:27,180 --> 00:45:29,639 generally for coming up with some awesome 1256 00:45:29,640 --> 00:45:30,539 research. 1257 00:45:30,540 --> 00:45:32,699 Yeah. So thanks to 1258 00:45:32,700 --> 00:45:34,049 you for listening. 1259 00:45:34,050 --> 00:45:35,909 And I'm open for questions now. 1260 00:45:50,000 --> 00:45:51,000 Hello. 1261 00:45:52,020 --> 00:45:54,109 Did you know that fat pointers are 1262 00:45:54,110 --> 00:45:56,209 not the invention 1263 00:45:56,210 --> 00:45:58,339 of the people of the SoftBound 1264 00:45:58,340 --> 00:46:00,399 CETS project? It is the word 1265 00:46:00,400 --> 00:46:02,539 to the world from the dark time 1266 00:46:02,540 --> 00:46:04,719 of programming on 1267 00:46:04,720 --> 00:46:07,489 the 8086, 1268 00:46:07,490 --> 00:46:09,589 times when they had more of the 1269 00:46:09,590 --> 00:46:12,019 segment model, of 1270 00:46:12,020 --> 00:46:14,089 the register which you have to load, 1271 00:46:14,090 --> 00:46:16,549 which was 64K. 1272 00:46:16,550 --> 00:46:18,709 So this is exactly the borrowing 1273 00:46:18,710 --> 00:46:21,409 model of programming where no 1274 00:46:21,410 --> 00:46:23,119 unexplainably programming was able to 1275 00:46:23,120 --> 00:46:24,320 compile on Unix 1276 00:46:25,670 --> 00:46:27,199 and so on. 1277 00:46:27,200 --> 00:46:29,869 Sorry if I raised the... 1278 00:46:29,870 --> 00:46:32,089 the idea 1279 00:46:32,090 --> 00:46:34,249 that, you know, SoftBound invented that. 1280 00:46:34,250 --> 00:46:35,749 Of course fat pointers are old, they're 1281 00:46:35,750 --> 00:46:37,489 even older than that. 60s hardware 1282 00:46:37,490 --> 00:46:39,589 had it, like you had bounds-checked 1283 00:46:39,590 --> 00:46:41,329 pointer accesses in hardware even in the 1284 00:46:41,330 --> 00:46:43,279 60s. So it's nothing new.
1285 00:46:43,280 --> 00:46:45,589 It's just that, you know, that particular 1286 00:46:45,590 --> 00:46:47,839 hack allows us to run real-world 1287 00:46:47,840 --> 00:46:50,359 existing C code on real-world hardware 1288 00:46:50,360 --> 00:46:51,259 these days. 1289 00:46:51,260 --> 00:46:53,030 So I think that's what makes it unique.