1 00:00:15,650 --> 00:00:17,659 So for the next talk of today, we're 2 00:00:17,660 --> 00:00:20,419 going to talk about a new open source 3 00:00:20,420 --> 00:00:23,419 DBI framework, which is called QBDI. 4 00:00:23,420 --> 00:00:25,699 DBI means dynamic 5 00:00:25,700 --> 00:00:27,379 binary instrumentation. Dynamic, 6 00:00:27,380 --> 00:00:29,659 well, you know; binary also; 7 00:00:29,660 --> 00:00:31,729 instrumentation means to observe, 8 00:00:31,730 --> 00:00:33,919 monitor and modify some 9 00:00:33,920 --> 00:00:36,559 parts of a program 10 00:00:36,560 --> 00:00:37,789 or a library. 11 00:00:37,790 --> 00:00:40,609 And we have as speakers Charles Hubain 12 00:00:40,610 --> 00:00:43,009 and Cédric Tessier. 13 00:00:43,010 --> 00:00:46,279 They are both security researchers. 14 00:00:46,280 --> 00:00:48,949 Charles likes attacking whitebox crypto 15 00:00:48,950 --> 00:00:51,169 and collaborated with Graham. 16 00:00:51,170 --> 00:00:52,129 And Cédric, 17 00:00:52,130 --> 00:00:54,229 instead, is focused on reverse 18 00:00:54,230 --> 00:00:56,719 engineering and he likes whiskey. 19 00:00:56,720 --> 00:00:58,189 There you go. The stage is yours. 20 00:00:58,190 --> 00:00:59,209 Give them a hand, please. 21 00:01:05,250 --> 00:01:07,379 So I think the herald did 22 00:01:07,380 --> 00:01:09,479 a good job at introducing us. 23 00:01:09,480 --> 00:01:11,639 So we both work at Quarkslab, 24 00:01:11,640 --> 00:01:13,589 which is a French security consulting 25 00:01:13,590 --> 00:01:15,689 company, and during 26 00:01:15,690 --> 00:01:18,179 our work we use DBI a lot.
27 00:01:18,180 --> 00:01:20,399 And for the past year and a half, we've 28 00:01:20,400 --> 00:01:22,859 been together researching DBI frameworks 29 00:01:22,860 --> 00:01:25,139 and also working on our own 30 00:01:25,140 --> 00:01:26,519 DBI framework. 31 00:01:26,520 --> 00:01:28,709 And what we want today is to try to 32 00:01:28,710 --> 00:01:30,859 demystify what a DBI framework 33 00:01:30,860 --> 00:01:33,059 is and how it works, and 34 00:01:33,060 --> 00:01:34,949 try to show you how we went about 35 00:01:34,950 --> 00:01:36,569 implementing our own DBI. 36 00:01:36,570 --> 00:01:39,119 And we hope to inspire 37 00:01:39,120 --> 00:01:41,429 you about the usage of DBI frameworks 38 00:01:41,430 --> 00:01:43,619 and maybe even helping us 39 00:01:43,620 --> 00:01:44,620 on our own. 40 00:01:45,990 --> 00:01:48,179 So I'll start easy and start with 41 00:01:48,180 --> 00:01:49,979 an introduction of what exactly 42 00:01:49,980 --> 00:01:51,689 instrumentation is. 43 00:01:51,690 --> 00:01:54,059 So basically this 44 00:01:54,060 --> 00:01:56,249 is the transformation of a program into 45 00:01:56,250 --> 00:01:57,449 its own measurement tool. 46 00:01:57,450 --> 00:01:59,999 So you're making an instrument, 47 00:02:00,000 --> 00:02:02,159 and with this instrument, what you can do is 48 00:02:02,160 --> 00:02:03,989 observe any state of the 49 00:02:03,990 --> 00:02:06,119 program any time during the 50 00:02:06,120 --> 00:02:07,499 runtime. 51 00:02:07,500 --> 00:02:10,019 And then after that, 52 00:02:10,020 --> 00:02:11,729 the tool will automate the data 53 00:02:11,730 --> 00:02:13,379 collection and the processing of those 54 00:02:13,380 --> 00:02:14,609 data. 55 00:02:14,610 --> 00:02:16,949 So that's quite abstract. 56 00:02:16,950 --> 00:02:19,259 So what could you do with that kind 57 00:02:19,260 --> 00:02:20,969 of instrumentation?
58 00:02:20,970 --> 00:02:23,309 Well, I'm pretty sure most 59 00:02:23,310 --> 00:02:25,469 of you, if you program in C++, 60 00:02:25,470 --> 00:02:27,569 already use Valgrind 61 00:02:27,570 --> 00:02:30,419 and maybe the Memcheck tool. 62 00:02:30,420 --> 00:02:31,769 Valgrind is an 63 00:02:31,770 --> 00:02:33,749 instrumentation framework and Memcheck is an 64 00:02:33,750 --> 00:02:35,069 instrumentation tool. 65 00:02:35,070 --> 00:02:37,229 And what it does, it's quite simple. 66 00:02:37,230 --> 00:02:39,189 It will track memory allocation and 67 00:02:39,190 --> 00:02:41,369 deallocation during the runtime and 68 00:02:41,370 --> 00:02:43,319 also track memory accesses. 69 00:02:43,320 --> 00:02:45,569 And with that information, it is able 70 00:02:45,570 --> 00:02:47,639 to detect 71 00:02:47,640 --> 00:02:49,739 use after free, memory leaks, 72 00:02:49,740 --> 00:02:52,049 double free, but also out-of-bounds memory 73 00:02:52,050 --> 00:02:54,569 access, and it can tell you when 74 00:02:54,570 --> 00:02:56,639 you are accessing an array two bytes 75 00:02:56,640 --> 00:02:58,829 outside of the boundary 76 00:02:58,830 --> 00:03:00,659 that you've allocated at that place in 77 00:03:00,660 --> 00:03:01,660 the program. 78 00:03:03,600 --> 00:03:05,229 So it's quite useful. 79 00:03:05,230 --> 00:03:06,230 Another 80 00:03:07,770 --> 00:03:09,869 popular use case is fuzzing. 81 00:03:09,870 --> 00:03:12,299 So in fuzzing, 82 00:03:12,300 --> 00:03:13,649 first thing, you're generating random 83 00:03:13,650 --> 00:03:15,549 input and sending it to a program and trying 84 00:03:15,550 --> 00:03:16,889 to make it crash. 85 00:03:16,890 --> 00:03:19,469 But one of the things you want to do is 86 00:03:19,470 --> 00:03:22,349 to generate interesting random inputs.
87 00:03:22,350 --> 00:03:24,509 So you want to know, 88 00:03:24,510 --> 00:03:26,669 between two random inputs, if the new 89 00:03:26,670 --> 00:03:28,770 one you're generating is 90 00:03:29,940 --> 00:03:32,339 exploring more parts of the code, 91 00:03:32,340 --> 00:03:34,319 generating more state transitions and 92 00:03:34,320 --> 00:03:35,339 things like that. 93 00:03:35,340 --> 00:03:37,679 So you want to measure the execution. 94 00:03:37,680 --> 00:03:39,689 And one of the main 95 00:03:40,740 --> 00:03:42,929 criteria you can measure is 96 00:03:42,930 --> 00:03:44,129 the code coverage. 97 00:03:44,130 --> 00:03:46,259 So a DBI framework is a very good 98 00:03:46,260 --> 00:03:48,029 thing to measure code coverage. 99 00:03:48,030 --> 00:03:50,699 And if you use WinAFL, basically 100 00:03:50,700 --> 00:03:53,489 it's AFL using a DBI framework, 101 00:03:53,490 --> 00:03:55,799 which is DynamoRIO, to gather 102 00:03:55,800 --> 00:03:58,229 code coverage information. 103 00:03:58,230 --> 00:04:00,569 And if you go deeper, you can even use 104 00:04:00,570 --> 00:04:02,019 the information you have about what 105 00:04:02,020 --> 00:04:04,079 instructions are executed to try to build 106 00:04:04,080 --> 00:04:05,519 a symbolic representation of the 107 00:04:05,520 --> 00:04:06,599 program, etc. 108 00:04:06,600 --> 00:04:08,129 And then you get the DARPA Cyber 109 00:04:08,130 --> 00:04:10,589 Grand Challenge that was 110 00:04:10,590 --> 00:04:12,689 held one year and a 111 00:04:12,690 --> 00:04:13,690 half ago. 112 00:04:14,580 --> 00:04:16,768 The last thing you can do is 113 00:04:16,769 --> 00:04:18,898 simply execution traces, which 114 00:04:18,899 --> 00:04:20,458 means recording everything that happens 115 00:04:20,459 --> 00:04:22,169 in a program.
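The coverage feedback described for fuzzing can be sketched in a few lines. This is only a toy model under stated assumptions: `toy_target`, `run_with_coverage` and `keep_interesting` are hypothetical names, and the "instrumentation" is a callback the target calls by hand, whereas a real DBI framework would record executed blocks without the target's cooperation.

```python
def run_with_coverage(program, data):
    """Run a toy target and return the set of block labels it executed."""
    covered = set()
    program(data, covered.add)  # the target reports each block it enters
    return covered


def toy_target(data, hit):
    """Hypothetical target: a tiny parser with nested branches."""
    hit("entry")
    if data[:1] == b"A":
        hit("first_is_A")
        if data[1:2] == b"B":
            hit("second_is_B")
    hit("exit")


def keep_interesting(program, inputs):
    """Keep only the inputs that reach blocks no earlier input reached."""
    seen = set()
    corpus = []
    for data in inputs:
        coverage = run_with_coverage(program, data)
        if not coverage <= seen:  # at least one new block was executed
            corpus.append(data)
            seen |= coverage
    return corpus, seen


corpus, seen = keep_interesting(toy_target, [b"x", b"y", b"Ax", b"AB", b"Az"])
print(corpus)
```

Inputs that only repeat already-seen blocks are dropped, which is the "interesting input" criterion the speaker describes.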
116 00:04:22,170 --> 00:04:24,419 Once you have that, you can replay 117 00:04:24,420 --> 00:04:26,099 those execution traces and you get 118 00:04:26,100 --> 00:04:27,659 something that looks like a timeless 119 00:04:27,660 --> 00:04:29,459 debugger where you can go forward and 120 00:04:29,460 --> 00:04:31,979 backward and you can try to 121 00:04:31,980 --> 00:04:33,569 track everything that's happening in the 122 00:04:33,570 --> 00:04:35,550 program without relaunching it. 123 00:04:37,020 --> 00:04:39,419 Also, something that I've been working 124 00:04:39,420 --> 00:04:41,549 on before was software side-channel 125 00:04:41,550 --> 00:04:43,199 attacks against cryptography. 126 00:04:43,200 --> 00:04:45,119 So you're using the execution traces to 127 00:04:45,120 --> 00:04:47,009 generate side-channel information and 128 00:04:47,010 --> 00:04:49,139 trying to recover a key from 129 00:04:49,140 --> 00:04:51,509 obfuscated cryptography. 130 00:04:53,040 --> 00:04:55,199 So far, one of the questions 131 00:04:55,200 --> 00:04:57,329 you might be asking 132 00:04:57,330 --> 00:04:59,429 is why not just debuggers? 133 00:04:59,430 --> 00:05:01,799 Because debuggers can do all of that. 134 00:05:01,800 --> 00:05:03,239 The problem is that debuggers, 135 00:05:03,240 --> 00:05:05,459 although they are awesome, are meant 136 00:05:05,460 --> 00:05:08,159 for humans, not for machines, 137 00:05:08,160 --> 00:05:10,049 and they are very slow. 138 00:05:10,050 --> 00:05:11,430 So if you imagine 139 00:05:13,560 --> 00:05:16,469 a debugger that's attached to a target 140 00:05:16,470 --> 00:05:18,809 and you imagine the target is paused 141 00:05:18,810 --> 00:05:20,879 and you press continue, this 142 00:05:20,880 --> 00:05:22,949 is what happens on 143 00:05:22,950 --> 00:05:24,269 your system.
144 00:05:24,270 --> 00:05:26,549 So the debugger sends a resume 145 00:05:26,550 --> 00:05:28,709 call to the kernel, the 146 00:05:28,710 --> 00:05:30,539 kernel decides to reschedule the 147 00:05:30,540 --> 00:05:31,919 execution of the target. 148 00:05:31,920 --> 00:05:33,239 So you have the whole scheduling of the 149 00:05:33,240 --> 00:05:35,039 process that happens. 150 00:05:35,040 --> 00:05:37,199 And then the target hits a breakpoint, 151 00:05:37,200 --> 00:05:39,389 which is like a trap or interrupt, 152 00:05:39,390 --> 00:05:41,429 which jumps back into the kernel. 153 00:05:41,430 --> 00:05:42,479 And then the kernel 154 00:05:44,070 --> 00:05:45,209 decides that 155 00:05:45,210 --> 00:05:46,620 this interrupt 156 00:05:47,790 --> 00:05:49,229 should be captured by the debugger, 157 00:05:49,230 --> 00:05:51,599 basically. So it's going to send a signal 158 00:05:51,600 --> 00:05:53,129 to the debugger and reschedule the 159 00:05:53,130 --> 00:05:54,659 execution of the debugger. 160 00:05:54,660 --> 00:05:56,939 And while you're doing that, for 161 00:05:56,940 --> 00:05:59,279 one continue and breakpoint hit 162 00:05:59,280 --> 00:06:01,709 you're doing four boundary crossings 163 00:06:01,710 --> 00:06:03,839 between userland and kernel land, and 164 00:06:03,840 --> 00:06:05,219 also you're doing two process 165 00:06:05,220 --> 00:06:06,419 reschedulings. 166 00:06:06,420 --> 00:06:08,969 So it's quite slow. 167 00:06:08,970 --> 00:06:09,970 How slow? 168 00:06:11,580 --> 00:06:13,769 Maybe you've 169 00:06:13,770 --> 00:06:15,839 seen those kind of attacks you can do 170 00:06:15,840 --> 00:06:18,119 in CTFs where you have to reverse 171 00:06:18,120 --> 00:06:19,589 engineer a binary that checks a 172 00:06:19,590 --> 00:06:21,809 password.
And what you do is you 173 00:06:21,810 --> 00:06:23,909 try to measure the execution time 174 00:06:23,910 --> 00:06:26,219 when you're trying different passwords, and 175 00:06:26,220 --> 00:06:27,659 not really the execution time, 176 00:06:27,660 --> 00:06:29,729 something more interesting, which is the 177 00:06:29,730 --> 00:06:31,349 instruction count, the number of 178 00:06:31,350 --> 00:06:32,399 instructions executed, 179 00:06:33,630 --> 00:06:35,009 and you can try to brute force the 180 00:06:35,010 --> 00:06:37,319 password character by character this way. 181 00:06:37,320 --> 00:06:39,299 And what you see on the top is GDB. 182 00:06:39,300 --> 00:06:40,529 And what you see on the bottom is 183 00:06:40,530 --> 00:06:41,519 Intel Pin. 184 00:06:41,520 --> 00:06:43,469 So on the bottom, you see that it 185 00:06:43,470 --> 00:06:45,609 actually can take a lot of cases 186 00:06:45,610 --> 00:06:47,099 and that you see the number of 187 00:06:47,100 --> 00:06:48,449 instructions executed. 188 00:06:48,450 --> 00:06:50,519 And GDB is extremely 189 00:06:50,520 --> 00:06:53,189 slow, even though the instruction count 190 00:06:53,190 --> 00:06:54,190 is much lower. 191 00:06:55,500 --> 00:06:57,569 So, yeah, you don't want to 192 00:06:57,570 --> 00:06:59,039 use GDB, you don't want to use a 193 00:06:59,040 --> 00:07:00,599 debugger, you want to use a dynamic 194 00:07:00,600 --> 00:07:01,820 instrumentation framework. 195 00:07:03,630 --> 00:07:05,279 So the solution? 196 00:07:05,280 --> 00:07:07,500 Well, it's to get rid of the kernel. 197 00:07:08,610 --> 00:07:10,769 And how? Well, the only solution, except 198 00:07:10,770 --> 00:07:13,049 if you want to really have a bare metal 199 00:07:13,050 --> 00:07:15,749 system, is to run the instrumentation 200 00:07:15,750 --> 00:07:17,399 inside the target.
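The instruction-count brute force mentioned for CTF password checkers can be illustrated with a small simulation. Everything here is an assumption made for the sketch: `check_password` is a hypothetical early-exit checker, its `steps` counter stands in for the instruction count a DBI tool would report, and the attacker is assumed to know the password length.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits + "!"


def check_password(guess, secret):
    """Early-exit comparison. Returns (accepted, steps), where steps
    stands in for the number of instructions executed."""
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return len(guess) == len(secret), steps


def recover_password(length, oracle, alphabet=ALPHABET):
    """Recover the secret one character at a time: a correct character
    makes the checker run longer (or accept) than a wrong one."""
    known = ""
    for position in range(length):
        padding = alphabet[0] * (length - position - 1)
        # Rank candidates by (accepted, steps); the right one scores highest.
        best = max(alphabet, key=lambda c: oracle(known + c + padding))
        known += best
    return known


oracle = lambda guess: check_password(guess, "k3y!")
print(recover_password(4, oracle))
```

The search needs at most `length * len(alphabet)` runs of the target instead of `len(alphabet) ** length`, which is why a fast way to count instructions matters so much.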
201 00:07:20,680 --> 00:07:22,749 So instrumentation 202 00:07:22,750 --> 00:07:24,249 techniques. There are a few different 203 00:07:24,250 --> 00:07:26,469 ways to do that. The first one 204 00:07:26,470 --> 00:07:28,579 is from source code, and I'm pretty sure 205 00:07:28,580 --> 00:07:30,669 everybody has done that: it's just adding 206 00:07:30,670 --> 00:07:32,699 print statements to your code to get 207 00:07:32,700 --> 00:07:34,119 information. It's basically 208 00:07:34,120 --> 00:07:36,249 instrumentation, but you 209 00:07:36,250 --> 00:07:38,409 could be smarter and 210 00:07:38,410 --> 00:07:40,599 try to do that automatically at 211 00:07:40,600 --> 00:07:42,459 compile time with a compiler plugin. 212 00:07:44,010 --> 00:07:46,089 The other solution is when you work from 213 00:07:46,090 --> 00:07:48,429 binary, and when you work from binary, 214 00:07:48,430 --> 00:07:49,689 there are basically two approaches. 215 00:07:49,690 --> 00:07:51,429 The first one is to take your binary and 216 00:07:51,430 --> 00:07:53,499 statically patch it and add hooks 217 00:07:53,500 --> 00:07:55,629 to additional code you insert into the 218 00:07:55,630 --> 00:07:56,679 binary. 219 00:07:56,680 --> 00:07:58,389 And the last one is dynamic binary 220 00:07:58,390 --> 00:07:59,949 instrumentation. 221 00:07:59,950 --> 00:08:01,599 So from source code is really boring. 222 00:08:01,600 --> 00:08:03,129 There's nothing very complex and 223 00:08:03,130 --> 00:08:04,630 interesting to know about that. 224 00:08:05,680 --> 00:08:07,719 Static binary patching and hooking, 225 00:08:07,720 --> 00:08:10,179 well, it's crude and barbaric 226 00:08:10,180 --> 00:08:11,949 because you have to know in advance where 227 00:08:11,950 --> 00:08:13,659 you want to insert your hook, which is 228 00:08:13,660 --> 00:08:15,789 not really possible. 229 00:08:15,790 --> 00:08:17,829 And so this talk is about dynamic binary 230 00:08:17,830 --> 00:08:18,830 instrumentation.
231 00:08:20,650 --> 00:08:22,989 So there's a few existing frameworks. 232 00:08:22,990 --> 00:08:25,059 And the first one is Valgrind, 233 00:08:25,060 --> 00:08:27,279 as I mentioned, that has existed since 2000 234 00:08:27,280 --> 00:08:29,529 and is open source. However, it only 235 00:08:29,530 --> 00:08:31,509 supports Unix platforms and is very 236 00:08:31,510 --> 00:08:33,788 complex to use, 237 00:08:33,789 --> 00:08:35,408 not to use the command line tool, but 238 00:08:35,409 --> 00:08:37,959 to write your own instrumentation tool. 239 00:08:37,960 --> 00:08:39,759 Another one is DynamoRIO, which 240 00:08:39,760 --> 00:08:42,038 has existed since 2002, which is open 241 00:08:42,039 --> 00:08:44,379 source, cross platform, 242 00:08:44,380 --> 00:08:46,119 cross architecture. 243 00:08:46,120 --> 00:08:47,889 However, it's very hard to use too, 244 00:08:47,890 --> 00:08:49,969 because you basically have to manipulate 245 00:08:49,970 --> 00:08:52,419 the instructions in assembly yourself, 246 00:08:52,420 --> 00:08:55,479 which is quite often. 247 00:08:55,480 --> 00:08:57,639 And the last one is Intel Pin, 248 00:08:57,640 --> 00:08:59,349 which is the most popular one because 249 00:08:59,350 --> 00:09:01,029 it's very user friendly. 250 00:09:01,030 --> 00:09:02,559 However, it only supports Intel 251 00:09:02,560 --> 00:09:04,539 platforms and it's closed source. 252 00:09:06,230 --> 00:09:08,409 So why we made our own 253 00:09:08,410 --> 00:09:09,609 in 2015. 254 00:09:09,610 --> 00:09:11,139 What we wanted was to have a cross 255 00:09:11,140 --> 00:09:12,819 platform and cross architecture DBI 256 00:09:12,820 --> 00:09:15,219 framework, and all of those frameworks 257 00:09:15,220 --> 00:09:17,379 were developed quite a 258 00:09:17,380 --> 00:09:19,449 long time ago. So there were new things 259 00:09:19,450 --> 00:09:21,249 you could try to do.
260 00:09:21,250 --> 00:09:23,169 And we wanted to focus on mobile and 261 00:09:23,170 --> 00:09:25,179 embedded targets because that's basically 262 00:09:25,180 --> 00:09:27,249 what we're working on in 263 00:09:27,250 --> 00:09:29,499 our daily job. We also wanted 264 00:09:29,500 --> 00:09:31,629 to try to have something that was simpler 265 00:09:31,630 --> 00:09:34,149 and that was modular, meaning we could 266 00:09:34,150 --> 00:09:36,369 use it with other reverse 267 00:09:36,370 --> 00:09:38,259 engineering tools easily. 268 00:09:38,260 --> 00:09:40,449 And also because of our past 269 00:09:40,450 --> 00:09:42,759 in cracking DRM, basically, 270 00:09:42,760 --> 00:09:44,049 we wanted to focus on heavy 271 00:09:44,050 --> 00:09:45,939 instrumentation, which means 272 00:09:45,940 --> 00:09:47,559 instrumenting a lot of things, generating 273 00:09:47,560 --> 00:09:48,560 lots of data. 274 00:09:49,870 --> 00:09:52,439 So now, 275 00:09:52,440 --> 00:09:54,759 to understand the rest of the talk, 276 00:09:54,760 --> 00:09:56,709 you need to understand how DBI exactly 277 00:09:56,710 --> 00:09:58,809 works. So I'm going to try to 278 00:09:58,810 --> 00:10:01,140 explain that, if I get it correctly. 279 00:10:03,130 --> 00:10:05,589 So the 280 00:10:05,590 --> 00:10:07,839 simple idea is that dynamic 281 00:10:07,840 --> 00:10:09,639 instrumentation is about dynamically 282 00:10:09,640 --> 00:10:12,129 inserting the instrumentation,
283 00:10:12,130 --> 00:10:14,409 so the code you wrote, 284 00:10:14,410 --> 00:10:17,589 inside the binary during runtime. 285 00:10:17,590 --> 00:10:19,749 So it looks like this: you 286 00:10:19,750 --> 00:10:21,699 disassemble the code that's going 287 00:10:21,700 --> 00:10:24,099 to be executed, then 288 00:10:24,100 --> 00:10:26,289 you pass the instructions to 289 00:10:26,290 --> 00:10:28,509 your instrumentation tool that 290 00:10:28,510 --> 00:10:31,059 analyzes them and generates the appropriate 291 00:10:31,060 --> 00:10:32,229 instrumentation. 292 00:10:32,230 --> 00:10:34,779 And then the instrumentation gets inserted 293 00:10:34,780 --> 00:10:37,069 into the execution flow and 294 00:10:37,070 --> 00:10:38,889 it's executed. And then you have a banana 295 00:10:38,890 --> 00:10:39,890 for scale. 296 00:10:41,660 --> 00:10:43,779 However, I mean, this 297 00:10:43,780 --> 00:10:45,939 looks simple, but there's 298 00:10:45,940 --> 00:10:48,939 a few problems with this graphic. 299 00:10:48,940 --> 00:10:49,989 Not the banana. 300 00:10:49,990 --> 00:10:52,059 Other problems. The first 301 00:10:52,060 --> 00:10:53,949 one is disassembling. 302 00:10:53,950 --> 00:10:56,079 You don't know, 303 00:10:56,080 --> 00:10:57,789 when you get the binary, what part of the 304 00:10:57,790 --> 00:10:59,979 binary is code, because code is data, data 305 00:10:59,980 --> 00:11:01,759 is code, and there is a lot of 306 00:11:01,760 --> 00:11:03,579 unpredictable branching and jumps 307 00:11:03,580 --> 00:11:04,869 everywhere. 308 00:11:04,870 --> 00:11:06,609 So you cannot just disassemble the whole 309 00:11:06,610 --> 00:11:07,780 binary in advance. 310 00:11:09,130 --> 00:11:10,929 So you need to discover what is going to 311 00:11:10,930 --> 00:11:13,059 be the code of the binary 312 00:11:13,060 --> 00:11:14,709 as you go during the execution.
313 00:11:16,240 --> 00:11:18,699 Basically what this means is that 314 00:11:18,700 --> 00:11:20,649 you will execute a short piece of code, a 315 00:11:20,650 --> 00:11:22,779 block of code, something that 316 00:11:22,780 --> 00:11:24,399 is a series of instructions you can 317 00:11:24,400 --> 00:11:26,079 predict will be executed, 318 00:11:26,080 --> 00:11:28,149 and that ends with 319 00:11:28,150 --> 00:11:29,809 a control flow statement like a 320 00:11:29,810 --> 00:11:31,929 conditional jump or call 321 00:11:31,930 --> 00:11:32,979 that you cannot resolve. 322 00:11:34,030 --> 00:11:36,099 And you execute just this piece of code 323 00:11:36,100 --> 00:11:38,259 and then you look where it wants to 324 00:11:38,260 --> 00:11:40,069 flow next, what's the next instruction 325 00:11:40,070 --> 00:11:41,259 it wants to execute. 326 00:11:41,260 --> 00:11:43,539 And this way you discover the next block 327 00:11:43,540 --> 00:11:45,699 of code you want to execute and you 328 00:11:45,700 --> 00:11:47,739 simply execute the next block of code. 329 00:11:47,740 --> 00:11:49,869 So this basically forms a short cycle 330 00:11:49,870 --> 00:11:50,870 of execution. 331 00:11:52,330 --> 00:11:55,329 Now the other problem you have is 332 00:11:55,330 --> 00:11:57,489 about the instrumentation, the code that 333 00:11:57,490 --> 00:11:59,529 you're generating, because the 334 00:11:59,530 --> 00:12:01,899 instrumented code you generate is 335 00:12:01,900 --> 00:12:04,149 much larger than the original code, 336 00:12:04,150 --> 00:12:06,309 because it's the sum of the original code plus 337 00:12:06,310 --> 00:12:07,719 the instrumentation you're adding, 338 00:12:08,740 --> 00:12:10,839 and compilers are not dumb and 339 00:12:10,840 --> 00:12:13,209 try to tightly fit all the code 340 00:12:13,210 --> 00:12:15,369 into the binary so as not to waste space.
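The block-by-block discovery cycle described above can be sketched on a toy machine. This is an interpreter over a made-up instruction set, not a real DBI engine: the opcodes (`add`, `jmp`, `jnz`, `halt`), the one-slot-per-instruction addressing and the function names are all assumptions made for illustration.

```python
CONTROL_FLOW = {"jmp", "jnz", "halt"}  # instructions that end a block


def discover_block(code, start):
    """Collect instructions from `start` up to and including the first
    control flow instruction: the part we can predict will run."""
    block = []
    pc = start
    while True:
        instruction = code[pc]
        block.append((pc, instruction))
        if instruction[0] in CONTROL_FLOW:
            return block
        pc += 1


def execute_block(block, state):
    """Run one block and return the address it wants to flow to next."""
    for address, (name, *args) in block:
        if name == "add":
            state["acc"] += args[0]
        elif name == "jmp":
            return args[0]
        elif name == "jnz":
            return args[0] if state["acc"] != 0 else address + 1
        elif name == "halt":
            return None


def run(code, entry):
    """Discover, cache and execute blocks until the program halts."""
    cache, trace, state = {}, [], {"acc": 0}
    pc = entry
    while pc is not None:
        if pc not in cache:  # code is only discovered when reached
            cache[pc] = discover_block(code, pc)
        trace.append(pc)
        pc = execute_block(cache[pc], state)
    return state, cache, trace


# Hypothetical program: set acc to 3, then count down to zero.
program = {0: ("add", 3), 1: ("add", -1), 2: ("jnz", 1), 3: ("halt",)}
state, cache, trace = run(program, 0)
print(trace, sorted(cache))
```

Note that the same instructions can end up in several cached blocks (here the blocks starting at 0 and at 1 overlap), because a block is defined by where execution entered it, not by a static partition of the binary.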
341 00:12:15,370 --> 00:12:17,739 So there's not much space left around 342 00:12:17,740 --> 00:12:18,929 in the code 343 00:12:18,930 --> 00:12:21,419 segment of your binary, 344 00:12:21,420 --> 00:12:23,519 so you cannot 345 00:12:23,520 --> 00:12:25,649 instrument in place, you can't just add it 346 00:12:25,650 --> 00:12:26,779 in the binary. 347 00:12:26,780 --> 00:12:28,979 There's not enough space, so you need 348 00:12:28,980 --> 00:12:31,169 to write it somewhere else. 349 00:12:31,170 --> 00:12:32,999 And this introduces the problem of 350 00:12:33,000 --> 00:12:34,889 relocating the original code of the 351 00:12:34,890 --> 00:12:35,890 binary. 352 00:12:37,230 --> 00:12:39,299 So the problem with relocating is 353 00:12:39,300 --> 00:12:41,609 that the code contains 354 00:12:41,610 --> 00:12:43,919 relative references to memory addresses. 355 00:12:43,920 --> 00:12:46,109 So you might have a jump that says, 356 00:12:46,110 --> 00:12:47,849 now jump to the instruction 20 bytes 357 00:12:47,850 --> 00:12:49,859 forward, or you might have a memory 358 00:12:49,860 --> 00:12:51,929 access that says, oh, now I need to access 359 00:12:51,930 --> 00:12:53,919 that piece of data, which is one thousand 360 00:12:53,920 --> 00:12:56,579 thirty seven bytes backwards 361 00:12:56,580 --> 00:12:57,989 from my position. 362 00:12:57,990 --> 00:13:00,029 And if you move everything, 363 00:13:00,030 --> 00:13:01,989 well, this reference becomes invalid. 364 00:13:03,690 --> 00:13:05,609 So this means we need to actually 365 00:13:05,610 --> 00:13:08,099 rewrite the original code 366 00:13:08,100 --> 00:13:09,720 to fix all those references, 367 00:13:10,860 --> 00:13:12,929 and in our engine, 368 00:13:12,930 --> 00:13:15,059 and in what we will present during this 369 00:13:15,060 --> 00:13:16,679 talk, this is what we call patching.
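The arithmetic behind fixing a relative reference after relocation is short enough to write down. The sketch below assumes the x86 convention that a displacement is measured from the end of the instruction; the function name and the example addresses are hypothetical, and a real engine would also have to handle displacements that no longer fit in the original encoding.

```python
def relocate_relative(old_address, length, displacement, new_address):
    """Return (target, new_displacement) for an instruction moved from
    old_address to new_address, keeping the same absolute target.

    Assumption: the displacement is relative to the end of the
    instruction, as on x86."""
    target = old_address + length + displacement
    new_displacement = target - (new_address + length)
    return target, new_displacement


# A 5-byte jump at 0x1000 that goes 20 bytes forward...
target, new_disp = relocate_relative(0x1000, 5, 20, 0x8000)
# ...still has to land on 0x1019 once it has been copied to 0x8000.
print(hex(target), new_disp)
```

The same computation applies to the relative memory access in the example from the talk: only the sign of the displacement changes.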
370 00:13:18,270 --> 00:13:20,189 So if you summarize everything you need 371 00:13:20,190 --> 00:13:22,259 to do, this is the cycle of life of a 372 00:13:22,260 --> 00:13:23,669 DBI engine. 373 00:13:23,670 --> 00:13:25,919 It starts by disassembling 374 00:13:25,920 --> 00:13:27,989 the first piece of code it wants 375 00:13:27,990 --> 00:13:29,039 to execute. 376 00:13:29,040 --> 00:13:31,169 Then it patches it to make it 377 00:13:31,170 --> 00:13:32,699 relocatable. 378 00:13:32,700 --> 00:13:34,349 Then it adds instrumentation to the 379 00:13:34,350 --> 00:13:36,389 patched piece of code. 380 00:13:36,390 --> 00:13:38,849 It assembles it somewhere in executable 381 00:13:38,850 --> 00:13:40,919 memory, somewhere else, and 382 00:13:40,920 --> 00:13:42,689 then you can execute it. 383 00:13:42,690 --> 00:13:44,759 And once the execution finishes, it 384 00:13:44,760 --> 00:13:46,469 looks at what's the next piece of code to 385 00:13:46,470 --> 00:13:47,789 execute and starts again. 386 00:13:50,700 --> 00:13:52,979 So now I'll let Cédric explain some 387 00:13:52,980 --> 00:13:55,139 of the low-level abstractions we have to 388 00:13:55,140 --> 00:13:56,159 deal with for all of that. 389 00:14:04,840 --> 00:14:05,840 All 390 00:14:07,260 --> 00:14:08,769 right. So for this 391 00:14:11,130 --> 00:14:12,130 before 392 00:14:13,620 --> 00:14:15,929 and so 393 00:14:17,740 --> 00:14:18,740 basic 394 00:14:20,420 --> 00:14:22,620 stuff like 395 00:14:26,720 --> 00:14:29,339 that, that's what we call 396 00:14:29,340 --> 00:14:30,340 so 397 00:14:32,490 --> 00:14:34,649 together, they are 398 00:14:34,650 --> 00:14:37,169 composing what is called a control flow graph. 399 00:14:37,170 --> 00:14:39,449 So usually, at the end of a basic 400 00:14:39,450 --> 00:14:42,329 block there is a terminator instruction, 401 00:14:42,330 --> 00:14:44,489 which can be a jump, for example, 402 00:14:44,490 --> 00:14:47,159 that goes to another one.
403 00:14:47,160 --> 00:14:49,869 And with this kind of structure, 404 00:14:49,870 --> 00:14:51,209 the control flow would just go on. 405 00:14:51,210 --> 00:14:53,099 I mean, you start a function and it 406 00:14:53,100 --> 00:14:55,289 just executes every basic block and 407 00:14:55,290 --> 00:14:56,909 you don't have any way to interact with 408 00:14:56,910 --> 00:14:57,869 it. 409 00:14:57,870 --> 00:15:00,299 So what we want is some kind 410 00:15:00,300 --> 00:15:02,909 of controlled 411 00:15:02,910 --> 00:15:03,789 flow. 412 00:15:03,790 --> 00:15:06,299 And by this I mean we want, 413 00:15:06,300 --> 00:15:08,429 after the first block, to take the 414 00:15:08,430 --> 00:15:09,929 control back. 415 00:15:09,930 --> 00:15:12,029 We want to be able to skip 416 00:15:12,030 --> 00:15:14,309 the jump or 417 00:15:14,310 --> 00:15:16,559 modify it to jump inside our 418 00:15:16,560 --> 00:15:19,499 own code. There we will execute 419 00:15:19,500 --> 00:15:21,989 the logic of the engine and 420 00:15:21,990 --> 00:15:24,089 we will be able to go to 421 00:15:24,090 --> 00:15:26,039 the next block and so on. 422 00:15:28,230 --> 00:15:30,899 So, in fact, everything 423 00:15:30,900 --> 00:15:33,779 is about keeping control of the execution, 424 00:15:33,780 --> 00:15:36,119 and this is much 425 00:15:36,120 --> 00:15:38,429 more difficult than we can expect, 426 00:15:39,840 --> 00:15:42,059 because what we want, 427 00:15:42,060 --> 00:15:44,129 for 428 00:15:44,130 --> 00:15:46,349 us, it requires modifying 429 00:15:46,350 --> 00:15:48,779 the original instructions of the binary. 430 00:15:48,780 --> 00:15:51,239 But we don't want to modify 431 00:15:51,240 --> 00:15:54,269 the original behavior of the software. 432 00:15:54,270 --> 00:15:56,849 We want to be able 433 00:15:56,850 --> 00:15:59,429 to execute it as 434 00:15:59,430 --> 00:16:00,989 it was, untouched. 435 00:16:00,990 --> 00:16:02,849 And this is quite difficult.
436 00:16:02,850 --> 00:16:05,129 And it requires lots of low-level 437 00:16:05,130 --> 00:16:06,130 tools. 438 00:16:06,990 --> 00:16:08,279 So what do we need? 439 00:16:08,280 --> 00:16:10,559 In fact, very, very 440 00:16:10,560 --> 00:16:12,019 simple things. We need a multi 441 00:16:12,020 --> 00:16:13,169 architecture disassembler, 442 00:16:15,060 --> 00:16:17,369 and we also need a multi architecture 443 00:16:17,370 --> 00:16:19,599 assembler, and if possible, 444 00:16:19,600 --> 00:16:21,839 it should also be cross-platform. 445 00:16:21,840 --> 00:16:24,299 And 446 00:16:24,300 --> 00:16:26,729 we need some kind of intermediate 447 00:16:26,730 --> 00:16:29,039 representation to work on. 448 00:16:29,040 --> 00:16:31,859 And this representation 449 00:16:31,860 --> 00:16:34,199 should be a link 450 00:16:34,200 --> 00:16:36,479 between the disassembler and 451 00:16:36,480 --> 00:16:37,480 the assembler. 452 00:16:38,460 --> 00:16:40,949 So these requirements were 453 00:16:40,950 --> 00:16:42,299 quite strong. 454 00:16:42,300 --> 00:16:44,429 And also what we don't want: 455 00:16:44,430 --> 00:16:46,529 well, basically we don't have 456 00:16:46,530 --> 00:16:48,539 unlimited resources. 457 00:16:48,540 --> 00:16:50,609 So if possible, we 458 00:16:50,610 --> 00:16:52,679 don't want, in fact, to implement a multi 459 00:16:52,680 --> 00:16:55,199 architecture disassembler and assembler, 460 00:16:55,200 --> 00:16:56,200 and also, 461 00:16:57,300 --> 00:17:00,119 about the intermediate representation,
462 00:17:00,120 --> 00:17:02,399 we don't really want to, 463 00:17:02,400 --> 00:17:04,618 like, parse every single 464 00:17:04,619 --> 00:17:06,689 manual for every CPU 465 00:17:06,690 --> 00:17:08,789 of every single architecture we want to 466 00:17:08,790 --> 00:17:10,919 support because, 467 00:17:10,920 --> 00:17:13,659 yeah, I like developer manuals, but 468 00:17:13,660 --> 00:17:15,779 I'm not that firm, so we don't have 469 00:17:15,780 --> 00:17:17,789 10 years to spend on parsing them. 470 00:17:19,140 --> 00:17:21,568 But things have changed recently. 471 00:17:21,569 --> 00:17:22,989 For a few years now, 472 00:17:22,990 --> 00:17:25,828 there has been a new player, and 473 00:17:25,829 --> 00:17:26,999 that player is called 474 00:17:28,079 --> 00:17:29,459 LLVM. 475 00:17:29,460 --> 00:17:31,659 And what is LLVM? 476 00:17:31,660 --> 00:17:34,409 LLVM basically was created to 477 00:17:34,410 --> 00:17:36,569 focus on compiler technology 478 00:17:36,570 --> 00:17:38,669 and especially just-in- 479 00:17:38,670 --> 00:17:40,079 time engines. 480 00:17:40,080 --> 00:17:42,449 But basically LLVM 481 00:17:42,450 --> 00:17:45,449 is the core foundation, 482 00:17:45,450 --> 00:17:47,579 the framework of a very well 483 00:17:47,580 --> 00:17:49,799 known software today, which is the 484 00:17:49,800 --> 00:17:51,209 Clang compiler. 485 00:17:51,210 --> 00:17:53,519 I'm sure like everybody 486 00:17:53,520 --> 00:17:56,339 in this room knows of the compiler, 487 00:17:56,340 --> 00:17:58,739 but it's more than 488 00:17:58,740 --> 00:18:01,649 this. It's like a whole toolkit 489 00:18:01,650 --> 00:18:03,899 that provides tons of things to play 490 00:18:03,900 --> 00:18:04,900 with binaries. 491 00:18:06,780 --> 00:18:09,329 So, in fact, for us, LLVM 492 00:18:09,330 --> 00:18:11,729 already has everything to 493 00:18:11,730 --> 00:18:13,889 support.
A lot of architectures, like all 494 00:18:13,890 --> 00:18:16,559 the major ones: ARM, 495 00:18:16,560 --> 00:18:19,559 x86, x86-64. 496 00:18:19,560 --> 00:18:21,629 It provides both a disassembler 497 00:18:21,630 --> 00:18:23,129 and an assembler. 498 00:18:23,130 --> 00:18:25,229 But more than this, it 499 00:18:25,230 --> 00:18:27,359 also provides this 500 00:18:27,360 --> 00:18:29,769 intermediate representation 501 00:18:29,770 --> 00:18:32,429 that links, basically, the disassembler 502 00:18:32,430 --> 00:18:33,430 and the assembler. 503 00:18:34,590 --> 00:18:37,259 And maybe some of you already 504 00:18:37,260 --> 00:18:39,509 heard about the LLVM 505 00:18:39,510 --> 00:18:41,130 intermediate representation. 506 00:18:42,630 --> 00:18:44,729 But this is not this one, this is 507 00:18:44,730 --> 00:18:46,889 not the intermediate representation 508 00:18:46,890 --> 00:18:49,449 which is used for the compiler passes. 509 00:18:49,450 --> 00:18:51,639 It's another one, a more low-level one, 510 00:18:51,640 --> 00:18:54,099 which is called machine 511 00:18:54,100 --> 00:18:57,339 code, or LLVM MC, 512 00:18:57,340 --> 00:18:59,709 and this low-level representation 513 00:18:59,710 --> 00:19:01,819 is this glue between the disassembler 514 00:19:01,820 --> 00:19:02,820 and assembler. 515 00:19:03,820 --> 00:19:05,949 So what is LLVM MC? 516 00:19:05,950 --> 00:19:08,139 Let's start with a very basic, simple 517 00:19:08,140 --> 00:19:09,140 instruction here. 518 00:19:10,810 --> 00:19:12,189 For the DBI, 519 00:19:12,190 --> 00:19:14,529 while in the process 520 00:19:14,530 --> 00:19:16,659 of exploring the code, an instruction is 521 00:19:16,660 --> 00:19:17,799 just a series of bytes. 522 00:19:17,800 --> 00:19:20,199 So basically what LLVM 523 00:19:20,200 --> 00:19:22,569 provides us: we feed it with 524 00:19:22,570 --> 00:19:25,119 this list of bytes and LLVM 525 00:19:25,120 --> 00:19:26,709 just provides this.
526 00:19:26,710 --> 00:19:28,929 And this in fact is an intermediate 527 00:19:28,930 --> 00:19:30,319 representation already. 528 00:19:30,320 --> 00:19:31,779 It's a very simple one. 529 00:19:31,780 --> 00:19:33,279 You have, like, two types. 530 00:19:33,280 --> 00:19:35,049 There is the MCInst, which is 531 00:19:35,050 --> 00:19:37,179 the instruction, and the instruction is 532 00:19:37,180 --> 00:19:39,399 composed of a list of operands. 533 00:19:39,400 --> 00:19:41,649 But basically, even if 534 00:19:41,650 --> 00:19:43,749 it's simple, you have all 535 00:19:43,750 --> 00:19:45,819 the interesting information that you 536 00:19:45,820 --> 00:19:47,439 need, and it's already something that we 537 00:19:47,440 --> 00:19:48,440 can work on. 538 00:19:49,630 --> 00:19:51,849 So LLVM MC is, as 539 00:19:51,850 --> 00:19:53,979 I said, very minimalist, with only 540 00:19:53,980 --> 00:19:54,980 a few structures, 541 00:19:56,800 --> 00:19:59,139 and it's also totally generic. 542 00:19:59,140 --> 00:20:01,309 So for all the architectures you use 543 00:20:01,310 --> 00:20:03,219 the same representation, which is 544 00:20:03,220 --> 00:20:05,349 quite good to work on, because 545 00:20:05,350 --> 00:20:07,149 we can do things in a totally generic 546 00:20:07,150 --> 00:20:10,269 way, but still it includes 547 00:20:10,270 --> 00:20:12,519 lots of things about an instruction. 548 00:20:12,520 --> 00:20:15,009 So at least everything 549 00:20:15,010 --> 00:20:17,420 we need to work on the instruction. 550 00:20:18,490 --> 00:20:20,829 But the downside: it's kind of very 551 00:20:20,830 --> 00:20:23,529 raw. OK, it's simple, the 552 00:20:23,530 --> 00:20:25,839 structures are simple, but using them 553 00:20:25,840 --> 00:20:28,449 to do complex things is going to 554 00:20:28,450 --> 00:20:30,519 be a bit complex, and we 555 00:20:30,520 --> 00:20:31,599 will see why.
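To make the shape of that representation concrete, here is a minimal Python sketch of an instruction that is just an opcode plus a list of operands. It is a toy model, not LLVM's actual MCInst API: the class names `Reg`, `Imm`, `Inst`, the one-entry decode table and the function `toy_disassemble` are all assumptions made for illustration, and the opcode string is only a label.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Reg:
    """A register operand (hypothetical toy type)."""
    name: str


@dataclass(frozen=True)
class Imm:
    """An immediate operand (hypothetical toy type)."""
    value: int


Operand = Union[Reg, Imm]


@dataclass
class Inst:
    """An instruction: one opcode label and a flat list of operands."""
    opcode: str
    operands: List[Operand] = field(default_factory=list)


def toy_disassemble(raw: bytes) -> Inst:
    """Stand-in for a real disassembler: knows exactly one x86-64 encoding."""
    table = {
        # 48 89 d8 encodes `mov rax, rbx` on x86-64.
        bytes.fromhex("4889d8"): Inst("MOV64rr", [Reg("RAX"), Reg("RBX")]),
    }
    return table[raw]


inst = toy_disassemble(bytes.fromhex("4889d8"))
print(inst.opcode, [op.name for op in inst.operands])
```

The point of the sketch is only that nothing architecture-specific lives in the types themselves; all the meaning sits in which opcode and which operand order a given instruction happens to use.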
556 00:20:31,600 --> 00:20:33,669 And also, they wanted it to 557 00:20:33,670 --> 00:20:34,669 be generic. 558 00:20:34,670 --> 00:20:36,729 So for this, they made some 559 00:20:36,730 --> 00:20:37,989 compromises. 560 00:20:37,990 --> 00:20:39,939 For example, they don't encode 561 00:20:39,940 --> 00:20:41,919 everything about an instruction; there is a lot 562 00:20:41,920 --> 00:20:43,989 of glue code a bit 563 00:20:43,990 --> 00:20:45,459 everywhere, in fact. 564 00:20:45,460 --> 00:20:47,679 And it makes using this layer 565 00:20:47,680 --> 00:20:48,680 a bit tricky. 566 00:20:49,900 --> 00:20:51,789 So, for example, we want to create 567 00:20:51,790 --> 00:20:53,949 instructions. It's one thing that we will 568 00:20:53,950 --> 00:20:55,959 need to do for DBI. 569 00:20:55,960 --> 00:20:58,059 And even if every instruction uses the same 570 00:20:58,060 --> 00:21:00,279 representation, in fact, they 571 00:21:00,280 --> 00:21:02,379 use it in a slightly different 572 00:21:02,380 --> 00:21:04,509 way. For example, here we have a move. 573 00:21:04,510 --> 00:21:06,639 And this move, as we have seen, is, 574 00:21:06,640 --> 00:21:08,859 OK, a series of 575 00:21:08,860 --> 00:21:12,039 operands grouped in an instruction. 576 00:21:12,040 --> 00:21:14,169 But the thing is, this 577 00:21:14,170 --> 00:21:16,359 representation is not documented. 578 00:21:16,360 --> 00:21:18,549 It's not really standard. 579 00:21:18,550 --> 00:21:20,889 Every instruction possibly uses 580 00:21:20,890 --> 00:21:22,539 its own encoding. 581 00:21:22,540 --> 00:21:24,819 And to know it, you need to look 582 00:21:24,820 --> 00:21:26,349 at the glue code. 583 00:21:26,350 --> 00:21:28,569 So it 584 00:21:28,570 --> 00:21:31,029 has not been created to be an 585 00:21:31,030 --> 00:21:32,619 intermediate representation that you can 586 00:21:32,620 --> 00:21:33,639 work on, basically.
587 00:21:34,660 --> 00:21:36,909 And here we also 588 00:21:36,910 --> 00:21:39,009 want to patch instructions, so we need to 589 00:21:39,010 --> 00:21:40,719 modify them. So if you take a look at the 590 00:21:40,720 --> 00:21:42,819 jump, it's very simple, only 591 00:21:42,820 --> 00:21:44,859 one instruction, one operand, which 592 00:21:44,860 --> 00:21:45,860 is an immediate. 593 00:21:46,690 --> 00:21:48,339 But if we want to modify it, we will 594 00:21:48,340 --> 00:21:50,409 need to go from there to this. 595 00:21:50,410 --> 00:21:52,269 And you can see, OK, it's totally 596 00:21:52,270 --> 00:21:54,039 different. We will be forced to create 597 00:21:54,040 --> 00:21:57,009 some, like, ad hoc modification of them. 598 00:21:57,010 --> 00:21:58,010 And 599 00:21:58,990 --> 00:22:01,479 we also want to create patches, 600 00:22:01,480 --> 00:22:03,969 so a series of transformations. 601 00:22:03,970 --> 00:22:06,339 And a very 602 00:22:06,340 --> 00:22:09,039 easy, very simple example of 603 00:22:09,040 --> 00:22:11,109 an instruction that we will need 604 00:22:11,110 --> 00:22:13,599 to patch is, in fact, an instruction 605 00:22:13,600 --> 00:22:15,639 which is referencing memory using the 606 00:22:15,640 --> 00:22:17,349 program counter. 607 00:22:17,350 --> 00:22:19,569 So basically, as I said, 608 00:22:19,570 --> 00:22:21,459 at some moment we will need to move this 609 00:22:21,460 --> 00:22:23,019 instruction elsewhere in memory. 610 00:22:23,020 --> 00:22:24,789 But if you move the instruction elsewhere, 611 00:22:24,790 --> 00:22:26,649 the reference will be broken, 612 00:22:26,650 --> 00:22:29,019 because it was based on the current 613 00:22:29,020 --> 00:22:30,729 address of this instruction, which is 614 00:22:30,730 --> 00:22:31,730 here. 615 00:22:32,110 --> 00:22:33,429 So let's have a look.
616 00:22:33,430 --> 00:22:34,430 We are moving it, 617 00:22:35,680 --> 00:22:37,869 but what we will be forced to do 618 00:22:37,870 --> 00:22:39,459 is, OK, we have moved the instruction, 619 00:22:39,460 --> 00:22:40,959 it's located at another address in 620 00:22:40,960 --> 00:22:43,449 memory, but we need to replace 621 00:22:43,450 --> 00:22:45,559 the program counter 622 00:22:45,560 --> 00:22:48,339 register with another register, 623 00:22:48,340 --> 00:22:50,529 and this register will be loaded with 624 00:22:50,530 --> 00:22:51,909 the original address value. 625 00:22:51,910 --> 00:22:53,949 And this way, when we execute this 626 00:22:53,950 --> 00:22:55,809 instruction, the reference will be the 627 00:22:55,810 --> 00:22:56,829 same. 628 00:22:56,830 --> 00:22:58,689 But as you can see, we are breaking 629 00:22:58,690 --> 00:23:00,429 things here. We are modifying the 630 00:23:00,430 --> 00:23:02,529 behavior of the program because we are 631 00:23:02,530 --> 00:23:04,599 erasing this register. 632 00:23:04,600 --> 00:23:06,729 So basically what you need is 633 00:23:06,730 --> 00:23:08,470 also to back up this register, 634 00:23:10,330 --> 00:23:12,669 and after 635 00:23:12,670 --> 00:23:14,859 executing the modified 636 00:23:14,860 --> 00:23:17,019 instruction, you also need to restore 637 00:23:17,020 --> 00:23:19,089 that register. So, as you can see, 638 00:23:19,090 --> 00:23:21,819 creation, creation, creation, modification: 639 00:23:21,820 --> 00:23:24,459 lots of operations here on the 640 00:23:24,460 --> 00:23:26,019 intermediate representation, which is 641 00:23:26,020 --> 00:23:27,789 kind of hard, kind of difficult to work 642 00:23:27,790 --> 00:23:28,790 with. 643 00:23:29,050 --> 00:23:31,449 And so 644 00:23:31,450 --> 00:23:33,549 the encoding is a 645 00:23:33,550 --> 00:23:35,829 bit painful to work on.
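The four steps described here (back up a scratch register, load the original program counter value into it, run the instruction with the program counter swapped for the scratch register, restore the scratch register) can be sketched by hand as follows. This is a toy model and not the framework's code: the `Inst`, `Reg` and `Imm` types, the pseudo-opcodes `STORE`, `LOAD_IMM` and `LOAD`, and the function `relocate_pc_relative` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Imm:
    value: int


@dataclass(frozen=True)
class Inst:
    opcode: str
    operands: Tuple[Union[Reg, Imm], ...]


PC = Reg("PC")


def relocate_pc_relative(inst: Inst, pc_value: int,
                         scratch: Reg, save_slot: Imm) -> List[Inst]:
    """Rewrite a PC-using instruction so it still works at a new address.

    pc_value is the value the program counter had at the ORIGINAL location.
    scratch must not already be used by the instruction (assumed by caller).
    """
    if PC not in inst.operands:
        return [inst]  # nothing to patch
    assert scratch not in inst.operands, "scratch register is already in use"
    patched = replace(
        inst,
        operands=tuple(scratch if op == PC else op for op in inst.operands),
    )
    return [
        Inst("STORE", (scratch, save_slot)),         # back up the scratch register
        Inst("LOAD_IMM", (scratch, Imm(pc_value))),  # scratch = original PC value
        patched,                                     # original instruction, PC replaced
        Inst("LOAD", (scratch, save_slot)),          # restore the scratch register
    ]


for out in relocate_pc_relative(
        Inst("LDR", (Reg("R0"), PC, Imm(16))), 0x8008, Reg("R1"), Imm(0)):
    print(out)
```

Even in this reduced form, one relocated instruction becomes four, three of which are created from scratch and one of which is modified, which is the bookkeeping the talk describes as painful to do directly on the raw representation.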
646 00:23:35,830 --> 00:23:38,049 And you can see here it was 647 00:23:38,050 --> 00:23:40,449 a very, very simple patch, 648 00:23:40,450 --> 00:23:42,549 and it's already quite complex, with a lot 649 00:23:42,550 --> 00:23:45,759 of transformations and a lot of steps. 650 00:23:45,760 --> 00:23:48,009 And we also had a feeling, which 651 00:23:48,010 --> 00:23:50,259 is: you can see that we have been forced 652 00:23:50,260 --> 00:23:52,389 to back up something, to back up a register, 653 00:23:52,390 --> 00:23:54,519 so you can feel that 654 00:23:54,520 --> 00:23:56,919 it will not be the only patch 655 00:23:56,920 --> 00:23:58,989 where you need to back up a register. So 656 00:23:58,990 --> 00:24:01,059 you can feel that you will need 657 00:24:01,060 --> 00:24:03,159 some generic steps that 658 00:24:03,160 --> 00:24:06,069 will be needed a bit everywhere. 659 00:24:06,070 --> 00:24:08,589 And so basically, we need abstractions. 660 00:24:11,140 --> 00:24:13,239 And the idea is to have, like, 661 00:24:13,240 --> 00:24:15,309 a magical engine, which we can 662 00:24:15,310 --> 00:24:17,469 call the patch engine, it's 663 00:24:17,470 --> 00:24:19,299 its real name, in fact, and 664 00:24:20,470 --> 00:24:22,959 it will take one instruction 665 00:24:22,960 --> 00:24:25,359 as input, one original instruction, 666 00:24:25,360 --> 00:24:27,699 apply our transformations 667 00:24:27,700 --> 00:24:31,449 on it using abstractions, 668 00:24:31,450 --> 00:24:33,849 and then output 669 00:24:33,850 --> 00:24:36,009 one or more 670 00:24:36,010 --> 00:24:37,010 instructions.
671 00:24:38,630 --> 00:24:40,789 And for this, 672 00:24:40,790 --> 00:24:43,399 basically, we had a vision, 673 00:24:43,400 --> 00:24:44,990 like some twisted vision. 674 00:24:47,210 --> 00:24:48,260 We said, OK, 675 00:24:49,940 --> 00:24:51,229 maybe, maybe 676 00:24:52,370 --> 00:24:54,739 we can identify 677 00:24:54,740 --> 00:24:56,959 the steps that are required 678 00:24:56,960 --> 00:24:59,659 to apply the transformation. 679 00:24:59,660 --> 00:25:02,089 So maybe you can say, OK, 680 00:25:02,090 --> 00:25:04,639 this patch is in fact just a series 681 00:25:04,640 --> 00:25:06,409 of transformations. 682 00:25:06,410 --> 00:25:08,869 Some of them are totally generic. 683 00:25:08,870 --> 00:25:11,269 So maybe we can just regroup them 684 00:25:11,270 --> 00:25:13,339 and try to integrate them in 685 00:25:13,340 --> 00:25:16,209 a sort of language, 686 00:25:16,210 --> 00:25:18,379 a specific language 687 00:25:18,380 --> 00:25:21,169 which will be specialized in 688 00:25:21,170 --> 00:25:23,089 patching a binary. 689 00:25:23,090 --> 00:25:25,309 And another part of the idea 690 00:25:25,310 --> 00:25:27,889 was, OK, the instruction, 691 00:25:27,890 --> 00:25:30,409 even if the representation is generic, 692 00:25:30,410 --> 00:25:32,389 every instruction we 693 00:25:32,390 --> 00:25:35,569 will have will be architecture specific. 694 00:25:35,570 --> 00:25:37,669 So the patch will be architecture 695 00:25:37,670 --> 00:25:40,009 specific, but possibly, maybe, 696 00:25:40,010 --> 00:25:42,589 the language can be generic 697 00:25:42,590 --> 00:25:44,719 and express modifications for more 698 00:25:44,720 --> 00:25:45,950 than one architecture. 699 00:25:47,000 --> 00:25:49,069 And so after 700 00:25:49,070 --> 00:25:51,589 some headaches we had this 701 00:25:53,510 --> 00:25:55,579 schematic. I will not explain 702 00:25:55,580 --> 00:25:56,569 everything here.
703 00:25:56,570 --> 00:25:58,819 But what we can explain 704 00:25:58,820 --> 00:26:01,009 is that basically the issue that you 705 00:26:01,010 --> 00:26:02,569 have is that you have two worlds. 706 00:26:02,570 --> 00:26:04,759 You have the world of the original 707 00:26:04,760 --> 00:26:06,829 program and you have the world of 708 00:26:06,830 --> 00:26:07,429 the DBI. 709 00:26:07,430 --> 00:26:09,859 So one guest, one host, 710 00:26:09,860 --> 00:26:11,879 and you have interactions between them. 711 00:26:11,880 --> 00:26:13,939 So the idea of the abstraction was 712 00:26:13,940 --> 00:26:16,189 trying to make things a bit 713 00:26:16,190 --> 00:26:18,379 more organized to, in 714 00:26:18,380 --> 00:26:20,509 fact, be able to see the 715 00:26:20,510 --> 00:26:22,699 relation, the precise relation between 716 00:26:22,700 --> 00:26:24,049 them and, 717 00:26:25,100 --> 00:26:27,289 by identifying this 718 00:26:27,290 --> 00:26:29,420 relation, also trying to 719 00:26:31,130 --> 00:26:33,349 create abstractions that allow us 720 00:26:33,350 --> 00:26:35,539 to work more easily 721 00:26:35,540 --> 00:26:37,969 with the binary and to express 722 00:26:37,970 --> 00:26:40,219 complex things in a simple way. 723 00:26:40,220 --> 00:26:43,189 I will show you what it really means 724 00:26:43,190 --> 00:26:45,079 by using this. 725 00:26:45,080 --> 00:26:47,149 This is something 726 00:26:47,150 --> 00:26:49,339 which is part of the language, and it's 727 00:26:49,340 --> 00:26:52,039 something which is a temporary register. 728 00:26:52,040 --> 00:26:54,389 So again, you have seen a temporary 729 00:26:54,390 --> 00:26:56,719 register before in this 730 00:26:56,720 --> 00:26:58,639 example. So this is the same example as 731 00:26:58,640 --> 00:27:00,589 before. And this was our temporary 732 00:27:00,590 --> 00:27:02,719 register. With our language,
733 00:27:02,720 --> 00:27:04,609 if you want to create something like 734 00:27:04,610 --> 00:27:07,069 this series of instructions, 735 00:27:07,070 --> 00:27:09,229 in fact, you can just say, OK, I want 736 00:27:09,230 --> 00:27:11,629 a temporary register, so I want a 737 00:27:11,630 --> 00:27:12,630 Temp. 738 00:27:13,460 --> 00:27:15,549 And the register will be 739 00:27:15,550 --> 00:27:17,749 assigned a free register 740 00:27:17,750 --> 00:27:19,819 automatically by the engine, and it will 741 00:27:19,820 --> 00:27:22,129 be identified by an ID so you can 742 00:27:22,130 --> 00:27:24,379 work on it later. 743 00:27:24,380 --> 00:27:25,380 And 744 00:27:26,220 --> 00:27:28,919 by working on it, I mean, OK, 745 00:27:28,920 --> 00:27:31,109 these backups were needed for 746 00:27:31,110 --> 00:27:33,659 modifying this original instruction, 747 00:27:33,660 --> 00:27:35,879 and to do this modification, 748 00:27:35,880 --> 00:27:37,859 in fact, what we are doing here is 749 00:27:37,860 --> 00:27:40,079 basically replacing 750 00:27:40,080 --> 00:27:42,519 the register in the original instruction. 751 00:27:42,520 --> 00:27:45,089 So we are doing a substitution 752 00:27:45,090 --> 00:27:47,159 with a temporary register 753 00:27:47,160 --> 00:27:49,259 on the original register, which is 754 00:27:49,260 --> 00:27:51,479 PC, and by using 755 00:27:51,480 --> 00:27:53,549 a reference to the temporary one. 756 00:27:53,550 --> 00:27:56,549 So you can see it's kind of a language, 757 00:27:56,550 --> 00:27:58,719 with keywords and 758 00:27:58,720 --> 00:28:00,809 kind of variables and things like this.
759 00:28:03,510 --> 00:28:05,279 And another idea was: 760 00:28:06,360 --> 00:28:08,489 basically, patching a binary is 761 00:28:08,490 --> 00:28:11,219 just applying a series of rules, 762 00:28:11,220 --> 00:28:13,589 because if you want to modify 763 00:28:13,590 --> 00:28:15,659 something, it's because there is a need 764 00:28:15,660 --> 00:28:17,729 to do it. There is, 765 00:28:17,730 --> 00:28:19,889 in fact, a condition, one or more 766 00:28:19,890 --> 00:28:22,259 conditions that, for an input 767 00:28:22,260 --> 00:28:24,509 instruction, will apply 768 00:28:24,510 --> 00:28:26,909 a series of actions. And the actions, 769 00:28:26,910 --> 00:28:29,279 basically, are the transformations 770 00:28:29,280 --> 00:28:31,379 that we want to apply. 771 00:28:31,380 --> 00:28:33,449 So let's 772 00:28:33,450 --> 00:28:35,579 have a look at what 773 00:28:35,580 --> 00:28:37,209 we call a rule, in fact. 774 00:28:37,210 --> 00:28:39,299 So this is just extracted from 775 00:28:39,300 --> 00:28:41,189 the original code of the DBI. 776 00:28:41,190 --> 00:28:43,289 And this is our patch. 777 00:28:43,290 --> 00:28:45,419 This is the thing that 778 00:28:45,420 --> 00:28:47,849 replaces the program 779 00:28:47,850 --> 00:28:50,069 counter with a temporary 780 00:28:50,070 --> 00:28:52,169 register. And you'll see there is 781 00:28:52,170 --> 00:28:53,519 one condition here. 782 00:28:53,520 --> 00:28:56,339 So this is only one rule, one condition, 783 00:28:56,340 --> 00:28:58,529 which is that it must use the register 784 00:28:58,530 --> 00:29:00,629 PC, and then the actions. You can 785 00:29:00,630 --> 00:29:02,969 see there are several of them, 786 00:29:02,970 --> 00:29:05,309 but I will not enter into the details. 787 00:29:05,310 --> 00:29:07,039 But you can find here the substitute 788 00:29:07,040 --> 00:29:09,239 with Temp, and the temporary register, 789 00:29:09,240 --> 00:29:10,240 and so on.
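As a rough illustration of "a rule is one or more conditions plus a list of actions, with temporary registers referred to by ID", here is a small Python sketch. It is not the framework's actual rule language: every name in it (`PatchContext`, `use_reg`, `load_pc_into_temp`, `substitute_with_temp`, `Rule`, `apply_rules`, and the `SAVE`, `RESTORE`, `LOAD_IMM` pseudo-opcodes) is an assumption made up for this example, and instructions are plain tuples of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Inst = Tuple[str, ...]  # toy form: (opcode, operand, ...), not LLVM's MCInst


@dataclass
class PatchContext:
    inst: Inst
    pc_value: int
    free_regs: List[str]
    temps: Dict[int, str] = field(default_factory=dict)

    def temp(self, temp_id: int) -> str:
        """Bind a temp ID to a free register the first time it is requested."""
        if temp_id not in self.temps:
            self.temps[temp_id] = next(
                reg for reg in self.free_regs
                if reg not in self.inst and reg not in self.temps.values())
        return self.temps[temp_id]


Condition = Callable[[Inst], bool]
Action = Callable[[PatchContext], List[Inst]]


def use_reg(reg: str) -> Condition:
    """Condition: the instruction has `reg` among its operands."""
    return lambda inst: reg in inst[1:]


def load_pc_into_temp(temp_id: int) -> Action:
    """Action: emit a load of the original PC value into a temp."""
    return lambda ctx: [("LOAD_IMM", ctx.temp(temp_id), hex(ctx.pc_value))]


def substitute_with_temp(reg: str, temp_id: int) -> Action:
    """Action: emit the original instruction with `reg` replaced by a temp."""
    return lambda ctx: [(ctx.inst[0],) + tuple(
        ctx.temp(temp_id) if op == reg else op for op in ctx.inst[1:])]


@dataclass
class Rule:
    conditions: List[Condition]
    actions: List[Action]


def apply_rules(inst: Inst, pc_value: int, rules: List[Rule],
                free_regs: List[str]) -> List[Inst]:
    """Apply the first matching rule; save/restore is added generically."""
    for rule in rules:
        if all(cond(inst) for cond in rule.conditions):
            ctx = PatchContext(inst, pc_value, list(free_regs))
            body = [out for action in rule.actions for out in action(ctx)]
            used = list(ctx.temps.values())
            saves = [("SAVE", reg) for reg in used]
            restores = [("RESTORE", reg) for reg in reversed(used)]
            return saves + body + restores
    return [inst]


pc_rule = Rule([use_reg("PC")],
               [load_pc_into_temp(0), substitute_with_temp("PC", 0)])
print(apply_rules(("LDR", "R0", "PC", "16"), 0x8008, [pc_rule],
                  ["R0", "R1", "R2"]))
```

The design choice worth noticing is that the rule never names a concrete scratch register and never writes the save and restore itself; the engine picks a free register for temp 0 and wraps the body, which is the kind of generic step the talk wanted to factor out.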
790 00:29:11,610 --> 00:29:12,610 And 791 00:29:14,010 --> 00:29:16,529 a nice thing with this rule: it's 792 00:29:16,530 --> 00:29:17,579 generic. 793 00:29:17,580 --> 00:29:19,469 I mean, it's exactly the same rule that 794 00:29:19,470 --> 00:29:22,139 we are using on ARM and x86- 795 00:29:22,140 --> 00:29:23,520 64. 796 00:29:24,810 --> 00:29:27,539 And another example here, it's pure 797 00:29:27,540 --> 00:29:29,729 ARM. And here we 798 00:29:29,730 --> 00:29:31,799 basically need to replace the jump, as 799 00:29:31,800 --> 00:29:33,899 you have seen. So that's exactly what 800 00:29:33,900 --> 00:29:35,489 we are doing. The condition is a bit 801 00:29:35,490 --> 00:29:37,619 more complex because we have a 802 00:29:37,620 --> 00:29:39,779 condition which applies to 803 00:29:39,780 --> 00:29:41,099 several instructions. 804 00:29:41,100 --> 00:29:42,599 But the idea is always the same. 805 00:29:42,600 --> 00:29:44,819 You have keywords for your language, 806 00:29:44,820 --> 00:29:46,289 variables and things like this. 807 00:29:48,310 --> 00:29:50,799 So what did we learn? First, 808 00:29:50,800 --> 00:29:52,869 LLVM is really a magic 809 00:29:52,870 --> 00:29:54,039 piece of software. 810 00:29:54,040 --> 00:29:55,239 It's very robust. 811 00:29:55,240 --> 00:29:57,319 It provides everything and basically 812 00:29:57,320 --> 00:29:58,989 just saved us on this. 813 00:29:58,990 --> 00:30:02,619 But the problem we had was that 814 00:30:02,620 --> 00:30:05,109 the intermediate representation was 815 00:30:05,110 --> 00:30:07,089 so simple, in fact, that it became very 816 00:30:07,090 --> 00:30:08,409 complex to play with. 817 00:30:08,410 --> 00:30:10,719 And you can't do it by hand, you can't really 818 00:30:10,720 --> 00:30:12,999 create patches with a giant 819 00:30:13,000 --> 00:30:14,289 switch case. 820 00:30:14,290 --> 00:30:15,549 It doesn't work. 821 00:30:15,550 --> 00:30:17,679 It's just breaking your head.
822 00:30:17,680 --> 00:30:19,299 So you really need to focus on 823 00:30:19,300 --> 00:30:21,729 abstraction. And we were quite surprised 824 00:30:21,730 --> 00:30:23,859 at how difficult it was to 825 00:30:23,860 --> 00:30:24,939 create this abstraction. 826 00:30:24,940 --> 00:30:27,069 And honestly, it's still a work 827 00:30:27,070 --> 00:30:28,070 in progress. 828 00:30:29,800 --> 00:30:31,929 But, yeah, it allows us to 829 00:30:31,930 --> 00:30:34,359 turn quite complex 830 00:30:34,360 --> 00:30:36,669 transformations into something 831 00:30:36,670 --> 00:30:38,859 that is very easy to read. You just have 832 00:30:38,860 --> 00:30:40,399 your list of operations. 833 00:30:40,400 --> 00:30:42,459 It's very easy to understand what the DBI 834 00:30:42,460 --> 00:30:43,630 is doing on your binary. 835 00:30:45,220 --> 00:30:46,669 So I will 836 00:30:46,670 --> 00:30:48,920 hand the remote to Charles for the next part. 837 00:30:50,340 --> 00:30:52,549 So the next part I want to talk about 838 00:30:52,550 --> 00:30:54,889 is how you need to think about 839 00:30:54,890 --> 00:30:56,899 cross-architecture support. 840 00:30:56,900 --> 00:30:59,029 So some 841 00:30:59,030 --> 00:31:01,279 of the DBI frameworks, like 842 00:31:01,280 --> 00:31:03,619 Intel Pin, have had trouble supporting 843 00:31:03,620 --> 00:31:05,689 other instruction sets; they actually tried 844 00:31:05,690 --> 00:31:07,609 and then they decided they would not 845 00:31:07,610 --> 00:31:08,839 support it anymore. 846 00:31:08,840 --> 00:31:10,819 And one of the reasons is that if you 847 00:31:10,820 --> 00:31:12,829 don't think about cross-architecture 848 00:31:12,830 --> 00:31:15,079 support from the start, it can 849 00:31:15,080 --> 00:31:16,909 become very complex. 850 00:31:16,910 --> 00:31:19,279 And I'll show you how we handle this issue 851 00:31:19,280 --> 00:31:21,439 in our DBI engine.
852 00:31:21,440 --> 00:31:23,749 So if you think about what's 853 00:31:23,750 --> 00:31:26,149 going on in the process, 854 00:31:26,150 --> 00:31:28,819 you can divide the space into 855 00:31:28,820 --> 00:31:30,289 two entities. 856 00:31:30,290 --> 00:31:32,599 The first one is the host. 857 00:31:32,600 --> 00:31:34,939 So the host contains the DBI engine 858 00:31:34,940 --> 00:31:37,519 and the instrumentation tool 859 00:31:37,520 --> 00:31:39,049 you've written. 860 00:31:39,050 --> 00:31:41,209 And the second part is the guest, and 861 00:31:41,210 --> 00:31:43,339 the guest contains the original binary 862 00:31:43,340 --> 00:31:45,829 and the instrumented code generated 863 00:31:45,830 --> 00:31:47,539 by the DBI engine. 864 00:31:47,540 --> 00:31:49,279 So you see that this host/guest 865 00:31:49,280 --> 00:31:51,559 terminology is taken from the VM world, 866 00:31:51,560 --> 00:31:53,029 because it kind of makes sense. 867 00:31:54,380 --> 00:31:56,839 Now, this is not a VM. 868 00:31:56,840 --> 00:31:59,179 So the problem is that these 869 00:31:59,180 --> 00:32:01,219 two contexts share the same memory 870 00:32:01,220 --> 00:32:03,079 and the same CPU context, because they are 871 00:32:03,080 --> 00:32:04,099 just one process. 872 00:32:05,540 --> 00:32:07,729 And this means 873 00:32:07,730 --> 00:32:09,949 that we're going to need to switch 874 00:32:09,950 --> 00:32:12,109 between the two contexts of the host 875 00:32:12,110 --> 00:32:14,119 and the guest at every cycle during the 876 00:32:14,120 --> 00:32:15,559 execution. 877 00:32:15,560 --> 00:32:17,689 However, we do not get any help 878 00:32:17,690 --> 00:32:19,429 from the kernel or the CPU, because this 879 00:32:19,430 --> 00:32:21,109 is not a VM. We are not going to use 880 00:32:21,110 --> 00:32:22,699 virtualization extensions and things like 881 00:32:22,700 --> 00:32:23,700 that.
882 00:32:25,790 --> 00:32:28,129 So this basically means 883 00:32:28,130 --> 00:32:30,319 saving and restoring the CPU context 884 00:32:30,320 --> 00:32:32,599 of the guest and the host, and vice versa, 885 00:32:32,600 --> 00:32:34,539 every time you switch between the two. 886 00:32:36,110 --> 00:32:38,269 However, you need to avoid any 887 00:32:38,270 --> 00:32:40,429 side effect on the guest, because the host 888 00:32:40,430 --> 00:32:42,599 is aware that it's a DBI that 889 00:32:42,600 --> 00:32:43,759 is executing, 890 00:32:43,760 --> 00:32:45,559 but the guest is not aware and should not 891 00:32:45,560 --> 00:32:47,689 be aware that you're doing context 892 00:32:47,690 --> 00:32:49,399 switches between, basically, another 893 00:32:49,400 --> 00:32:51,410 process inside this process. 894 00:32:52,700 --> 00:32:54,589 So this means that you cannot modify its 895 00:32:54,590 --> 00:32:56,989 stack and you cannot erase 896 00:32:56,990 --> 00:32:59,029 any of its registers, or else the program 897 00:32:59,030 --> 00:33:00,030 will just crash. 898 00:33:01,610 --> 00:33:03,829 And the only way to 899 00:33:03,830 --> 00:33:05,989 have something that works this way 900 00:33:05,990 --> 00:33:08,059 is that the guest needs 901 00:33:08,060 --> 00:33:10,429 to be able to relatively 902 00:33:10,430 --> 00:33:13,579 access memory from the host, 903 00:33:13,580 --> 00:33:15,649 because you cannot just 904 00:33:15,650 --> 00:33:17,659 compute the memory address in a register, 905 00:33:17,660 --> 00:33:19,879 because you cannot erase a register, and 906 00:33:19,880 --> 00:33:21,039 you cannot save that register 907 00:33:21,040 --> 00:33:22,699 on the stack, because you 908 00:33:22,700 --> 00:33:23,719 cannot modify the stack.
909 00:33:23,720 --> 00:33:25,609 So the only way is that you need to be able 910 00:33:25,610 --> 00:33:27,689 to do a direct reference 911 00:33:27,690 --> 00:33:30,259 to a relative memory address 912 00:33:30,260 --> 00:33:31,260 in the guest. 913 00:33:33,800 --> 00:33:35,959 However, relative addressing 914 00:33:35,960 --> 00:33:38,059 is extremely constrained by the CPU 915 00:33:38,060 --> 00:33:39,289 architecture. 916 00:33:39,290 --> 00:33:41,389 So under x86 you can 917 00:33:41,390 --> 00:33:43,639 do 32-bit relative memory 918 00:33:43,640 --> 00:33:45,889 addressing, but under ARM you only 919 00:33:45,890 --> 00:33:47,389 have 12 bits. 920 00:33:47,390 --> 00:33:49,639 And basically, if you 921 00:33:49,640 --> 00:33:51,169 look at the encoding, this means that 922 00:33:51,170 --> 00:33:53,989 you're limited to plus 923 00:33:53,990 --> 00:33:56,239 4096 bytes or 924 00:33:56,240 --> 00:33:58,690 minus, under ARM. 925 00:34:00,290 --> 00:34:02,389 So the conclusion of this is that 926 00:34:02,390 --> 00:34:04,159 if you want to have a context switch that 927 00:34:04,160 --> 00:34:05,160 works nicely 928 00:34:06,530 --> 00:34:08,749 with cross-architecture support, you need 929 00:34:08,750 --> 00:34:11,178 to have a situation 930 00:34:11,179 --> 00:34:13,488 where you have memory 931 00:34:13,489 --> 00:34:15,589 really close to the guest, close to the 932 00:34:15,590 --> 00:34:17,219 instrumented code you generated. 933 00:34:19,820 --> 00:34:21,408 The other problem is that you want to 934 00:34:21,409 --> 00:34:23,449 play nice with data execution prevention. 935 00:34:25,040 --> 00:34:27,109 So we could put data next to the 936 00:34:27,110 --> 00:34:29,388 guest, it's simple because 937 00:34:29,389 --> 00:34:30,888 we are generating that code.
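The reach constraint can be checked with a few lines of arithmetic. The sketch below assumes a signed 32-bit displacement for x86-64 and, for 32-bit ARM, a 12-bit offset magnitude with a separate add/subtract bit, which works out to plus or minus 4095 bytes (close to the roughly 4 KiB figure quoted in the talk). It ignores the architecture-specific bias of the program counter, and the function names are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def reachable(arch: str, from_addr: int, to_addr: int) -> bool:
    """Can code at from_addr reach data at to_addr with one relative access?

    Simplified model (assumption): the displacement is to_addr - from_addr,
    without the PC bias real encodings apply.
    """
    disp = to_addr - from_addr
    if arch == "x86_64":
        return fits_signed(disp, 32)   # signed 32-bit displacement
    if arch == "arm":
        return -4095 <= disp <= 4095   # 12-bit magnitude plus a sign bit
    raise ValueError(f"unknown architecture: {arch}")


code_addr = 0x10000
for arch in ("x86_64", "arm"):
    for distance in (0x800, 0x2000, 0x9000_0000):
        print(arch, hex(distance), reachable(arch, code_addr, code_addr + distance))
```

Under these assumptions, data a couple of pages away is already out of reach on ARM while still trivially reachable on x86-64, which is why the data has to sit right next to the generated code if the same design is to work on both.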
938 00:34:30,889 --> 00:34:31,889 However, 939 00:34:33,020 --> 00:34:35,149 you cannot have a memory 940 00:34:35,150 --> 00:34:36,738 page that would be read-write-execute, 941 00:34:36,739 --> 00:34:38,899 because on some operating systems this is 942 00:34:38,900 --> 00:34:40,879 not allowed. So what we're doing, 943 00:34:40,880 --> 00:34:43,009 basically, is allocating two contiguous 944 00:34:43,010 --> 00:34:44,988 memory pages. One of them is read- 945 00:34:44,989 --> 00:34:47,119 execute, the other is read-write. 946 00:34:47,120 --> 00:34:49,638 And this way we can 947 00:34:49,639 --> 00:34:51,468 satisfy all those conditions. 948 00:34:51,469 --> 00:34:52,549 So it looks like this. 949 00:34:52,550 --> 00:34:54,769 So this is the first page, which we call 950 00:34:54,770 --> 00:34:57,138 the code block, and this is the second 951 00:34:57,139 --> 00:34:58,609 page, which we call the data block. 952 00:34:58,610 --> 00:34:59,729 This one is read-execute. 953 00:34:59,730 --> 00:35:00,730 This one is read-write. 954 00:35:01,910 --> 00:35:04,039 So the first piece of 955 00:35:04,040 --> 00:35:05,839 code in the code block is the prologue, 956 00:35:05,840 --> 00:35:07,879 which is basically in charge of the 957 00:35:07,880 --> 00:35:10,039 context switch, and to do that context 958 00:35:10,040 --> 00:35:11,479 switch it simply 959 00:35:12,650 --> 00:35:14,869 stores the host context there and loads 960 00:35:14,870 --> 00:35:17,269 the guest context there, using relative 961 00:35:17,270 --> 00:35:18,409 addressing. 962 00:35:18,410 --> 00:35:20,029 This is the instrumented code, the one we 963 00:35:20,030 --> 00:35:21,829 generate with the engine, and this is the 964 00:35:21,830 --> 00:35:23,989 epilogue, which does the inverse 965 00:35:23,990 --> 00:35:24,990 switch.
966 00:35:26,710 --> 00:35:28,899 So the idea behind the block, basically, 967 00:35:28,900 --> 00:35:31,329 is to bind instrumented code with 968 00:35:31,330 --> 00:35:32,699 instrumentation data, 969 00:35:34,060 --> 00:35:36,699 so the data is guaranteed 970 00:35:36,700 --> 00:35:38,380 to be directly addressable. 971 00:35:39,550 --> 00:35:41,769 However, as I said, one memory page, one 972 00:35:41,770 --> 00:35:43,839 memory page is four kilobytes on 973 00:35:43,840 --> 00:35:45,999 most operating systems, and this 974 00:35:46,000 --> 00:35:47,199 is a lot of space. 975 00:35:47,200 --> 00:35:49,359 So if you take one basic block 976 00:35:49,360 --> 00:35:51,519 and then you use one page per basic 977 00:35:51,520 --> 00:35:52,779 block, you're never going to... 978 00:35:52,780 --> 00:35:54,699 Well, you could, but you need a lot of 979 00:35:54,700 --> 00:35:55,700 RAM. 980 00:35:56,590 --> 00:35:58,449 So the solution is that you need to be able 981 00:35:58,450 --> 00:36:00,039 to put multiple instrumented basic 982 00:36:00,040 --> 00:36:02,169 blocks into the code block, 983 00:36:02,170 --> 00:36:04,419 and then you basically 984 00:36:04,420 --> 00:36:06,849 also have a lot of data space left, so 985 00:36:06,850 --> 00:36:08,919 you can try to do things with that 986 00:36:08,920 --> 00:36:09,920 data space. 987 00:36:11,440 --> 00:36:13,659 So this is basically what 988 00:36:13,660 --> 00:36:14,660 we got in the end. 989 00:36:16,720 --> 00:36:18,819 Here we have a very special thing, 990 00:36:18,820 --> 00:36:21,339 which is a selector, and the selector 991 00:36:21,340 --> 00:36:24,129 is a jump to some place 992 00:36:24,130 --> 00:36:26,289 inside the code block. 993 00:36:26,290 --> 00:36:27,579 The selector, 994 00:36:27,580 --> 00:36:29,649 basically, is a programmable jump, but 995 00:36:29,650 --> 00:36:31,639 we cannot modify the code block because it's 996 00:36:31,640 --> 00:36:33,279 read-execute.
997 00:36:33,280 --> 00:36:35,469 So what it does is to jump to 998 00:36:35,470 --> 00:36:37,269 an address that's actually contained in the data 999 00:36:37,270 --> 00:36:39,909 block. And this way we can select 1000 00:36:39,910 --> 00:36:41,889 which basic block, because here we are 1001 00:36:41,890 --> 00:36:44,149 going to store multiple basic blocks, 1002 00:36:44,150 --> 00:36:46,389 basically, we want to execute, and 1003 00:36:46,390 --> 00:36:48,909 each basic block jumps to the epilogue at the end. 1004 00:36:48,910 --> 00:36:51,069 So if you just program the selector 1005 00:36:51,070 --> 00:36:52,659 and then start the execution from the 1006 00:36:52,660 --> 00:36:54,729 top, you will execute the basic block 1007 00:36:54,730 --> 00:36:55,730 you want. 1008 00:36:56,830 --> 00:36:58,569 For the data space, 1009 00:36:58,570 --> 00:37:00,789 we are exploiting the remaining data 1010 00:37:00,790 --> 00:37:02,859 space for things we call constants and 1011 00:37:02,860 --> 00:37:03,989 shadows, basically. 1012 00:37:04,990 --> 00:37:07,209 So the 1013 00:37:07,210 --> 00:37:09,579 constants, the instrumentation constants. 1014 00:37:09,580 --> 00:37:11,259 Well, you've seen in the patch example 1015 00:37:11,260 --> 00:37:13,329 that we wanted to replace the 1016 00:37:13,330 --> 00:37:15,999 program counter and wanted to load 1017 00:37:16,000 --> 00:37:16,989 a constant. 1018 00:37:16,990 --> 00:37:18,789 So this works very well 1019 00:37:18,790 --> 00:37:20,139 under x86-64, where you have a 1020 00:37:20,140 --> 00:37:22,299 move that loads a 64-bit immediate 1021 00:37:22,300 --> 00:37:23,439 into a register. 1022 00:37:23,440 --> 00:37:25,479 But this is because it's a CISC 1023 00:37:25,480 --> 00:37:27,759 instruction set, and under ARM 1024 00:37:27,760 --> 00:37:30,039 you're also very limited in the size 1025 00:37:30,040 --> 00:37:31,389 of the immediate you can load into a 1026 00:37:31,390 --> 00:37:32,409 register.
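The selector idea (a fixed jump in the read-only code block whose target is stored in the writable data block) can be simulated without any real executable memory. The following is a toy Python model under that assumption: the class `ExecBlock`, its methods, and the string "instructions" are invented for this sketch and do not reflect the framework's implementation.

```python
from typing import Dict, List, Optional, Sequence


class ExecBlock:
    """Toy model: a code block that becomes read-only, plus a writable data block."""

    def __init__(self) -> None:
        # Code block: prologue, then the selector jump, then basic blocks.
        self._code: Sequence[str] = ["PROLOGUE", "JMP [selector]"]
        self._sealed = False
        self._epilogue = -1
        # Data block: stays writable after the code is sealed.
        self.data: Dict[str, Optional[int]] = {"selector": None}

    def add_basic_block(self, ops: List[str]) -> int:
        """Append an instrumented basic block; return its offset in the code block."""
        if self._sealed:
            raise PermissionError("code block is read-execute once sealed")
        code = list(self._code)
        offset = len(code)
        code.extend(ops + ["JMP epilogue"])  # every basic block ends at the epilogue
        self._code = code
        return offset

    def seal(self) -> None:
        """Append the epilogue and freeze the code block."""
        code = list(self._code)
        self._epilogue = len(code)
        code.append("EPILOGUE")
        self._code = tuple(code)
        self._sealed = True

    def select(self, offset: int) -> None:
        """Program the selector: only the data block is written."""
        self.data["selector"] = offset

    def run(self) -> List[str]:
        """Start from the top: prologue, selected basic block, epilogue."""
        trace = [self._code[0]]
        pc = self.data["selector"]
        while self._code[pc] != "JMP epilogue":
            trace.append(self._code[pc])
            pc += 1
        trace.append(self._code[self._epilogue])
        return trace


block = ExecBlock()
first = block.add_basic_block(["inc r0"])
second = block.add_basic_block(["dec r1", "nop"])
block.seal()
block.select(second)
print(block.run())
```

Because the entry point is always the top of the block, the prologue and epilogue are shared by every basic block stored there, and choosing which one runs costs a single write to the data block.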
1027 00:37:32,410 --> 00:37:34,539 So basically we use this 1028 00:37:34,540 --> 00:37:36,729 constant space in the same way as ARM 1029 00:37:36,730 --> 00:37:39,369 literal pools. If you've already reverse 1030 00:37:39,370 --> 00:37:41,679 engineered ARM code, you know that there is a piece 1031 00:37:41,680 --> 00:37:43,809 of code and, next to it, a piece of data 1032 00:37:43,810 --> 00:37:45,729 and constants that are directly loaded, 1033 00:37:45,730 --> 00:37:48,129 referenced by the code, and 1034 00:37:48,130 --> 00:37:49,389 this works the same way. 1035 00:37:49,390 --> 00:37:51,669 So this way we can load any kind of data 1036 00:37:51,670 --> 00:37:53,829 we want into our instrumentation 1037 00:37:53,830 --> 00:37:55,479 without wasting space. 1038 00:37:57,160 --> 00:37:59,079 And the other thing, which is a more 1039 00:37:59,080 --> 00:38:00,819 interesting concept, is what we call 1040 00:38:00,820 --> 00:38:01,899 instruction shadows. 1041 00:38:02,930 --> 00:38:04,479 This is not entirely new. 1042 00:38:04,480 --> 00:38:07,389 It's inspired by Valgrind. 1043 00:38:07,390 --> 00:38:09,639 In Valgrind, the way they track 1044 00:38:09,640 --> 00:38:10,839 memory allocation and 1045 00:38:10,840 --> 00:38:13,059 deallocation is that 1046 00:38:13,060 --> 00:38:15,729 they create what they call memory shadows. 1047 00:38:15,730 --> 00:38:18,249 So for one page 1048 00:38:18,250 --> 00:38:21,189 of memory you allocate in your program, 1049 00:38:21,190 --> 00:38:23,589 they create a small buffer, and 1050 00:38:23,590 --> 00:38:25,749 inside the buffer, each bit represents 1051 00:38:25,750 --> 00:38:27,999 the state, basically, of each byte 1052 00:38:28,000 --> 00:38:29,859 of the memory page. 1053 00:38:29,860 --> 00:38:32,199 And this is shadowing, basically, 1054 00:38:32,200 --> 00:38:33,399 the memory. 1055 00:38:33,400 --> 00:38:35,199 And so it's some kind of variable that's 1056 00:38:35,200 --> 00:38:37,239 bound to the memory.
1057 00:38:37,240 --> 00:38:39,369 And we wanted to do the same thing, 1058 00:38:39,370 --> 00:38:41,109 but for instructions. 1059 00:38:41,110 --> 00:38:43,509 So it's 1060 00:38:43,510 --> 00:38:46,089 basically a means for us to 1061 00:38:46,090 --> 00:38:48,579 abstract the idea of having variables 1062 00:38:48,580 --> 00:38:50,829 inside your instrumentation code, 1063 00:38:50,830 --> 00:38:53,199 to make 1064 00:38:53,200 --> 00:38:55,329 inline instrumentation very 1065 00:38:55,330 --> 00:38:56,330 easy. 1066 00:38:57,790 --> 00:38:59,949 One of the main use cases right 1067 00:38:59,950 --> 00:39:02,739 now is to record memory accesses. 1068 00:39:02,740 --> 00:39:04,989 So memory accesses are a bit of a problem 1069 00:39:04,990 --> 00:39:06,459 if you want to record everything that's 1070 00:39:06,460 --> 00:39:09,009 going on in your program, because 1071 00:39:09,010 --> 00:39:11,139 to record read 1072 00:39:11,140 --> 00:39:12,849 memory accesses, you need to instrument 1073 00:39:12,850 --> 00:39:14,529 before the instruction; for write memory 1074 00:39:14,530 --> 00:39:15,939 accesses, you need to instrument after the 1075 00:39:15,940 --> 00:39:16,899 instruction. 1076 00:39:16,900 --> 00:39:19,509 So you're adding a lot of instrumentation. 1077 00:39:19,510 --> 00:39:21,039 And for example, if you want to call your 1078 00:39:21,040 --> 00:39:23,169 own code in a callback, 1079 00:39:23,170 --> 00:39:25,089 then you need to switch context again. 1080 00:39:25,090 --> 00:39:26,889 So you're making two context switches per 1081 00:39:26,890 --> 00:39:29,679 instruction, and then your instrumentation 1082 00:39:29,680 --> 00:39:31,079 becomes very slow.
1083 00:39:31,080 --> 00:39:33,759 So the solution 1084 00:39:33,760 --> 00:39:35,949 is to do what we call inline 1085 00:39:35,950 --> 00:39:38,109 instrumentation, where you 1086 00:39:38,110 --> 00:39:40,719 are going to do 1087 00:39:40,720 --> 00:39:42,849 the recording of the memory access 1088 00:39:42,850 --> 00:39:44,979 directly in assembly, without calling 1089 00:39:44,980 --> 00:39:47,349 the instrumentation tool itself. 1090 00:39:47,350 --> 00:39:49,839 And these variables, 1091 00:39:49,840 --> 00:39:52,149 the shadows, basically, are used 1092 00:39:52,150 --> 00:39:54,429 for that. So for the instructions 1093 00:39:54,430 --> 00:39:57,129 that do memory accesses, we'll create 1094 00:39:57,130 --> 00:39:59,439 shadows that are used to simply 1095 00:39:59,440 --> 00:40:01,519 store the memory access and the 1096 00:40:01,520 --> 00:40:03,579 address accessed, so you can execute the 1097 00:40:03,580 --> 00:40:05,439 whole basic block. And then at the end 1098 00:40:05,440 --> 00:40:07,359 you can just query the shadows of that 1099 00:40:07,360 --> 00:40:09,459 basic block to know which addresses 1100 00:40:09,460 --> 00:40:11,309 and what data were transferred. 1101 00:40:13,840 --> 00:40:16,149 So to realize all of that, 1102 00:40:16,150 --> 00:40:18,489 it's not easy if you're thinking 1103 00:40:18,490 --> 00:40:21,099 about it in a multi-architecture context, 1104 00:40:21,100 --> 00:40:23,319 because we need a cross-platform, I mean, 1105 00:40:23,320 --> 00:40:25,799 basically, memory management unit. 1106 00:40:25,800 --> 00:40:27,989 And we need an abstraction of that 1107 00:40:27,990 --> 00:40:29,929 because we want to allocate memory and 1108 00:40:29,930 --> 00:40:32,609 we want to change those permissions, 1109 00:40:32,610 --> 00:40:34,739 and also we want 1110 00:40:34,740 --> 00:40:36,989 a multi-architecture assembler that works 1111 00:40:36,990 --> 00:40:38,069 in memory.
1112 00:40:38,070 --> 00:40:39,599 We don't want to create a binary on the 1113 00:40:39,600 --> 00:40:40,600 disk. 1114 00:40:41,070 --> 00:40:43,139 And it's not that simple. 1115 00:40:43,140 --> 00:40:45,149 A lot of assemblers simply assume that 1116 00:40:45,150 --> 00:40:47,189 they are going to create sections and a 1117 00:40:47,190 --> 00:40:49,559 binary object and things like that. 1118 00:40:49,560 --> 00:40:50,639 But this is not the case here. 1119 00:40:52,440 --> 00:40:53,440 But guess what? 1120 00:40:54,540 --> 00:40:57,209 LLVM saves us again, because 1121 00:40:57,210 --> 00:40:59,609 when the LLVM project was started, 1122 00:40:59,610 --> 00:41:01,709 it was an acronym for Low Level Virtual 1123 00:41:01,710 --> 00:41:03,989 Machine. So they had that bytecode 1124 00:41:03,990 --> 00:41:06,269 that was cross... nearly 1125 00:41:06,270 --> 00:41:08,159 cross-architecture. 1126 00:41:08,160 --> 00:41:10,319 And one of the things they did is 1127 00:41:10,320 --> 00:41:13,259 build a just-in-time engine to execute 1128 00:41:13,260 --> 00:41:14,999 that bytecode. 1129 00:41:15,000 --> 00:41:17,429 And it being a just-in-time 1130 00:41:17,430 --> 00:41:19,049 engine, it means it's very close to the 1131 00:41:19,050 --> 00:41:20,429 design of a DBI. 1132 00:41:20,430 --> 00:41:23,369 And so they have everything 1133 00:41:23,370 --> 00:41:25,379 we would need for that. 1134 00:41:25,380 --> 00:41:27,779 And we cannot 1135 00:41:27,780 --> 00:41:29,549 use the just-in-time engines directly, 1136 00:41:29,550 --> 00:41:31,259 however, because although they're very 1137 00:41:31,260 --> 00:41:33,329 well-designed, they do not really fit our 1138 00:41:33,330 --> 00:41:34,679 use case in the way we work.
1139 00:41:36,390 --> 00:41:38,459 But inside LLVM you have all the 1140 00:41:38,460 --> 00:41:40,229 functions you need if you want to create 1141 00:41:40,230 --> 00:41:42,329 a just-in-time: all the cross- 1142 00:41:42,330 --> 00:41:43,889 architecture memory management 1143 00:41:43,890 --> 00:41:46,199 abstractions and also 1144 00:41:46,200 --> 00:41:48,269 this powerful in-memory assembler, which 1145 00:41:48,270 --> 00:41:49,270 is LLVM MC. 1146 00:41:51,360 --> 00:41:53,759 So what we learned from that is that 1147 00:41:53,760 --> 00:41:54,839 really, if you want to create a 1148 00:41:54,840 --> 00:41:56,789 just-in-time, LLVM is really perfect for 1149 00:41:56,790 --> 00:41:57,790 the job. 1150 00:41:58,800 --> 00:42:01,109 But also, designing a just-in-time 1151 00:42:01,110 --> 00:42:03,479 engine for DBI, taking 1152 00:42:03,480 --> 00:42:05,129 into account the cross-architecture 1153 00:42:05,130 --> 00:42:06,809 problem, is really difficult because you 1154 00:42:06,810 --> 00:42:08,879 can easily be locked down into a 1155 00:42:08,880 --> 00:42:11,249 single architecture if you start assuming 1156 00:42:11,250 --> 00:42:13,499 that you can simply access 1157 00:42:13,500 --> 00:42:16,039 memory with a 32-bit offset, for example. 1158 00:42:18,030 --> 00:42:19,949 And so you need to think about 1159 00:42:19,950 --> 00:42:22,019 portability from the start if you want 1160 00:42:22,020 --> 00:42:23,699 to design that kind of project. 1161 00:42:28,170 --> 00:42:31,049 So all of this, 1162 00:42:31,050 --> 00:42:33,959 in fact, are small parts of 1163 00:42:33,960 --> 00:42:36,899 our project, which is called 1164 00:42:36,900 --> 00:42:39,029 QBDI. QBDI stands 1165 00:42:39,030 --> 00:42:41,129 for QuarkslaB Dynamic 1166 00:42:41,130 --> 00:42:43,109 Binary Instrumentation. 1167 00:42:43,110 --> 00:42:45,839 We are, like, very imaginative.
1168 00:42:45,840 --> 00:42:48,029 And so it's a cross- 1169 00:42:48,030 --> 00:42:50,099 platform, cross-architecture 1170 00:42:50,100 --> 00:42:51,100 DBI framework. 1171 00:42:52,950 --> 00:42:55,049 By cross-platform, we 1172 00:42:55,050 --> 00:42:57,869 mean that today it runs on 1173 00:42:57,870 --> 00:43:00,239 Linux, macOS, Windows, Android 1174 00:43:00,240 --> 00:43:01,240 and iOS. 1175 00:43:02,810 --> 00:43:04,969 And we really focused 1176 00:43:04,970 --> 00:43:07,399 over the last few months on 1177 00:43:07,400 --> 00:43:09,799 bringing something user-friendly, which 1178 00:43:09,800 --> 00:43:11,539 is kind of odd: you have, like, a big 1179 00:43:11,540 --> 00:43:14,119 engine, it's a complex machinery, 1180 00:43:14,120 --> 00:43:16,369 but we really wanted to have something 1181 00:43:16,370 --> 00:43:18,649 really easy. So basically, we focused 1182 00:43:18,650 --> 00:43:22,099 on clean APIs, extensive documentation, 1183 00:43:22,100 --> 00:43:24,529 and we also provide binary packages 1184 00:43:24,530 --> 00:43:26,809 for major operating systems 1185 00:43:26,810 --> 00:43:28,400 or Linux distributions. 1186 00:43:30,000 --> 00:43:32,159 And it's a modular design, and by 1187 00:43:32,160 --> 00:43:35,429 this, what we mean is, basically, 1188 00:43:35,430 --> 00:43:38,039 the core engine of the DBI 1189 00:43:38,040 --> 00:43:40,229 should only do what is essential 1190 00:43:40,230 --> 00:43:41,139 for DBI. 1191 00:43:41,140 --> 00:43:43,889 So no anti-anti-VM, 1192 00:43:43,890 --> 00:43:46,259 or anything related to 1193 00:43:46,260 --> 00:43:48,089 debug, I don't know, like everything 1194 00:43:48,090 --> 00:43:50,609 which is not part of the DBI.
1195 00:43:50,610 --> 00:43:52,859 And the idea of 1196 00:43:52,860 --> 00:43:55,049 keeping things simple is that by 1197 00:43:55,050 --> 00:43:56,669 keeping things simple, you don't 1198 00:43:56,670 --> 00:43:58,799 force users to do things 1199 00:43:58,800 --> 00:43:59,800 your way. 1200 00:44:00,570 --> 00:44:02,639 Basically, we don't 1201 00:44:02,640 --> 00:44:04,979 have, like, one injection method 1202 00:44:04,980 --> 00:44:07,259 that you need to use, 1203 00:44:07,260 --> 00:44:09,539 that forces you to do things 1204 00:44:09,540 --> 00:44:11,639 in a certain way and limits you, basically, 1205 00:44:11,640 --> 00:44:12,969 by doing this. 1206 00:44:12,970 --> 00:44:15,149 And what we have 1207 00:44:15,150 --> 00:44:18,089 at the end is basically easy integration, 1208 00:44:18,090 --> 00:44:20,689 because QBDI is just a library, 1209 00:44:20,690 --> 00:44:23,609 static or dynamic library, your choice. 1210 00:44:23,610 --> 00:44:26,339 And so we have created, with this library, 1211 00:44:26,340 --> 00:44:28,559 Python bindings to allow very 1212 00:44:28,560 --> 00:44:30,869 fast experimentation. 1213 00:44:30,870 --> 00:44:32,939 And we also have, like, 1214 00:44:32,940 --> 00:44:35,330 full-featured integration with Frida. 1215 00:44:36,360 --> 00:44:38,459 Frida, I'm sure lots of you already 1216 00:44:38,460 --> 00:44:40,229 know what it is. It's, like, a very 1217 00:44:40,230 --> 00:44:43,619 nice framework for instrumenting binaries 1218 00:44:43,620 --> 00:44:44,969 in a different way. 1219 00:44:44,970 --> 00:44:48,329 And they are really perfect 1220 00:44:48,330 --> 00:44:50,729 when used together. 1221 00:44:50,730 --> 00:44:52,349 Really, if you combine the power of the 1222 00:44:52,350 --> 00:44:53,669 DBI, QBDI, 1223 00:44:53,670 --> 00:44:55,379 and Frida, it's something really 1224 00:44:55,380 --> 00:44:57,020 impressive.
We will show you with a demo. 1225 00:44:58,560 --> 00:45:00,689 Current roadmap: basically, we 1226 00:45:00,690 --> 00:45:02,849 have some... we are a bit late 1227 00:45:02,850 --> 00:45:04,979 on the ARM support, but as you have 1228 00:45:04,980 --> 00:45:07,109 seen, it's basically 1229 00:45:07,110 --> 00:45:07,859 adding rules. 1230 00:45:07,860 --> 00:45:09,989 In fact, the engine itself is already 1231 00:45:09,990 --> 00:45:11,939 here, is already working, running on 1232 00:45:11,940 --> 00:45:13,320 ARM. We just need to 1233 00:45:14,430 --> 00:45:16,739 just finish the rules, and maybe also 1234 00:45:16,740 --> 00:45:18,749 focus on the 64-bit. 1235 00:45:19,980 --> 00:45:21,719 We need to improve the memory access, 1236 00:45:21,720 --> 00:45:23,939 because currently we don't have the 1237 00:45:23,940 --> 00:45:25,589 SIMD memory access, which is a bit of a 1238 00:45:25,590 --> 00:45:27,059 problem also. 1239 00:45:27,060 --> 00:45:29,219 And we also want to focus on multi- 1240 00:45:29,220 --> 00:45:30,809 threading and exceptions, but not in 1241 00:45:30,810 --> 00:45:32,849 the same way that most engines work, 1242 00:45:32,850 --> 00:45:34,889 because we really want to keep the core 1243 00:45:34,890 --> 00:45:37,259 very simple. So it will probably be 1244 00:45:37,260 --> 00:45:39,629 integrated... we'll have something 1245 00:45:39,630 --> 00:45:42,779 like a helper library or something, on 1246 00:45:42,780 --> 00:45:44,489 the side of the project, but we 1247 00:45:44,490 --> 00:45:47,189 don't know exactly right now. 1248 00:45:47,190 --> 00:45:48,749 So, demo time. 1249 00:45:50,040 --> 00:45:52,709 So first, the demo will be on the 1250 00:45:52,710 --> 00:45:54,989 engine itself and its Python bindings. 1251 00:45:56,370 --> 00:45:58,469 So we just need to... I 1252 00:45:58,470 --> 00:46:00,209 hope we will not break everything by 1253 00:46:00,210 --> 00:46:01,210 doing this.
1254 00:46:06,430 --> 00:46:08,619 Um, maybe we should just drag 1255 00:46:08,620 --> 00:46:09,620 and drop this. 1256 00:46:14,530 --> 00:46:15,530 Good. 1257 00:46:18,650 --> 00:46:19,760 I think it's OK. 1258 00:46:22,260 --> 00:46:24,939 So, uh, so 1259 00:46:24,940 --> 00:46:27,059 it's a very simple demo. I just 1260 00:46:27,060 --> 00:46:29,429 want to show you how 1261 00:46:29,430 --> 00:46:31,499 easy and simple we 1262 00:46:31,500 --> 00:46:32,500 tried to make it. 1263 00:46:34,200 --> 00:46:35,939 So this is a simple binary that takes a 1264 00:46:35,940 --> 00:46:38,609 password as input, 1265 00:46:38,610 --> 00:46:39,999 and the password, 1266 00:46:40,000 --> 00:46:42,539 well, you cannot find it easily 1267 00:46:42,540 --> 00:46:44,609 if you just look at the software. 1268 00:46:44,610 --> 00:46:46,799 So that's 1269 00:46:46,800 --> 00:46:49,439 where I'm very glad that my 1270 00:46:49,440 --> 00:46:51,929 colleague uses a freaking 1271 00:46:51,930 --> 00:46:54,149 AZERTY keyboard, which 1272 00:46:54,150 --> 00:46:55,150 is... 1273 00:46:55,800 --> 00:46:57,089 Yeah. Yeah. 1274 00:46:57,090 --> 00:46:58,439 Yes, it works. 1275 00:46:58,440 --> 00:47:00,779 Uh, so yeah. 1276 00:47:00,780 --> 00:47:02,879 So what we're going to try to do is just 1277 00:47:02,880 --> 00:47:05,039 display every memory access made by 1278 00:47:05,040 --> 00:47:05,969 the program. 1279 00:47:05,970 --> 00:47:08,279 And because there are a lot of memory 1280 00:47:08,280 --> 00:47:09,959 accesses, usually, we're just going to focus 1281 00:47:09,960 --> 00:47:12,089 on writes. So we use the pyqbdi 1282 00:47:12,090 --> 00:47:14,189 bindings, and 1283 00:47:14,190 --> 00:47:16,529 the main thing is this callback, 1284 00:47:16,530 --> 00:47:18,659 which is called when the DBI 1285 00:47:18,660 --> 00:47:20,399 wants to start, and we simply do two 1286 00:47:20,400 --> 00:47:20,699 things.
1287 00:47:20,700 --> 00:47:23,159 We add the memory access callback and 1288 00:47:23,160 --> 00:47:26,109 we run the engine from start to stop. 1289 00:47:26,110 --> 00:47:28,259 So the memory callback 1290 00:47:28,260 --> 00:47:29,669 is just right here. 1291 00:47:29,670 --> 00:47:31,849 And what it does is just get an 1292 00:47:31,850 --> 00:47:33,959 analysis of the instruction and 1293 00:47:33,960 --> 00:47:35,579 the information about the memory 1294 00:47:35,580 --> 00:47:37,859 accesses made, and simply 1295 00:47:37,860 --> 00:47:39,659 print, well, "you wrote", blah 1296 00:47:39,660 --> 00:47:40,710 blah blah. Uh, 1297 00:47:42,150 --> 00:47:43,709 oh yeah. Great. 1298 00:47:43,710 --> 00:47:46,469 So, uh, 1299 00:47:46,470 --> 00:47:47,640 uh. 1300 00:47:49,180 --> 00:47:50,180 Listen. 1301 00:47:51,960 --> 00:47:53,449 No, no, no, it's a reality. 1302 00:47:55,110 --> 00:47:56,949 Is there a password somewhere? 1303 00:47:56,950 --> 00:47:59,099 No, but we want 1304 00:47:59,100 --> 00:48:00,619 to run it maybe. 1305 00:48:00,620 --> 00:48:01,859 Yeah, yeah. 1306 00:48:03,330 --> 00:48:05,610 OK, I want to run it. 1307 00:48:06,810 --> 00:48:08,789 It's not in the history because it was too 1308 00:48:08,790 --> 00:48:09,790 easy. 1309 00:48:10,400 --> 00:48:12,799 So basically, all will this 1310 00:48:12,800 --> 00:48:13,800 be of. 1311 00:48:15,630 --> 00:48:17,759 OK, so there 1312 00:48:17,760 --> 00:48:19,559 we go. Yeah, thank you. 1313 00:48:19,560 --> 00:48:21,629 Uh, so, yeah, we can run 1314 00:48:21,630 --> 00:48:23,219 it and you can see there are a lot of 1315 00:48:23,220 --> 00:48:24,840 memory accesses that are made by this
1316 00:48:27,260 --> 00:48:29,359 thing. So 1317 00:48:29,360 --> 00:48:30,829 right now, it's not really readable; 1318 00:48:30,830 --> 00:48:33,199 there's a lot of memory pointer stuff. 1319 00:48:33,200 --> 00:48:34,879 So maybe, because we know it's a password, 1320 00:48:34,880 --> 00:48:37,309 we want to filter by memory 1321 00:48:37,310 --> 00:48:39,379 accesses that are one 1322 00:48:39,380 --> 00:48:41,539 byte long. So there we do that: 1323 00:48:41,540 --> 00:48:42,979 we check if the size is one. 1324 00:48:42,980 --> 00:48:45,279 We only print if the size is one. 1325 00:48:45,280 --> 00:48:47,419 Um, so let's 1326 00:48:47,420 --> 00:48:49,119 run that, that's all. 1327 00:48:50,900 --> 00:48:52,759 So here it's more interesting. 1328 00:48:52,760 --> 00:48:54,889 We can see 1329 00:48:54,890 --> 00:48:56,959 we have byte values on the left 1330 00:48:56,960 --> 00:48:58,429 and we have a lot of instructions that 1331 00:48:58,430 --> 00:49:00,469 wrote that, and there are a few 1332 00:49:00,470 --> 00:49:02,539 XORs, and maybe 1333 00:49:02,540 --> 00:49:04,639 also... when you see XOR, you usually think 1334 00:49:04,640 --> 00:49:06,829 crypto, or weird 1335 00:49:06,830 --> 00:49:08,689 image manipulation algorithms. 1336 00:49:08,690 --> 00:49:11,059 Um, so 1337 00:49:11,060 --> 00:49:13,849 what we can do now is simply 1338 00:49:13,850 --> 00:49:15,649 do the same stuff, but we also filter 1339 00:49:15,650 --> 00:49:17,749 for XOR instructions to see what we 1340 00:49:17,750 --> 00:49:18,750 get. 1341 00:49:19,970 --> 00:49:21,350 Um.
1342 00:49:23,590 --> 00:49:26,349 Yeah, and so it's... 1343 00:49:26,350 --> 00:49:29,209 so now there are not that many 1344 00:49:29,210 --> 00:49:31,719 instructions 1345 00:49:31,720 --> 00:49:33,969 left, and while it looks 1346 00:49:33,970 --> 00:49:36,069 random at the beginning, the end 1347 00:49:36,070 --> 00:49:37,899 is interesting because it's always in the same 1348 00:49:37,900 --> 00:49:40,029 kind of memory range, and that 1349 00:49:40,030 --> 00:49:41,319 looks like ASCII. 1350 00:49:41,320 --> 00:49:44,379 So the last thing, 1351 00:49:44,380 --> 00:49:47,139 the last idea, is simply to 1352 00:49:47,140 --> 00:49:49,479 aggregate those bytes into a buffer. 1353 00:49:49,480 --> 00:49:52,029 So you see that you have a data array that 1354 00:49:52,030 --> 00:49:54,309 is passed as a parameter to the callback, 1355 00:49:54,310 --> 00:49:55,269 and so on 1356 00:49:55,270 --> 00:49:57,579 each of those instructions we simply 1357 00:49:57,580 --> 00:50:00,249 append the data to our array, 1358 00:50:00,250 --> 00:50:02,409 and we print it at 1359 00:50:02,410 --> 00:50:03,559 the end. 1360 00:50:03,560 --> 00:50:05,499 And if we do that, 1361 00:50:08,770 --> 00:50:10,899 we can see we have some 1362 00:50:10,900 --> 00:50:13,509 garbage and then, at the end, some 1363 00:50:13,510 --> 00:50:15,729 text that looks like "import Triton". 1364 00:50:15,730 --> 00:50:16,730 So 1365 00:50:17,860 --> 00:50:20,009 if we try to 1366 00:50:20,010 --> 00:50:21,540 input it... 1367 00:50:27,570 --> 00:50:29,339 Well, it's the correct password, so it 1368 00:50:29,340 --> 00:50:31,229 was simply decrypted in memory and we can 1369 00:50:31,230 --> 00:50:32,819 look at it at the end. 1370 00:50:32,820 --> 00:50:34,889 So this shows you how 1371 00:50:34,890 --> 00:50:36,899 simple this kind of binding can be, 1372 00:50:36,900 --> 00:50:38,999 and that it's really not hard to use a DBI.
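[Editor's note] The filtering steps of the demo (keep writes, keep one-byte accesses, keep XOR instructions, then aggregate the bytes) can be sketched independently of any DBI. The code below is not the script shown on stage and does not use the real Python bindings; the `Access` record, its fields and the sample trace are hypothetical stand-ins for what a memory access callback would receive.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """Hypothetical record of one memory access reported by a callback."""
    is_write: bool
    size: int      # access size in bytes
    value: int     # value read or written
    mnemonic: str  # mnemonic of the instruction that made the access


def collect_xor_byte_writes(accesses):
    """Aggregate the bytes written by one-byte XOR writes, in order."""
    collected = bytearray()
    for access in accesses:
        if not access.is_write:
            continue  # the demo focuses on writes only
        if access.size != 1:
            continue  # a password is handled byte by byte
        if not access.mnemonic.upper().startswith("XOR"):
            continue  # keep only the XOR instructions
        collected.append(access.value & 0xFF)
    return bytes(collected)


if __name__ == "__main__":
    trace = [
        Access(True, 8, 0x7FFF12345678, "MOV"),
        Access(True, 1, ord("h"), "XOR"),
        Access(False, 1, 0x41, "XOR"),
        Access(True, 1, ord("i"), "XOR"),
        Access(True, 1, 0x00, "MOV"),
    ]
    print(collect_xor_byte_writes(trace))
```

Each filter removes a large share of the trace, which is why the decrypted text only becomes visible after the last step.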
1373 00:50:48,610 --> 00:50:51,189 OK, we have time for another 1374 00:50:51,190 --> 00:50:53,379 demo, so... but I will 1375 00:50:53,380 --> 00:50:56,439 just maybe, I don't know, 1376 00:50:56,440 --> 00:50:59,499 um, yeah, OK. 1377 00:50:59,500 --> 00:51:02,169 I will try to do it on the remote screen, 1378 00:51:02,170 --> 00:51:04,269 but at least you don't have 1379 00:51:04,270 --> 00:51:06,179 to use an AZERTY keyboard. 1380 00:51:06,180 --> 00:51:08,949 Yeah, I'm French. 1381 00:51:08,950 --> 00:51:11,109 So yeah, we 1382 00:51:11,110 --> 00:51:13,389 have a demo binary. We will launch 1383 00:51:13,390 --> 00:51:14,319 it. 1384 00:51:14,320 --> 00:51:16,569 It's small, but we increase 1385 00:51:16,570 --> 00:51:18,159 the size and we can see, OK, it does 1386 00:51:18,160 --> 00:51:19,599 nothing, like, it takes... 1387 00:51:19,600 --> 00:51:21,549 oh, I have an input string, which is "hello", 1388 00:51:21,550 --> 00:51:24,249 and it does things with "hello", and 1389 00:51:24,250 --> 00:51:26,349 OK. And the idea is, OK, we want 1390 00:51:26,350 --> 00:51:28,479 to reverse engineer it 1391 00:51:28,480 --> 00:51:30,309 and to understand what it does. 1392 00:51:30,310 --> 00:51:31,310 So 1393 00:51:32,440 --> 00:51:34,869 we will do this using the 1394 00:51:34,870 --> 00:51:37,119 Frida framework and... 1395 00:51:38,680 --> 00:51:41,889 OK, so it's 1396 00:51:41,890 --> 00:51:43,629 a reboot in my laptop in fact. 1397 00:51:44,770 --> 00:51:47,200 OK, so first we will load Frida. 1398 00:51:52,500 --> 00:51:55,469 Dismal failure, then 1399 00:51:55,470 --> 00:51:57,810 we will upload the binary. 1400 00:52:00,470 --> 00:52:02,569 Is located, I think, our 1401 00:52:02,570 --> 00:52:04,699 Lokoja binary, OK, freed 1402 00:52:04,700 --> 00:52:06,979 up from Iraq and we'll separate, 1403 00:52:06,980 --> 00:52:09,469 OK, I want to load the demo binary.
1404 00:52:11,690 --> 00:52:14,359 But we are launching Frida, I don't see 1405 00:52:14,360 --> 00:52:17,449 it from there, so, 1406 00:52:17,450 --> 00:52:19,669 OK, I take control. 1407 00:52:19,670 --> 00:52:21,739 OK, so here we are in Frida. 1408 00:52:21,740 --> 00:52:23,989 It's the environment most 1409 00:52:23,990 --> 00:52:25,429 of you already know if you have used 1410 00:52:25,430 --> 00:52:27,509 Frida. And what we can do with 1411 00:52:27,510 --> 00:52:29,599 Frida, basically, is things 1412 00:52:29,600 --> 00:52:31,879 like, OK, I have 1413 00:52:31,880 --> 00:52:33,949 reversed the binary a bit and I have 1414 00:52:33,950 --> 00:52:35,539 seen that there is a function which is 1415 00:52:35,540 --> 00:52:37,609 called, like, "secret". 1416 00:52:37,610 --> 00:52:40,079 So I will just do 1417 00:52:40,080 --> 00:52:42,289 a call, it's 1418 00:52:42,290 --> 00:52:44,539 only through the API, which is, OK, I 1419 00:52:44,540 --> 00:52:46,729 want the address of "secret", so 1420 00:52:46,730 --> 00:52:49,099 it just returns the address of "secret", 1421 00:52:49,100 --> 00:52:51,409 and I say, OK, we need an input for 1422 00:52:51,410 --> 00:52:54,169 this function, and basically 1423 00:52:54,170 --> 00:52:55,170 I will 1424 00:52:56,540 --> 00:52:58,279 use a string for this input. 1425 00:52:58,280 --> 00:53:00,439 So Frida allows us to basically 1426 00:53:00,440 --> 00:53:02,809 do a remote allocation of memory and inject 1427 00:53:02,810 --> 00:53:04,669 the string inside the remote process 1428 00:53:04,670 --> 00:53:05,869 memory. 1429 00:53:05,870 --> 00:53:07,939 And using this you 1430 00:53:07,940 --> 00:53:10,339 can say, OK, I want a NativeFunction. 1431 00:53:13,020 --> 00:53:15,359 And this is, like, a JavaScript function 1432 00:53:15,360 --> 00:53:17,489 which allows you to do the call of 1433 00:53:17,490 --> 00:53:19,949 the function address that you 1434 00:53:19,950 --> 00:53:21,929 resolved, and by doing...
1435 00:53:23,160 --> 00:53:25,319 So there is completion... 1436 00:53:25,320 --> 00:53:27,659 if you give some input, basically 1437 00:53:27,660 --> 00:53:30,149 Frida is executing your remote function 1438 00:53:30,150 --> 00:53:32,279 with the input that 040 1439 00:53:32,280 --> 00:53:35,129 like elsewhere. OK, so 1440 00:53:35,130 --> 00:53:37,229 what we can do with a DBI 1441 00:53:37,230 --> 00:53:39,749 and the integration is that we can simply 1442 00:53:39,750 --> 00:53:41,219 create a virtual CPU. 1443 00:53:41,220 --> 00:53:43,919 So we will allocate our DBI. 1444 00:53:43,920 --> 00:53:45,869 So by doing this simply. 1445 00:53:47,140 --> 00:53:49,449 All right, OK, so we have 1446 00:53:49,450 --> 00:53:52,959 the object. With it, we will create 1447 00:53:52,960 --> 00:53:55,310 a state, things 1448 00:53:56,410 --> 00:53:58,119 like that. Completion, very easy. 1449 00:53:59,230 --> 00:54:01,539 And then we will create a stack using 1450 00:54:01,540 --> 00:54:03,549 completion again, because we don't 1451 00:54:03,550 --> 00:54:04,929 have lots of time. 1452 00:54:04,930 --> 00:54:07,809 And here we go, we have basically a 1453 00:54:07,810 --> 00:54:10,149 virtual CPU, for which 1454 00:54:10,150 --> 00:54:12,819 we have initialized its state, and we have 1455 00:54:12,820 --> 00:54:14,529 a virtual stack for it. 1456 00:54:14,530 --> 00:54:16,629 And what we need to do now is to say, 1457 00:54:16,630 --> 00:54:18,430 OK, we want to instrument 1458 00:54:19,840 --> 00:54:21,070 the demo binary. 1459 00:54:23,140 --> 00:54:25,239 There, it's to avoid the calls to 1460 00:54:25,240 --> 00:54:27,079 the external libraries. This is a 1461 00:54:27,080 --> 00:54:29,259 feature of our DBI, that 1462 00:54:29,260 --> 00:54:31,389 we can really choose what part of the 1463 00:54:31,390 --> 00:54:33,709 code we want to instrument and really let 1464 00:54:33,710 --> 00:54:36,639 everything else execute by itself.
1465 00:54:36,640 --> 00:54:37,899 So we are ready. 1466 00:54:37,900 --> 00:54:40,689 You can just do a call. 1467 00:54:40,690 --> 00:54:43,119 So using the 1468 00:54:43,120 --> 00:54:45,249 NativeFunction pointer from Frida, and then 1469 00:54:45,250 --> 00:54:47,319 the list of inputs, 1470 00:54:47,320 --> 00:54:49,959 a list of arguments, which is "input". 1471 00:54:49,960 --> 00:54:52,179 And you see, OK, we have exactly the same 1472 00:54:52,180 --> 00:54:54,669 output and the same return 1473 00:54:54,670 --> 00:54:56,739 value, and you say, OK, but you 1474 00:54:56,740 --> 00:54:58,959 have exactly the same thing. 1475 00:54:58,960 --> 00:55:00,669 So your DBI is kind of useless. 1476 00:55:00,670 --> 00:55:02,919 Now, what we're adding... right 1477 00:55:02,920 --> 00:55:05,079 now what we can do is adding instrumentation 1478 00:55:05,080 --> 00:55:07,269 to it, because here we have just executed 1479 00:55:07,270 --> 00:55:09,429 the original code, but inside 1480 00:55:09,430 --> 00:55:11,949 the DBI. 1481 00:55:11,950 --> 00:55:13,029 And to create the... 1482 00:55:14,500 --> 00:55:16,779 to add the instrumentation, you 1483 00:55:16,780 --> 00:55:18,909 just need to use the API 1484 00:55:18,910 --> 00:55:21,189 and create something like 1485 00:55:22,450 --> 00:55:23,750 an instruction callback.
1486 00:55:25,360 --> 00:55:26,360 OK, 1487 00:55:28,360 --> 00:55:30,759 so what I have done, basically, 1488 00:55:30,760 --> 00:55:32,979 is I have created a JavaScript 1489 00:55:32,980 --> 00:55:35,169 function that will be called at runtime 1490 00:55:35,170 --> 00:55:37,269 for every instruction, and in that 1491 00:55:37,270 --> 00:55:39,489 callback, basically, I just 1492 00:55:40,690 --> 00:55:43,659 dump the general-purpose registers 1493 00:55:43,660 --> 00:55:45,969 and also ask for a total 1494 00:55:45,970 --> 00:55:48,279 analysis of the instruction, 1495 00:55:48,280 --> 00:55:50,799 in order to get the disassembly 1496 00:55:50,800 --> 00:55:53,019 string and the original address. 1497 00:55:53,020 --> 00:55:55,179 And these 1498 00:55:55,180 --> 00:55:57,069 analyses, by the way, they are managed by 1499 00:55:57,070 --> 00:55:59,319 the DBI and 1500 00:55:59,320 --> 00:56:01,299 there is a cache, there is a lot of design, 1501 00:56:01,300 --> 00:56:03,489 but simply, you don't 1502 00:56:03,490 --> 00:56:04,509 know what happens. 1503 00:56:04,510 --> 00:56:05,589 But it's magic. 1504 00:56:05,590 --> 00:56:07,689 You have, like, the analysis of the 1505 00:56:07,690 --> 00:56:10,269 instruction, and after this, 1506 00:56:10,270 --> 00:56:12,339 you only need to add 1507 00:56:12,340 --> 00:56:14,099 your callback. So 1508 00:56:15,700 --> 00:56:17,379 there I cheat again. 1509 00:56:17,380 --> 00:56:19,479 So what I ask here is to add 1510 00:56:19,480 --> 00:56:21,759 the code callback 1511 00:56:21,760 --> 00:56:23,829 before every instruction, and it 1512 00:56:23,830 --> 00:56:25,479 will call my JavaScript function. 1513 00:56:27,050 --> 00:56:29,469 So now the callback is added 1514 00:56:29,470 --> 00:56:31,689 and we can just go back 1515 00:56:31,690 --> 00:56:33,789 and call our 1516 00:56:33,790 --> 00:56:34,790 input function again.
1517 00:56:35,940 --> 00:56:38,039 And here we go, what we have is 1518 00:56:38,040 --> 00:56:39,659 our JavaScript function, which has been 1519 00:56:39,660 --> 00:56:41,729 executed, and we see, for every 1520 00:56:41,730 --> 00:56:43,769 instruction, we have the address, we have 1521 00:56:43,770 --> 00:56:45,329 the instruction, we have the full 1522 00:56:46,500 --> 00:56:49,459 context of the general-purpose registers, 1523 00:56:49,460 --> 00:56:51,949 and lots of them. 1524 00:56:51,950 --> 00:56:54,119 We can also see that in between we have 1525 00:56:54,120 --> 00:56:56,879 the call to the 1526 00:56:56,880 --> 00:56:57,959 external library. 1527 00:56:57,960 --> 00:57:00,329 Here, we are jumping outside the DBI, 1528 00:57:00,330 --> 00:57:02,789 and it's basically the standard library, 1529 00:57:02,790 --> 00:57:05,609 the libc, which is doing stuff, 1530 00:57:05,610 --> 00:57:07,799 and then, not surprisingly, 1531 00:57:07,800 --> 00:57:09,359 we have a return instruction and it's 1532 00:57:09,360 --> 00:57:11,309 returning zero. So everything works as 1533 00:57:11,310 --> 00:57:12,310 before. 1534 00:57:20,800 --> 00:57:23,199 So QBDI 1535 00:57:23,200 --> 00:57:26,469 is an open source project 1536 00:57:26,470 --> 00:57:28,509 which we have just released a few days 1537 00:57:28,510 --> 00:57:30,819 ago, so 1538 00:57:30,820 --> 00:57:33,339 don't hesitate to give it a try. 1539 00:57:33,340 --> 00:57:35,739 It's also been released under 1540 00:57:35,740 --> 00:57:36,999 a permissive license. 1541 00:57:37,000 --> 00:57:39,819 So feel free to send any suggestions 1542 00:57:39,820 --> 00:57:41,799 or requests or anything. 1543 00:57:41,800 --> 00:57:44,979 We have a channel on Freenode. 1544 00:57:44,980 --> 00:57:47,349 Just join us if you're interested 1545 00:57:47,350 --> 00:57:48,849 in this project.
1546 00:57:48,850 --> 00:57:51,129 And I would like to end 1547 00:57:51,130 --> 00:57:52,630 by really 1548 00:57:53,650 --> 00:57:55,679 giving a big thanks to our colleagues 1549 00:57:55,680 --> 00:57:57,849 at Quarkslab, for all 1550 00:57:57,850 --> 00:58:00,069 the beta testing and for supporting us every 1551 00:58:00,070 --> 00:58:01,330 day, and 1552 00:58:02,350 --> 00:58:04,479 also especially to Paul and Joe, because 1553 00:58:04,480 --> 00:58:06,369 they've done, like, major contributions to this, 1554 00:58:06,370 --> 00:58:08,319 really, and it would not be the same 1555 00:58:08,320 --> 00:58:10,839 without them. And also big thanks to 1556 00:58:10,840 --> 00:58:13,029 Quarkslab, our company, for allowing us 1557 00:58:13,030 --> 00:58:15,909 to release this software 1558 00:58:15,910 --> 00:58:18,249 with a permissive license 1559 00:58:18,250 --> 00:58:21,259 and allowing it to grow on its own. 1560 00:58:21,260 --> 00:58:22,959 So thank you very much. 1561 00:58:30,600 --> 00:58:32,040 So if you, yeah, 1562 00:58:33,150 --> 00:58:34,799 if you want to ask a question, it's like 1563 00:58:34,800 --> 00:58:36,449 speed dating, and we only have one 1564 00:58:36,450 --> 00:58:37,650 minute, 10 seconds left. 1565 00:58:42,250 --> 00:58:44,889 I'll try, hopefully a quick answer: 1566 00:58:44,890 --> 00:58:47,469 can you contrast this with 1567 00:58:47,470 --> 00:58:50,319 deterrence and its use of land support? 1568 00:58:50,320 --> 00:58:52,569 Uh, deterrence is really 1569 00:58:52,570 --> 00:58:53,739 cool. 1570 00:58:53,740 --> 00:58:55,929 It's, uh, but 1571 00:58:55,930 --> 00:58:57,969 it's not really cross-platform and not 1572 00:58:57,970 --> 00:59:00,339 cross-architecture. So the main point, 1573 00:59:00,340 --> 00:59:01,869 to be very, very fast because we don't 1574 00:59:01,870 --> 00:59:02,979 have time... 1575 00:59:02,980 --> 00:59:04,299 The main point is, like, here,
1576 00:59:04,300 --> 00:59:06,189 it's really all about the 1577 00:59:06,190 --> 00:59:08,649 granularity, and here 1578 00:59:08,650 --> 00:59:10,599 the granularity is instructions. 1579 00:59:10,600 --> 00:59:12,819 And more than that, you can be before 1580 00:59:12,820 --> 00:59:15,099 or after every instruction. 1581 00:59:15,100 --> 00:59:17,349 And this only a DBI can 1582 00:59:17,350 --> 00:59:19,629 provide you, or if you have, 1583 00:59:19,630 --> 00:59:21,129 like, tons of time. 1584 00:59:21,130 --> 00:59:23,169 But yeah, this is really the main 1585 00:59:23,170 --> 00:59:25,119 difference. They're complementary tools, 1586 00:59:25,120 --> 00:59:26,120 basically. 1587 00:59:27,720 --> 00:59:29,819 I'm sorry, but you 1588 00:59:29,820 --> 00:59:31,919 can catch them later, and 1589 00:59:31,920 --> 00:59:33,629 the next talk will be in 15 minutes. 1590 00:59:33,630 --> 00:59:35,999 It's "Growing Up Software Development". 1591 00:59:36,000 --> 00:59:37,479 Thanks. Thank you very much. 1592 00:59:37,480 --> 00:59:38,480 Thank you.