1 00:00:00,330 --> 00:00:19,180 *35C3 preroll music* 2 00:00:19,180 --> 00:00:20,180 Herald Angel: Welcome everybody to this 3 00:00:20,180 --> 00:00:24,702 talk, "How does the Internet work?" And our speaker is Peter Stuge and I'm very 4 00:00:24,702 --> 00:00:29,010 happy that he is here to explain to all of us how the infrastructure of the Internet 5 00:00:29,010 --> 00:00:32,960 really works. I am pretty sure we will all learn a lot today. Please give a big and 6 00:00:32,960 --> 00:00:41,560 warm round of applause for Peter Stuge. *applause* 7 00:00:41,560 --> 00:00:46,300 Stuge: Thank you. Thank you very much. Thank you for being here. This is amazing. 8 00:00:46,300 --> 00:00:54,110 Translation into French, wow. So I want to talk about how the Internet works. And I 9 00:00:54,110 --> 00:01:00,660 try to, try to... yeah, try to shine some light on all the technologies that are 10 00:01:00,660 --> 00:01:07,270 involved when we use the Internet every day. So, why this talk? Some motivation 11 00:01:07,270 --> 00:01:11,870 first, then a little bit of brief background just how the Internet got 12 00:01:11,870 --> 00:01:17,631 started. And then we get into the details. So what actually happens between the web 13 00:01:17,631 --> 00:01:27,069 browser and the website, that's the starting point. So in the description I 14 00:01:27,069 --> 00:01:32,450 listed things from bottom up, so from the very low level packet stuff and through 15 00:01:32,450 --> 00:01:37,880 the various layers of the network stack up into the applications. And that's the 16 00:01:37,880 --> 00:01:44,380 building blocks part, but I inserted this overview first, what is actually going on 17 00:01:44,380 --> 00:01:49,450 between the browser and the website because that's what most people already 18 00:01:49,450 --> 00:01:59,170 know and use a lot. 
Some parts, well, some details about the different protocols and 19 00:01:59,170 --> 00:02:03,910 in the end some recommendations for further talks, if you find these topics 20 00:02:03,910 --> 00:02:11,860 interesting. So the reason I want to give this talk is to talk about how does the 21 00:02:11,860 --> 00:02:17,930 Internet work, right? The mechanisms that we use all the time... but aren't 22 00:02:17,930 --> 00:02:25,340 mentioned very much. So they are sort of obscured or... well I don't know if hidden 23 00:02:25,340 --> 00:02:31,710 is the right word but we wouldn't.. we don't experience the network itself very 24 00:02:31,710 --> 00:02:37,200 much, right? We experience the various services that we use and they, the 25 00:02:37,200 --> 00:02:45,470 services, they try their hardest to keep us interested, to tickle our imagination 26 00:02:45,470 --> 00:02:53,020 and... I think it's dangerous to not talk a little bit about the network every now 27 00:02:53,020 --> 00:02:59,850 and then. And to think about the network, and to actually fight for a public network 28 00:02:59,850 --> 00:03:08,120 that is available to all and equal. Also neutral. If we focus on the service 29 00:03:08,120 --> 00:03:14,400 providers alone then they're going to be deciding what we can do with the network. 30 00:03:14,400 --> 00:03:20,900 But the point of, or the great thing about, how the Internet is neutral today 31 00:03:20,900 --> 00:03:26,880 is that we're all connected or we could all connect to each other. We don't 32 00:03:26,880 --> 00:03:33,070 really have to use these service providers. We tend to. This is somehow a 33 00:03:33,070 --> 00:03:39,340 human nature to sort of go towards centralization and monopolization. But the 34 00:03:39,340 --> 00:03:47,630 Internet is a tool that would allow us to try more variants or other kinds of 35 00:03:47,630 --> 00:03:54,569 structures. And we need to be aware of that and the importance of net neutrality. 
36 00:03:54,569 --> 00:04:03,950 If we don't talk a bit about the network we might lose it. So. How did it all get 37 00:04:03,950 --> 00:04:15,460 started? In 1970, then ARPA, they started the ARPANET. So ARPA back then is this is 38 00:04:15,460 --> 00:04:22,040 now DARPA. That's the Defense Advanced Research Projects Agency. They develop 39 00:04:22,040 --> 00:04:28,350 technology for the U.S. military. And they did back then as well. So the ARPANET was, 40 00:04:28,350 --> 00:04:33,930 as the quote says, from this very very old document, that the objective is to get all 41 00:04:33,930 --> 00:04:40,970 their suppliers connected into a network together, and being able to exchange 42 00:04:40,970 --> 00:04:45,910 information so that they can, I guess, make progress more quickly, more 43 00:04:45,910 --> 00:04:54,690 efficiently. Right? Now it's something else. I think that's good. So let's let's 44 00:04:54,690 --> 00:05:01,810 look at what happens between the browser and the web site. So we have a person 45 00:05:01,810 --> 00:05:07,580 using a laptop and they have a browser and they type in a web address. 46 00:05:07,580 --> 00:05:13,830 "events.ccc.de", for example. To read the blog posts, the latest blog posts about 47 00:05:13,830 --> 00:05:21,800 the Congress. So then the browser really does two different things first of all. To 48 00:05:21,800 --> 00:05:30,770 get, to show this page. So first of all it has to ask for the the way to reach this 49 00:05:30,770 --> 00:05:36,780 website that we want to reach. Computers, they don't deal very well with names or 50 00:05:36,780 --> 00:05:44,470 text. At least not the network part of computers or systems. So there is this 51 00:05:44,470 --> 00:05:50,740 translation somehow like a phonebook, I'll get back to that in a bit, called DNS, 52 00:05:50,740 --> 00:05:55,010 which is used primarily... 
it has a few other uses as well, but it's used 53 00:05:55,010 --> 00:06:00,200 primarily to get from this name that we entered, "events.ccc.de", that we can also 54 00:06:00,200 --> 00:06:08,430 somehow easily remember, to the network address. The IP address of this web site. 55 00:06:08,430 --> 00:06:13,900 So that's part one. And it says System DNS because the browser doesn't do all of 56 00:06:13,900 --> 00:06:20,900 this phone book lookup itself. It can rely on the operating system to take care of 57 00:06:20,900 --> 00:06:25,580 this. Unfortunately. So that's the parentheses, that's what the operating 58 00:06:25,580 --> 00:06:29,710 system is doing: it's using a few protocols, UDP and IP, and that becomes a network packet. 59 00:06:29,710 --> 00:06:35,430 We'll get back to those in just a little bit. So once the browser has the 60 00:06:35,430 --> 00:06:44,270 network address, the IP address of this website, it creates a connection. So it 61 00:06:44,270 --> 00:06:52,930 contacts the web server and it uses this set of protocols. So it first uses IP to 62 00:06:52,930 --> 00:06:57,400 reach the IP address of the server and in particular it uses TCP for this connection 63 00:06:57,400 --> 00:07:02,560 type, we'll get back to those in a little bit as well, what their properties 64 00:07:02,560 --> 00:07:07,259 are. And on top of that the browser then uses the HTTP protocol, we'll see an 65 00:07:07,259 --> 00:07:15,900 example of that at the very end, to request the web page that we 66 00:07:15,900 --> 00:07:23,100 wanted to see. And that's all happening on the laptop, in the browser and partly in 67 00:07:23,100 --> 00:07:27,610 the operating system that we're using, whatever that might be. 
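The two parts described above (name lookup through the operating system, then a TCP connection carrying an HTTP request) can be sketched roughly as follows. This is a minimal illustration, not what any particular browser does: the function names `build_http_get` and `fetch` are made up for this sketch, and plain HTTP on port 80 is an assumption (real sites often redirect to HTTPS).

```python
import socket


def build_http_get(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Resolve a name, connect with TCP, send an HTTP request, return the raw reply."""
    # Part one: ask the operating system's resolver (DNS) for an IPv4 address.
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    ip_address = infos[0][4][0]

    # Part two: open a TCP connection to that address and speak HTTP on top of it.
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(build_http_get(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("events.ccc.de")` needs a working Internet connection; `build_http_get` can be tried offline to see what the request text looks like.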
Then there's of 68 00:07:27,610 --> 00:07:32,810 course this long chain, or sometimes not so long, but usually several several 69 00:07:32,810 --> 00:07:37,850 machines along the way, routers, we might have a wireless router at home or in a 70 00:07:37,850 --> 00:07:45,770 coffee shop or here at Congress, and beyond that there is certainly there are 71 00:07:45,770 --> 00:07:52,259 certainly some more routers along the way from or between our laptop, my laptop, and 72 00:07:52,259 --> 00:07:57,410 the destination that I want to contact. So all of these routers they receive some 73 00:07:57,410 --> 00:08:03,370 packets, they look at the addresses, where it's going, in particular, and then sends 74 00:08:03,370 --> 00:08:11,169 it along its way. So they're they're just forwarding packets all all day long. 75 00:08:11,169 --> 00:08:19,580 Finally at the destination on the web server there are also two parts. So first 76 00:08:19,580 --> 00:08:28,060 of all the request that was sent by the browser is received. It goes through these 77 00:08:28,060 --> 00:08:32,229 these different layers, different protocols, and the website server 78 00:08:32,229 --> 00:08:37,159 software, it looks at the request and it says OK somebody wanted the first blog 79 00:08:37,159 --> 00:08:44,379 post, then I'll send that right back the same way that I received the request, and 80 00:08:44,379 --> 00:08:49,779 that's part two. So returning the response to this request, and it goes all the way 81 00:08:49,779 --> 00:08:57,980 through the routers the same path but in the reverse direction to the laptop. So 82 00:08:57,980 --> 00:09:06,230 let's let's look at all these different building blocks. All right. So let's start 83 00:09:06,230 --> 00:09:11,300 with the small smallest one in the network packet. 
I talked about packets going back 84 00:09:11,300 --> 00:09:19,619 and forth, so the packets, or a packet, is sort of the atom on the network--it's the 85 00:09:19,619 --> 00:09:28,189 smallest useful unit that is sent or processed by the network. I think a good 86 00:09:28,189 --> 00:09:34,490 way to explain packets is with regular postcards that you can send with mail, 87 00:09:34,490 --> 00:09:42,950 because their size, their maximum allowed size, is pretty much standardised. You 88 00:09:42,950 --> 00:09:46,939 can't send a postcard which is one meter, right? And that's the same with the 89 00:09:46,939 --> 00:09:52,439 network packets, you can't send arbitrarily large network packets. One 90 00:09:52,439 --> 00:10:00,459 pretty common maximum size is 1500 bytes, or roughly 1500 characters. So just to give an 91 00:10:00,459 --> 00:10:07,380 idea of how fairly small the packets are actually, and even that might, I don't 92 00:10:07,380 --> 00:10:11,100 know, do 1500 characters fit on a postcard? No I guess not, I think that's 93 00:10:11,100 --> 00:10:14,759 too much. So maybe the packets are a little bit larger than postcards, but 94 00:10:14,759 --> 00:10:21,470 still the analogy is pretty good because you send them out and there's very 95 00:10:21,470 --> 00:10:27,300 little, there's a little bit of structure, like there's a stamp perhaps, and a 96 00:10:27,300 --> 00:10:31,350 recipient address but that's pretty much it. So what you write on the 97 00:10:31,350 --> 00:10:35,970 postcard on the other side is really up to you and it's the same with 98 00:10:35,970 --> 00:10:40,550 the packets, they can contain anything, but if you write in a language that the 99 00:10:40,550 --> 00:10:44,989 receiver doesn't know, then they're going to receive the packet and then actually 100 00:10:44,989 --> 00:10:52,790 just drop it because they don't know what you're trying to tell them. 
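To make the size limit concrete, here is a small sketch of cutting a longer message into pieces of at most 1500 bytes each. It is a simplification: in a real network the 1500 bytes also have to hold the address information (headers), so the room for the message itself is somewhat smaller. The name `split_into_packets` is made up for this illustration.

```python
MAX_PACKET_BYTES = 1500  # the common maximum size mentioned in the talk


def split_into_packets(message: bytes, max_size: int = MAX_PACKET_BYTES) -> list[bytes]:
    """Cut a message into consecutive chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [message[i:i + max_size] for i in range(0, len(message), max_size)]


# A 4000-byte message does not fit on one "postcard", so it becomes three.
packets = split_into_packets(b"x" * 4000)
sizes = [len(p) for p in packets]  # [1500, 1500, 1000]
```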
So packets, 101 00:10:52,790 --> 00:10:59,999 they are sent and received through network interfaces. This is an Ethernet cable LAN 102 00:10:59,999 --> 00:11:08,309 port, or a Wi-Fi antenna, or maybe a 3G modem if you're on the go, out and about, 103 00:11:08,309 --> 00:11:13,089 and your cell phone does this of course as well, right? The cell phone has Wi-Fi if 104 00:11:13,089 --> 00:11:20,480 you're in a coffee shop maybe, or it has 3G if you're in the subway or on the tram. 105 00:11:20,480 --> 00:11:28,339 And one interesting thing, or where the comparison to the postcards doesn't really 106 00:11:28,339 --> 00:11:35,029 fit anymore, is that network interfaces, they can easily pass millions of packets 107 00:11:35,029 --> 00:11:41,629 in a single second. So it can be quite a lot of information going through, 108 00:11:41,629 --> 00:11:49,300 especially if you have a good internet connection like here at Congress. So then 109 00:11:49,300 --> 00:11:56,759 the next step or sort of if you start looking at OK what can we put on the 110 00:11:56,759 --> 00:12:01,249 information side of the postcard, right, where we can put any message we 111 00:12:01,249 --> 00:12:07,440 want. For this talk I'm only going to focus on IP version 4. I know it's 112 00:12:07,440 --> 00:12:16,389 old and legacy and we really shouldn't be using it still but it is, it's dominant so 113 00:12:16,389 --> 00:12:24,179 far, it won't be forever, but so far it's quite common. And I think it's something 114 00:12:24,179 --> 00:12:31,040 that most of us have at least seen when setting up the Wi-Fi or the new Internet 115 00:12:31,040 --> 00:12:35,879 connection, right? This IP address that I put up on the slide is maybe the most 116 00:12:35,879 --> 00:12:43,740 common IP address there is, right, for the new wireless router. These IP 117 00:12:43,740 --> 00:12:49,799 addresses, they consist of four numbers. 
They range 118 00:12:49,799 --> 00:12:54,579 from 0 to 255 and then there's now four of them and with dots in between is just how 119 00:12:54,579 --> 00:13:04,050 we write them. This is an efficient way for machines to identify themselves. But, 120 00:13:04,050 --> 00:13:09,749 the reason IP version 4 isn't so great anymore is that it's quite a small number 121 00:13:09,749 --> 00:13:15,819 of addresses. So it turns out that the Internet is pretty popular and worldwide 122 00:13:15,819 --> 00:13:23,119 the addresses have run out or are running out. There aren't enough addresses for all 123 00:13:23,119 --> 00:13:27,049 the devices that are actually participating or somehow connected to the 124 00:13:27,049 --> 00:13:37,689 Internet. IPv6 will solve this. Let's see. Maybe maybe we'll live to experience that. 125 00:13:37,689 --> 00:13:42,829 So what is what is a network then? There are different kinds of networks. I've 126 00:13:42,829 --> 00:13:50,889 written physical networks and logical or abstract networks. Physical network is 127 00:13:50,889 --> 00:13:56,859 cabling, right? If you have some kind of connection from your Internet service 128 00:13:56,859 --> 00:14:04,959 provider that goes to your wireless router or if you have a LAN setup like in the 129 00:14:04,959 --> 00:14:10,430 hack center with a switch and lots of cables to each, one cable to each 130 00:14:10,430 --> 00:14:15,529 computer, that's a physical network and that's a tangible thing right. That's 131 00:14:15,529 --> 00:14:20,000 something we can we can touch and we can modify it with our hands and so on. 
But 132 00:14:20,000 --> 00:14:24,290 that's certainly one network type, and another 133 00:14:24,290 --> 00:14:29,309 equally valid network type is the logical network, or as I also call it the abstract 134 00:14:29,309 --> 00:14:37,170 network, which is defined only by the addresses used by some set of computers 135 00:14:37,170 --> 00:14:43,319 that are communicating together. So here's an example of an IP network that might be 136 00:14:43,319 --> 00:14:51,879 used with the wireless router and that has the IP address up on top, right? And 137 00:14:51,879 --> 00:15:00,489 there's sort of a pattern, right? The first three numbers are the same. And that's the 138 00:15:00,489 --> 00:15:09,480 network address. And the very last part is zero with this slash 24. Meaning the first 24 139 00:15:09,480 --> 00:15:16,519 bits of the 32. So now it's technical maths and binary and sorry, but 140 00:15:16,519 --> 00:15:23,399 essentially the 24 means the first three numbers are always the same. And within 141 00:15:23,399 --> 00:15:26,809 this logical network, so within this group of computers or systems that can 142 00:15:26,809 --> 00:15:34,249 communicate with each other, only the very last number will change. And as long as 143 00:15:34,249 --> 00:15:41,579 this is the case we don't need a router, yet. We can--all these computers or all 144 00:15:41,579 --> 00:15:45,529 these systems--they can communicate directly with each other on the local 145 00:15:45,529 --> 00:15:51,029 network or on a Wi-Fi or whatever. And the slash 24 (/24) and the 146 00:15:51,029 --> 00:15:58,220 255.255.255.0, that's just two different ways to express exactly the same thing. So 147 00:15:58,220 --> 00:16:08,379 where do these IP addresses come from and how, who has them, and so on? So if we get 148 00:16:08,379 --> 00:16:14,910 a wireless router then we have some IP addresses. 
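The /24 notation and the 255.255.255.0 notation can be checked with Python's standard `ipaddress` module. The concrete addresses below (192.168.0.x) are an assumption chosen for illustration; the transcript does not say which address was on the slide.

```python
import ipaddress

# Assumed example address of a home wireless router.
router = ipaddress.IPv4Address("192.168.0.1")

# The four numbers 0..255 are just a readable way to write one 32-bit number.
as_number = int(router)   # 3232235521
total_ipv4 = 2 ** 32      # 4294967296 possible IPv4 addresses in total

# Two ways of writing the same logical network.
network = ipaddress.ip_network("192.168.0.0/24")
same_network = ipaddress.ip_network("192.168.0.0/255.255.255.0")

# Inside a /24 only the last number changes, which gives 256 addresses.
laptop = ipaddress.IPv4Address("192.168.0.42")
neighbour = ipaddress.IPv4Address("192.168.1.42")
laptop_is_local = laptop in network        # True: reachable without a router
neighbour_is_local = neighbour in network  # False: third number differs
```

So "the first 24 bits are the same" and "the first three numbers are the same" describe the same thing, because each of the four numbers takes up 8 bits.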
But me and my friend we both 149 00:16:14,910 --> 00:16:20,209 have the same perhaps IP addresses because we have a wireless router from the same 150 00:16:20,209 --> 00:16:26,089 supplier. Right. And this is a little bit of a special case. Those aren't Internet 151 00:16:26,089 --> 00:16:33,089 IP addresses. They're used only very locally. So only in one home network, only 152 00:16:33,089 --> 00:16:42,429 in one company network, perhaps. The public IP addresses are the ones that are 153 00:16:42,429 --> 00:16:47,359 on the outside of this wireless router that I got, and the wireless router 154 00:16:47,359 --> 00:16:54,100 typically only has one. Some Internet providers give you a few but it's very 155 00:16:54,100 --> 00:16:58,799 easy to have a lot more devices in your home or in your office than public IP 156 00:16:58,799 --> 00:17:04,329 addresses that you get from your Internet provider. So the IP addresses, they are 157 00:17:04,329 --> 00:17:11,020 assigned to the Internet providers, or the other way around, Internet providers they 158 00:17:11,020 --> 00:17:17,420 apply for some range of some number of IP addresses. And here in Europe there is an 159 00:17:17,420 --> 00:17:24,010 organisation called RIPE in charge of allocating a block of IP addresses to the 160 00:17:24,010 --> 00:17:31,320 Internet companies that are actively connecting to other Internet companies and 161 00:17:31,320 --> 00:17:39,880 maybe are also your Internet providers and mine. So and RIPE they have, they of 162 00:17:39,880 --> 00:17:47,680 course have colleagues in different parts of the world. So I think there are four or 163 00:17:47,680 --> 00:17:51,770 five, maybe even six of the RIR organizations, the regional network 164 00:17:51,770 --> 00:17:58,550 centers. 
They assign IP address blocks to the Internet companies, and by Internet 165 00:17:58,550 --> 00:18:04,320 company I don't only mean Internet providers that we use at home and at work, 166 00:18:04,320 --> 00:18:11,730 but also really any larger company that has a service available on the Internet. 167 00:18:11,730 --> 00:18:19,880 So all the streaming sites that you can imagine, all the, most, well several large 168 00:18:19,880 --> 00:18:26,320 just websites that are used every day will also have their own IP address range and 169 00:18:26,320 --> 00:18:32,700 will be active in finding different ways to connect to the Internet providers so 170 00:18:32,700 --> 00:18:40,350 that the end users can have as good an experience as possible when they're 171 00:18:40,350 --> 00:18:48,790 visiting there or using their services. So I talked about the Internet companies, 172 00:18:48,790 --> 00:18:55,080 they are trying to find good ways to connect to each other or to make it 173 00:18:55,080 --> 00:19:02,250 possible for users with one Internet company to reach either users at another 174 00:19:02,250 --> 00:19:10,130 Internet company or some service provided by some Internet company. And that's, 175 00:19:10,130 --> 00:19:16,540 that's the routing that's going on, both in the wireless wireless router at home 176 00:19:16,540 --> 00:19:23,030 but just as well and and even more so in all of these routers on the Internet that 177 00:19:23,030 --> 00:19:32,140 are handing packets back and forth. So starting with the wireless home router, it 178 00:19:32,140 --> 00:19:41,070 typically has one local network. At least. It might have more. So I had a home router 179 00:19:41,070 --> 00:19:48,100 that had both the regular Wi-Fi network and I was also able to configure a guest 180 00:19:48,100 --> 00:19:54,230 network or guest password. So that's actually two. 
It's Wi-Fi, so it's not 181 00:19:54,230 --> 00:19:58,170 really so intuitive, but those are two separate physical networks, because if 182 00:19:58,170 --> 00:20:03,420 you're connected to one you can't communicate directly with the other 183 00:20:03,420 --> 00:20:11,260 network without a router. Now there's some chance that the wireless router will do 184 00:20:11,260 --> 00:20:15,750 this, will enable this communication, but it's not for sure and it's not it's not 185 00:20:15,750 --> 00:20:22,120 certain. And in fact it's more likely that it won't work because this guest access, 186 00:20:22,120 --> 00:20:26,280 you're supposed to be able to give that to somebody who's just visiting and maybe you 187 00:20:26,280 --> 00:20:34,030 don't want them to access your printer or your storage cabinet or whatever. Right? 188 00:20:34,030 --> 00:20:39,050 So it's quite likely that this guest network doesn't get access to the main 189 00:20:39,050 --> 00:20:48,130 network. So two different networks, even though it's the same the same radio waves 190 00:20:48,130 --> 00:20:55,350 or the same air that's carrying the radio waves but the key property by the, or with 191 00:20:55,350 --> 00:21:01,540 a wireless or a home router, is that it almost always only has a single Internet 192 00:21:01,540 --> 00:21:07,580 connection, so it has a single connection to some Internet provider or in in the 193 00:21:07,580 --> 00:21:14,190 direction of the Internet. Typically that's that's the telco. But in some cases 194 00:21:14,190 --> 00:21:21,110 there's even, especially in the US, there's the situation where the telco or 195 00:21:21,110 --> 00:21:26,770 the Internet provider is also a content service provider. And that's a pretty bad 196 00:21:26,770 --> 00:21:34,440 situation. In particular if you have no options, no choice. So we have the home 197 00:21:34,440 --> 00:21:40,900 router with a single connection towards the Internet to the Internet provider. 
198 00:21:40,900 --> 00:21:47,050 Let's compare that with the Internet routers that are further out on the 199 00:21:47,050 --> 00:21:54,010 Internet and operated by the many different Internet companies. They will 200 00:21:54,010 --> 00:22:01,460 similarly have one or more local networks that belong to them the same way that the 201 00:22:01,460 --> 00:22:08,300 wireless network belongs to the home router. An 202 00:22:08,300 --> 00:22:12,190 Internet company or an Internet organization, let's say like the CCC as 203 00:22:12,190 --> 00:22:18,730 well, it has some equipment and servers, and the events.ccc.de server, for 204 00:22:18,730 --> 00:22:27,680 example, is part of the CCC slice of the Internet, and the router that's 205 00:22:27,680 --> 00:22:41,970 responsible for all of CCC's networks is also responsible for this IP segment 206 00:22:41,970 --> 00:22:46,970 where the web server is. Now the big difference here is that those Internet 207 00:22:46,970 --> 00:22:52,970 routers, or the routers that are further out on the Internet than our home routers, 208 00:22:52,970 --> 00:23:02,260 they typically connect to at least two but usually many more other Internet routers. 209 00:23:02,260 --> 00:23:07,990 Exactly how is different in every location. There are some norms and some 210 00:23:07,990 --> 00:23:16,380 common topologies, but... so the connections that exist are determined by 211 00:23:16,380 --> 00:23:23,930 peering agreements between the Internet companies and the Internet organizations 212 00:23:23,930 --> 00:23:30,510 there. They can of course have agreements with whoever. So it's not so easy to tell 213 00:23:30,510 --> 00:23:37,360 beforehand how a particular 214 00:23:37,360 --> 00:23:41,330 organization will do peering. This is an interesting topic. There are some more 215 00:23:41,330 --> 00:23:49,300 talks on this as well that I'm referring to later. 
One, at least one model, is to 216 00:23:49,300 --> 00:23:58,950 have a site. Some data center somewhere where an Internet exchange is running. So 217 00:23:58,950 --> 00:24:06,130 this is an organization whose sole purpose is to enable many different Internet 218 00:24:06,130 --> 00:24:10,630 companies or Internet organisations to somehow make their way there, put some 219 00:24:10,630 --> 00:24:17,850 cables to this data center, and all connect together and be able to exchange 220 00:24:17,850 --> 00:24:27,030 traffic between each other efficiently and maybe even at no cost. That's an 221 00:24:27,030 --> 00:24:37,140 interesting topic because there are so many different business models for the 222 00:24:37,140 --> 00:24:45,200 peering agreements. So the Internet exchanges is one model. There's a handful 223 00:24:45,200 --> 00:24:54,150 of them in Germany and that's about the scale of it. Private peering is of course 224 00:24:54,150 --> 00:24:59,220 possible to where organisations just have a direct connection between each other. 225 00:24:59,220 --> 00:25:06,630 And OK. So these connections they are then established somehow and how do the routers 226 00:25:06,630 --> 00:25:13,140 know where to send what? And that's a good question. This is managed by routing 227 00:25:13,140 --> 00:25:19,380 protocols, BGP is one. One such application or some, BIRD, is one 228 00:25:19,380 --> 00:25:25,510 application and then BGP is the protocol. So there are some rules, you can configure 229 00:25:25,510 --> 00:25:31,150 what to prefer, what route to prefer, but you can also just say I don't really care 230 00:25:31,150 --> 00:25:36,270 so much and just use whatever is available. And of course this depends on 231 00:25:36,270 --> 00:25:41,930 how much you have to pay for traffic that you send which way. 
If you have a really 232 00:25:41,930 --> 00:25:49,050 good peering agreement with another Internet organization and you're able to 233 00:25:49,050 --> 00:25:54,600 send a lot of traffic their way without having to pay very much extra or 234 00:25:54,600 --> 00:25:57,230 maybe anything at all then of course you're going to try to send as much 235 00:25:57,230 --> 00:26:11,340 traffic as possible that way. All right, so now we're getting, we've looked at IP 236 00:26:11,340 --> 00:26:17,730 addresses and IP addresses... we know some systems on the Internet or connected to 237 00:26:17,730 --> 00:26:23,160 the Internet... all systems connected to the Internet, they have some IP address. 238 00:26:23,160 --> 00:26:33,340 And if we know the IP address we can try to reach that system. Yeah, yeah. That's a 239 00:26:33,340 --> 00:26:42,361 bit unfortunate. So. The first bullet point is UDP. It's... now 240 00:26:42,361 --> 00:26:47,270 we're talking about, okay, so on the postcard when we're writing stuff 241 00:26:47,270 --> 00:26:53,760 there we put the IP address because we know what system we want to reach. But we 242 00:26:53,760 --> 00:27:00,120 want to send it some kind of message as well. There are a few different ways to 243 00:27:00,120 --> 00:27:07,900 structure messages. And these are the most common ones, or the ones that make up 244 00:27:07,900 --> 00:27:14,400 almost all of the traffic on the Internet. So the first one is UDP. It's quite like 245 00:27:14,400 --> 00:27:21,830 postcards. So it's just a single message. There's no context, there's no connection 246 00:27:21,830 --> 00:27:27,740 between two different messages, and there's also no guarantees about how this 247 00:27:27,740 --> 00:27:33,430 message will, or this packet will, perform on the network. So if you send out a UDP 248 00:27:33,430 --> 00:27:39,860 packet it might arrive or it might not and you'll never know. 
And that can seem a bit 249 00:27:39,860 --> 00:27:46,540 useless but actually it's quite good in many cases. For example if you're doing 250 00:27:46,540 --> 00:27:55,730 real time audio or video streaming, UDP is a good choice because it's real time 251 00:27:55,730 --> 00:28:01,750 information, so if something is missing maybe there will be a glitch in the audio 252 00:28:01,750 --> 00:28:09,110 or there will be some glitch in the video, but it's not so important to wait and 253 00:28:09,110 --> 00:28:14,510 delay the image to fix that glitch. It's better to get the next image and just 254 00:28:14,510 --> 00:28:24,030 replace the image. So just keep on going. And for that UDP is a really good fit. 255 00:28:24,030 --> 00:28:28,710 Just send it along and if it arrives it arrives, most of the time it 256 00:28:28,710 --> 00:28:36,100 does arrive. Most of the time it works fine. So sometimes a good choice. The next 257 00:28:36,100 --> 00:28:44,340 point there is TCP. So maybe you've heard the term TCP/IP and TCP/IP is exactly 258 00:28:44,340 --> 00:28:50,360 the... so specifically it's the combination of this, this TCP then, I'll 259 00:28:50,360 --> 00:28:59,309 get into it in a second, with the IP address. Both TCP and UDP, they have the 260 00:28:59,309 --> 00:29:05,130 concept of a port. So that's a second address. You could compare that with, 261 00:29:05,130 --> 00:29:13,550 let's say, the IP address is the street name and the port is the house number on 262 00:29:13,550 --> 00:29:18,430 that particular street. So it's a bit more precise. You know it's that system but 263 00:29:18,430 --> 00:29:27,050 that system might offer many services and you want one specific one. So for each of 264 00:29:27,050 --> 00:29:34,350 the common services that we use, email and web and Jabber and whatever, there are 265 00:29:34,350 --> 00:29:43,650 typical port numbers that are allocated and always the same. 
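Here is a small sketch of UDP as a "postcard" addressed to an IP address plus a port. To keep it runnable without an Internet connection, both ends live on the same machine (the loopback address 127.0.0.1), and the operating system is asked to pick any free port; the function name `udp_roundtrip` is made up for this illustration. On loopback the message practically always arrives, which is not something UDP promises on a real network.

```python
import socket


def udp_roundtrip(message: bytes) -> tuple[bytes, int]:
    """Send one UDP datagram to a local receiver; return what arrived and the port used."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 means: let the operating system choose a free port number.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
        ip_address, port = receiver.getsockname()

        # No connection is set up first: the datagram is simply sent to (address, port).
        sender.sendto(message, (ip_address, port))

        data, _sender_address = receiver.recvfrom(2048)
        return data, port
    finally:
        sender.close()
        receiver.close()
```

For example, `udp_roundtrip(b"hello")` returns the bytes `b"hello"` together with whichever port number the system picked.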
So that I don't have 266 00:29:43,650 --> 00:29:49,930 to guess or look up what it is. So with TCP, what are the properties of that? 267 00:29:49,930 --> 00:30:03,120 That's more like a stream of letters that you have to go to the post office and 268 00:30:03,120 --> 00:30:11,490 acknowledge that you've received. So the recipient of a TCP packet or a network 269 00:30:11,490 --> 00:30:19,510 packet with IP and TCP inside of it will always confirm reception to the sender. So 270 00:30:19,510 --> 00:30:26,230 this allows this concept of a connection that I mentioned, where both sides talking 271 00:30:26,230 --> 00:30:36,470 to each other are synchronized and know where the other party is in this 272 00:30:36,470 --> 00:30:41,940 communication or in this connection. What data has been received and what has not 273 00:30:41,940 --> 00:30:51,790 yet been received. So the packets, TCP packets can of course also get lost, 274 00:30:51,790 --> 00:30:56,980 right? There's no guarantee with any network that it will always function 275 00:30:56,980 --> 00:31:01,510 correctly. You can just pull the cable and it will not be possible to send any 276 00:31:01,510 --> 00:31:11,250 packets. So TCP will recognize that. Oh, so I sent some packets out, but they haven't 277 00:31:11,250 --> 00:31:17,200 been confirmed, they haven't been acknowledged. OK. I'll try again. I'll 278 00:31:17,200 --> 00:31:25,559 send again a few times and it's usually adjustable how long TCP will be retrying 279 00:31:25,559 --> 00:31:29,710 to communicate. And finally it will give up and say yeah, sorry, it seems that this 280 00:31:29,710 --> 00:31:39,300 connection is broken. It's not possible to communicate anymore over this path. 
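The acknowledge-and-retry idea can be shown with a toy model. This is not real TCP (which uses timers, sequence numbers in bytes, sliding windows and more); it only illustrates "send, wait for a confirmation, send again a few times, then give up". The function name and the `delivered` parameter are made up for this sketch.

```python
def send_with_retries(segments, delivered, max_retries=3):
    """Toy model of acknowledged delivery.

    segments:    the pieces of data to send, in order
    delivered:   booleans, one per transmission attempt; True means the segment
                 arrived and its acknowledgement came back
    max_retries: how many times a segment is re-sent before giving up
    """
    outcomes = iter(delivered)
    received = []
    log = []
    for seq, segment in enumerate(segments):
        # One first attempt plus up to max_retries further attempts.
        for attempt in range(1, max_retries + 2):
            acknowledged = next(outcomes, False)  # no more outcomes: treat as lost
            log.append((seq, attempt, acknowledged))
            if acknowledged:
                received.append(segment)
                break
        else:
            # Never acknowledged: the "connection is broken" case from the talk.
            raise ConnectionError(f"giving up on segment {seq}")
    return received, log


# Second segment is lost once, then gets through on the retry.
received, log = send_with_retries(["a", "b"], [True, False, True])
```

In this run `log` records that segment 1 needed two attempts, while the receiver still ends up with both segments in order.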
But if 281 00:31:39,300 --> 00:31:45,330 you're quick and you plug the cable back in then maybe everything will heal or the 282 00:31:45,330 --> 00:31:50,440 connection will just continue functioning just as if there was never an 283 00:31:50,440 --> 00:31:57,150 interruption, because the network software is just keeping track of what has been 284 00:31:57,150 --> 00:32:04,970 sent, what has been received, and can recover from this loss of communication. 285 00:32:04,970 --> 00:32:15,830 And the third one on the bottom is this SCTP. This is not quite so widespread but 286 00:32:15,830 --> 00:32:23,300 it's still a very powerful mix. It's a lot younger than the other two. So UDP and TCP 287 00:32:23,300 --> 00:32:35,150 they are... I'd like to say 70s and 80s. Yeah. So quite old, whereas SCTP, I think 288 00:32:35,150 --> 00:32:39,820 the standard was final, or the first version of the standard came in 2000, so 289 00:32:39,820 --> 00:32:47,929 it's quite a lot younger, this protocol. But it's a powerful combination of 290 00:32:47,929 --> 00:32:57,010 properties from both the older ones so you can have... whereas with TCP you just have a 291 00:32:57,010 --> 00:33:02,580 constant stream of text, essentially, or image or whatever content you are 292 00:33:02,580 --> 00:33:08,110 transferring... with UDP you had this message that's on the postcard, like 293 00:33:08,110 --> 00:33:13,590 that's one postcard that you're sending, that's the fixed message. TCP 294 00:33:13,590 --> 00:33:16,720 doesn't have that concept, it's just information all the time until the 295 00:33:16,720 --> 00:33:24,790 connection closes. With SCTP you can have a connection concept where both sides are 296 00:33:24,790 --> 00:33:31,679 aware of the communication status or the position in the communication, but you 297 00:33:31,679 --> 00:33:36,670 will still be able to send messages like on the 298 00:33:36,670 --> 00:33:42,440 postcards. 
So you have a fixed-size piece of information that you want to transfer 299 00:33:42,440 --> 00:33:50,760 and you can send that as a unit, whereas if you're only using TCP, like we do on 300 00:33:50,760 --> 00:33:58,740 the web all the time, you have to build a lot of stuff around or on top of TCP in 301 00:33:58,740 --> 00:34:02,470 order to achieve the same thing. So if I want to transfer an image or when my 302 00:34:02,470 --> 00:34:07,220 browser wants to download an image, there's quite a lot of extra work that has 303 00:34:07,220 --> 00:34:15,629 to go into making that possible with the regular TCP protocol that is being used 304 00:34:15,629 --> 00:34:23,599 for now, so that would certainly be an advantage of SCTP. It also has the retry, the 305 00:34:23,599 --> 00:34:30,539 reliable delivery, if you want to, and you can also use multi-homing. So that's not 306 00:34:30,539 --> 00:34:35,010 so common yet. As I said, typically in the wireless home routers they only have the 307 00:34:35,010 --> 00:34:42,109 single Internet connection but that might change, we might in the future see several 308 00:34:42,109 --> 00:34:47,559 different kinds of Internet connections that we're using, and SCTP would be able 309 00:34:47,559 --> 00:34:53,760 to take advantage of that quite easily whereas the other ones cannot. So SCTP 310 00:34:53,760 --> 00:34:59,640 can send the same information over several different connections and whatever comes 311 00:34:59,640 --> 00:35:06,019 first arrives first at the destination and is accepted. This is of course a bit 312 00:35:06,019 --> 00:35:14,180 wasteful but in some cases maybe it's not a problem. So that's an exciting... I 313 00:35:14,180 --> 00:35:22,180 think exciting new feature. Let's see what the future brings. It seems that TCP is 314 00:35:22,180 --> 00:35:37,299 going away slowly but surely. Let's see what happens.
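[Editor's sketch] To illustrate the "stuff you have to build on top of TCP" to get postcard-like messages out of a plain byte stream, here is one common generic technique, length-prefixed framing, sketched in Python. It is an assumption chosen for illustration, not what HTTP itself does; the function names `frame` and `unframe` are hypothetical.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


def unframe(stream: bytes):
    """Split a received byte stream back into complete messages.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not fully arrived yet.
    """
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break  # incomplete message: wait for more bytes from the stream
        messages.append(stream[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return messages, stream[offset:]
```

The receiver has to do this bookkeeping itself because TCP only promises an ordered stream of bytes; a message-oriented transport hands over each unit whole.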
But then some companies, 315 00:35:37,299 --> 00:35:45,599 they're providing systems where they want to control much more of how the 316 00:35:45,599 --> 00:35:50,319 software is using the network, how the software is communicating on the network, 317 00:35:50,319 --> 00:35:56,339 and the way that these systems are built, cell phones typically, or smartphones, 318 00:35:56,339 --> 00:36:02,890 it's not so easy to do that with either TCP or SCTP, but it's quite easy to do it 319 00:36:02,890 --> 00:36:10,890 if they're using UDP, so I think that's a big motivator for them to try to move away 320 00:36:10,890 --> 00:36:22,491 from TCP and use UDP even more. Let's see. Sorry. So then we'll get into some 321 00:36:22,491 --> 00:36:29,589 applications. Now we've written on the postcard, we've written addresses, IP 322 00:36:29,589 --> 00:36:35,869 addresses, the system that we want to communicate with, and we've chosen either 323 00:36:35,869 --> 00:36:44,299 UDP or TCP depending on what is most suitable. Actually it depends typically on 324 00:36:44,299 --> 00:36:49,309 the application. So some applications require one or the other and a few 325 00:36:49,309 --> 00:36:58,400 applications can do either or. The first thing I'd like to mention here is DNS. I 326 00:36:58,400 --> 00:37:02,869 call it the phone book, the Internet phone book. But there is one big difference. A 327 00:37:02,869 --> 00:37:07,180 phone book is something we get from one publisher, right? The phone company 328 00:37:07,180 --> 00:37:13,020 typically. And they, or the POC here at Congress, they've just collected or 329 00:37:13,020 --> 00:37:17,990 they know all the phone numbers and they send us the list, right, with the names. 330 00:37:17,990 --> 00:37:26,599 DNS is different in that everybody who has a name in the DNS, in the domain 331 00:37:26,599 --> 00:37:32,150 name system, so anybody can register a domain name.
And anybody who does that can 332 00:37:32,150 --> 00:37:38,960 publish some information there. You can decide what you publish. Actually you 333 00:37:38,960 --> 00:37:43,780 can decide if you publish. So let's say you have a thousand IP addresses, you can 334 00:37:43,780 --> 00:37:49,799 decide if you want to publish names for all of those thousand or if you just maybe 335 00:37:49,799 --> 00:37:55,849 publish a few of them that are going to be interesting for other people to use. And 336 00:37:55,849 --> 00:38:02,920 90 percent of them are just internal systems. So everybody gets to 337 00:38:02,920 --> 00:38:08,989 choose what they publish and everybody can publish. Everybody can also run the 338 00:38:08,989 --> 00:38:13,890 infrastructure, storing this information on their own. So it's not that you have to 339 00:38:13,890 --> 00:38:19,931 send this in somewhere necessarily and they publish it for you. You can actually 340 00:38:19,931 --> 00:38:25,599 do that on your own. So it's decentralized. Very good. Still, it's a super 341 00:38:25,599 --> 00:38:33,150 old protocol, from those early days of the Internet. 342 00:38:33,150 --> 00:38:45,759 Nobody was thinking about security and nobody had done a lot of attacks on these 343 00:38:45,759 --> 00:38:51,200 protocols, whether it be reliability attacks or just forgery attacks and so 344 00:38:51,200 --> 00:38:59,309 on. That wasn't a concern because this was designed for companies working for 345 00:38:59,309 --> 00:39:03,700 the government. Right. So everybody was interested in collaborating and there were 346 00:39:03,700 --> 00:39:10,839 no bad actors. The Internet now is, again, quite different. So most of these 347 00:39:10,839 --> 00:39:20,680 old protocols actually aren't so great anymore. The basic functionality of DNS or 348 00:39:20,680 --> 00:39:25,890 the phonebook is to publish IP addresses but you can publish other things as well.
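[Editor's sketch] To make the "phone book lookup" concrete, the following Python sketch builds the bytes of a basic DNS question asking for the IPv4 address (an A record) of a name. The function name `build_dns_query` and the query ID are hypothetical; the layout (a 12-byte header, then the name as length-prefixed labels, then type and class) follows the classic DNS message format. Nothing is sent over the network here.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of `hostname`."""
    # Header: ID, flags (0x0100 = standard query, recursion desired),
    # then four counters: 1 question, 0 answers, 0 authority, 0 additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded label by label, each prefixed with its length,
    # and terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A (IPv4 address), QCLASS 1 = IN (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question
```

A query like this is typically sent in a single UDP packet to port 53 of a resolver, which fits the postcard model: one small self-contained message out, one message back.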
349 00:39:25,890 --> 00:39:31,519 If you're interested in DNS there's a good talk about that later on. I mentioned it a 350 00:39:31,519 --> 00:39:38,180 bit. So the next application I want to talk about is SMTP, or Simple 351 00:39:38,180 --> 00:39:44,220 Mail Transfer Protocol. This is what is used to deliver every single email in the 352 00:39:44,220 --> 00:39:53,880 world. All the time. All day long. Now one thing that's a bit interesting or quite 353 00:39:53,880 --> 00:40:03,289 interesting but also problematic, I'd say, about email and not SMTP per se but the 354 00:40:03,289 --> 00:40:09,829 scope of SMTP is that SMTP is used only to send email. So SMTP doesn't have 355 00:40:09,829 --> 00:40:18,589 anything to do with receiving email. This means that there's a separate mechanism 356 00:40:18,589 --> 00:40:28,009 for receiving email. And the way these two different protocols or 357 00:40:28,009 --> 00:40:38,010 mechanisms work ends up putting the cost of email with the person receiving mail. So I 358 00:40:38,010 --> 00:40:42,570 have to pay in order to be there, with information or with money, to get an email 359 00:40:42,570 --> 00:40:48,089 address where I have some gigabytes of storage. Whereas people sending email, 360 00:40:48,089 --> 00:40:51,700 they don't have to pay anything. They just need Internet access and then they can 361 00:40:51,700 --> 00:40:56,920 send all the emails they want, all day long, to every single possible 362 00:40:56,920 --> 00:41:04,390 email address in the world. And that's why we have a spam problem on the Internet. 363 00:41:04,390 --> 00:41:13,299 Yeah that's a bug. Let's see if this can get fixed. Email is so tightly integrated 364 00:41:13,299 --> 00:41:22,599 into our everyday lives that ... I'm not sure. But let's see. That would be great.
365 00:41:22,599 --> 00:41:30,530 So the last application protocol I want to mention is HTTP, the hypertext transfer 366 00:41:30,530 --> 00:41:38,230 protocol, that's used for the web. You recognize it from the web browser URLs. 367 00:41:38,230 --> 00:41:44,430 Web pages used to be just hypertext, so text with some links. That's all they could do 368 00:41:44,430 --> 00:41:54,789 in the very beginning. I'd like to show an example of SMTP. Actually I have to do 369 00:41:54,789 --> 00:42:17,960 something about this, because it's not so easy to read. Let's see... I 370 00:42:17,960 --> 00:42:25,720 should've done that already. Sorry about that. Um, so this is an example of an email 371 00:42:25,720 --> 00:42:39,729 delivery. This is all it takes to send an email on the Internet. The arrow pointing 372 00:42:39,729 --> 00:42:47,170 left is received from the email server, from the SMTP server. And the arrow 373 00:42:47,170 --> 00:42:51,769 pointing this way [right] is what we send to the email server when we want to send 374 00:42:51,769 --> 00:42:59,660 an email. So if we connect to an email server, for example mine, it will send us 375 00:42:59,660 --> 00:43:08,930 some text. We are using TCP and we're using port 25 for SMTP. So we get a stream 376 00:43:08,930 --> 00:43:17,569 of text going back and forth. The server tells us 220 and its name, that's 377 00:43:17,569 --> 00:43:24,740 some kind of welcome code. We say HELO my name is laptop. Because I'm doing this 378 00:43:24,740 --> 00:43:30,880 from my laptop. The mail server says OK, good to meet you. And then we say I 379 00:43:30,880 --> 00:43:40,819 want to send an email where the sender address is test@stuge.se. And if you're 380 00:43:40,819 --> 00:43:48,489 paying attention here the sender of the e-mail gets to say what the sender address 381 00:43:48,489 --> 00:43:55,430 is. So this is why it's super easy for anyone to forge email from any sender 382 00:43:55,430 --> 00:44:10,029 address.
It's just part of the message. The server accepts the sender, even though 383 00:44:10,029 --> 00:44:15,249 the sender might not even exist. I tell it the recipient (RCPT). This is for me, 384 00:44:15,249 --> 00:44:23,190 meant for me. The server says OK. Then I say here's the DATA for this email and the 385 00:44:23,190 --> 00:44:29,370 server says: go on, start sending me the contents. And then I send an email 386 00:44:29,370 --> 00:44:39,759 where the sender is Trollolol and just some fake sender address, whatever subject 387 00:44:39,759 --> 00:44:49,700 and some text. And in the end I finish with a dot to say ok, end of message. The 388 00:44:49,700 --> 00:44:54,190 server says OK. And then I say to the server I want to QUIT, I don't want to 389 00:44:54,190 --> 00:45:00,720 talk to you anymore. The server says "closing" goodbye. And this is e-mail on 390 00:45:00,720 --> 00:45:14,210 the network. Last example: a web page, access over HTTP. So this is even 391 00:45:14,210 --> 00:45:19,780 simpler. I've simplified this even a little bit more. If you want to try this 392 00:45:19,780 --> 00:45:29,410 yourself, please do. So HTTP is also TCP and port 80. I tried talking to the 393 00:45:29,410 --> 00:45:36,509 events.ccc.de web server. Same thing here: Arrows pointing this way [right] are what 394 00:45:36,509 --> 00:45:45,640 we send when we contact the server. So. Connection opens. I send "GET / HTTP/1.0" 395 00:45:45,640 --> 00:45:55,489 because I want to get the main page and I'm saying I'm speaking HTTP version 1.0. 396 00:45:55,489 --> 00:46:00,140 And then I tell it OK I want to access this start page on the hostname 397 00:46:00,140 --> 00:46:08,880 events.ccc.de. Then I send it an empty line. That's to say OK this is my request.
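[Editor's sketch] For anyone who wants to try the HTTP request just described, here is a minimal Python sketch. `build_http_request` produces exactly the lines from the example (the GET line, the "Host:" line, then an empty line); `fetch` is a hypothetical helper that sends them over a plain TCP connection to port 80 and returns whatever comes back. Both names are assumptions made for this sketch, and what the server answers depends on the server.

```python
import socket


def build_http_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 request; lines end with CR LF and an
    empty line marks the end of the request."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the request over unencrypted TCP and read until the server
    closes the connection (which HTTP/1.0 servers do after responding)."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_http_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("events.ccc.de")` and printing the first line of the result would show the status line the server sends back.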
398 00:46:08,880 --> 00:46:14,579 And then there comes the response (arrows going in the other direction [left]) where 399 00:46:14,579 --> 00:46:20,710 the web server says what you're asking for is not available here where you're asking 400 00:46:20,710 --> 00:46:26,190 for it. You have to go somewhere else. It's a redirect. The 301 is the HTTP code 401 00:46:26,190 --> 00:46:32,269 for redirect. And this content that you're asking for, this page, it's been 402 00:46:32,269 --> 00:46:42,829 moved permanently. The new location is https://events.ccc.de. So I was using an 403 00:46:42,829 --> 00:46:53,150 IP and TCP connection with no encryption. And that's why I can just type in the GET 404 00:46:53,150 --> 00:46:57,720 and the "Host:" line. But the web server tells me I'm sorry I don't want to talk to 405 00:46:57,720 --> 00:47:07,950 you without encryption. So you have to go to this HTTPS address instead. Thank you 406 00:47:07,950 --> 00:47:14,779 events.ccc.de! I like encryption. That's good. And thank you also to all the 407 00:47:14,779 --> 00:47:21,359 angels that make Congress possible because without them and without you, who are 408 00:47:21,359 --> 00:47:25,990 here, who are angels, there wouldn't be any Congress. And also I want to say a 409 00:47:25,990 --> 00:47:33,210 huge "thank you" to you in the audience, for being curious and wanting to learn 410 00:47:33,210 --> 00:47:35,649 something new. 411 00:47:35,649 --> 00:47:44,839 *applause* 412 00:47:44,839 --> 00:47:47,559 Herald Angel: Thank you very much Peter. Now we have some time 413 00:47:47,559 --> 00:47:51,980 left for Q and A so if you have questions please do line up at the microphones that 414 00:47:51,980 --> 00:47:57,359 you find here. If you want to ask anything. Do we have a question from the 415 00:47:57,359 --> 00:48:01,819 Internet? No. The Internet is out of questions. Do I see anybody standing at 416 00:48:01,819 --> 00:48:14,099 any microphone?
Please make yourself known if I overlook you. Any questions? Oh, at 417 00:48:14,099 --> 00:48:19,019 microphone five. Please do ask your question. 418 00:48:19,019 --> 00:48:27,469 Question: You mentioned that you think SMTP has a kind of bug, in the sense that 419 00:48:27,469 --> 00:48:31,469 you can just send an e-mail and the responsibility is on the side of the 420 00:48:31,469 --> 00:48:37,089 receiver. So if you call it a bug it seems you have an easy solution. 421 00:48:37,089 --> 00:48:42,599 Answer: I'm sorry, but, no, I don't. I mean I wish! That would be great! 422 00:48:42,599 --> 00:48:49,609 It's not so easy to fix because it is a property of SMTP, right, and of the e-mail 423 00:48:49,609 --> 00:48:54,390 system that we're using. So there was a proposal a long, long time ago, by somebody 424 00:48:54,390 --> 00:48:59,940 much smarter than me, called "Internet Mail 2000" where actually the whole thing 425 00:48:59,940 --> 00:49:05,799 is switched around, so that the sender has to store the message, and the receiver can 426 00:49:05,799 --> 00:49:13,719 go and pick it up. So there, the cost is placed on the sender. And I think 427 00:49:13,719 --> 00:49:19,289 that would go a long way to solving the spam problem. But it's not compatible with 428 00:49:19,289 --> 00:49:25,059 the e-mail software that we have today. So it's not clear to me how we would be able 429 00:49:25,059 --> 00:49:33,230 to migrate in a good way, unfortunately. 430 00:49:33,230 --> 00:49:38,024 Herald Angel: Thank you. Do we have any other questions? That does not seem to be the 431 00:49:38,024 --> 00:49:41,619 case, so please give another warm round of applause to Peter Stuge. Thank you very 432 00:49:41,619 --> 00:49:44,086 much for the talk. Peter: Thank you. 433 00:49:44,086 --> 00:49:46,472 *applause* 434 00:49:46,472 --> 00:49:51,947 *postroll music* 435 00:49:51,947 --> 00:50:09,000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!