1
00:00:03,974 --> 00:00:08,515
Thomas Fricke: Thank you very much for the
invitation. So second talk tomorrow –

2
00:00:08,515 --> 00:00:13,731
Thank you – um, today. So this is my
background. More or less, I do Kubernetes

3
00:00:13,731 --> 00:00:19,086
security and critical infrastructure,
founded several companies, and now my

4
00:00:19,086 --> 00:00:25,392
main focus is on Kubernetes security.
This rabbit hole of Kubernetes, if you

5
00:00:25,392 --> 00:00:31,267
look deeper into it, then you should be a
little bit scared, and I want to explain

6
00:00:31,267 --> 00:00:36,764
why. The first approach is the
application, and then the application

7
00:00:36,764 --> 00:00:42,976
normally is run in containers. And the
containers, what is not really well known,

8
00:00:42,976 --> 00:00:48,458
have access to service accounts in
Kubernetes, which is one of the major

9
00:00:48,458 --> 00:00:54,474
flaws in Kubernetes at the moment. If you
take over the service account, it might be

10
00:00:54,474 --> 00:01:01,088
that you can take over a cluster. And if
you can take over a cluster, you might

11
00:01:01,088 --> 00:01:08,156
take over a node and then your entire
cloud service account, which is the work

12
00:01:08,156 --> 00:01:15,121
of somebody else, I will mention later on
these slides. So let's look at what happens:

13
00:01:15,121 --> 00:01:21,952
So, the target is I have an application
exposed to the internet, and I want to own

14
00:01:21,952 --> 00:01:28,565
the entire cluster from outside.
Application might be vulnerable. Examples?

15
00:01:28,565 --> 00:01:36,538
Yeah, lots of them. One example I want to
present is ImageTragick. You normally

16
00:01:36,538 --> 00:01:43,876
should not do eval or exec statements in
any framework – be it PHP, Node.js, or

17
00:01:43,876 --> 00:01:50,644
any other framework – and execute commands
in the context of your application,

18
00:01:50,644 --> 00:01:56,311
because something can go wrong and
developers are responsible for this. Let's

19
00:01:56,311 --> 00:02:04,604
see what it looks like: This is the attack
model based on an attack. I thought it was

20
00:02:04,604 --> 00:02:13,471
old and has been fixed in 2016, but now
there was a new overview by Emil Lerner,

21
00:02:13,471 --> 00:02:22,343
who again showed, yes, you can, in current
versions of ImageMagick, exploit this

22
00:02:22,343 --> 00:02:32,640
attack. So, it works. ImageMagick is for
uploading images, so you convert the image

23
00:02:32,640 --> 00:02:41,163
into a different format, scale the size, and
then if you do something wrong in this

24
00:02:41,163 --> 00:02:46,554
image, you can own the entire container.
This also works for non-containerized

25
00:02:46,554 --> 00:02:52,520
applications if you have a server running
something with ImageMagick on it. Please

26
00:02:52,520 --> 00:03:02,160
be careful. OK. If we have mastered this
step, the next step is, yes, we want

27
00:03:02,160 --> 00:03:08,167
access to the service account. And this is
by default enabled in Kubernetes: So, you

28
00:03:08,167 --> 00:03:14,026
have a Kubernetes design flaw because your
service account is exposed to the

29
00:03:14,026 --> 00:03:20,040
container where the application runs.
The next step of an attacker is

30
00:03:20,040 --> 00:03:27,229
installation of additional software. So,
you want to take over. You need a curl or

31
00:03:27,229 --> 00:03:35,386
kubectl or chmod, and then you are owner
of the service account and can actually do

32
00:03:35,386 --> 00:03:42,897
commands by uploading pictures in
ImageTragick. So responsible for this flaw

33
00:03:42,897 --> 00:03:50,920
is the image creator. Let's see what else
can happen. To get total control, you also

34
00:03:50,920 --> 00:03:57,514
need role-binding to a cluster-admin role.
This is not enabled by default, but the

35
00:03:57,514 --> 00:04:04,067
internet is always good for bad advice. So
if you copy the installation requirements

36
00:04:04,067 --> 00:04:11,224
or recommendations from the internet,
somebody else might take over the entire

37
00:04:11,224 --> 00:04:23,406
cluster. Let's look deeper into it: Worst
practice here is what you can see in the

38
00:04:23,406 --> 00:04:34,320
elastic installation recommendation: They
just mentioned they have a newer version,

39
00:04:34,320 --> 00:04:43,200
but they use the cluster admin permissions
here to install ElasticSearch in your

40
00:04:43,200 --> 00:04:53,680
Kubernetes cluster. So they recommend it
and a lot of other applications also have

41
00:04:53,680 --> 00:04:59,760
this – which is a little bit outdated, but
it's quite common – in the installation

42
00:05:00,480 --> 00:05:08,000
requirements. Never, ever do this, please.
It also can come with Helm Charts, so you

43
00:05:08,000 --> 00:05:16,560
have Helm Charts where the cluster-admin
role is included. Here you see it, it was

44
00:05:16,560 --> 00:05:23,200
in Apache Heron, which is an Apache
project, and it uses the cluster-admin

45
00:05:23,200 --> 00:05:40,000
role, so by a helm install you might be
affected by this flaw too. So with these

46
00:05:40,000 --> 00:05:46,800
four steps, which effectively are three
steps, you have a cluster application

47
00:05:46,800 --> 00:05:53,680
exposed, and through that path, you can
take over the entire cluster from the

48
00:05:53,680 --> 00:06:01,600
outside, and do anything that the cluster-
admin role can do. Effectively, this

49
00:06:02,400 --> 00:06:09,680
cluster-admin role-binding is like a
doormat attack, so you have the best

50
00:06:09,680 --> 00:06:17,760
cryptography, the most expensive locks on
one side and then you put the key under

51
00:06:17,760 --> 00:06:24,480
the doormat or under the flower at the
door or something like that. This is

52
00:06:24,480 --> 00:06:34,240
something which is not really what you
want. I can do an example walkthrough

53
00:06:34,240 --> 00:06:40,270
which shows how it goes. So, I've
published all my training notebooks on

54
00:06:40,270 --> 00:06:50,415
GitHub. Here's the way you can build this
outdated ImageTragick version in

55
00:06:50,415 --> 00:06:57,834
OpenShift. So, I use CRC, which is the
CodeReady Containers version. It's based

56
00:06:57,834 --> 00:07:05,134
on the ImageTragick proof of concept by
Mike Williams. And here you run and create

57
00:07:05,134 --> 00:07:13,582
a vulnerable image. A little bit lengthy.
It's compiled inside and so on. So, don't

58
00:07:13,582 --> 00:07:19,890
get a full version, which is the reason
why I don't show it here, but effectively

59
00:07:19,890 --> 00:07:29,800
at the end, you have a vulnerable
application in a container internal and in

60
00:07:29,800 --> 00:07:37,990
OpenShift. And that's exactly what we need
to run the application. Here is the

61
00:07:37,990 --> 00:07:48,560
exploit. And the exploit starts with the
deployment of this container, which is

62
00:07:48,560 --> 00:07:54,000
standard Kubernetes. Here "oc" is like
kubectl. So, you get an overview.

63
00:07:54,720 --> 00:07:59,680
Additionally, in OpenShift, you have a
very simple version of creating a route,

64
00:08:00,320 --> 00:08:07,200
which is connected to a hostname, and then
you can upload it by using that hostname.

65
00:08:09,040 --> 00:08:16,230
You expose the deployment, you expose the
service which is created, you expose the

66
00:08:16,230 --> 00:08:22,920
route finally, and then you have access.
The next step is you get this route and

67
00:08:22,920 --> 00:08:30,814
then here you have a URL, which you can
use. And in a full demo, I would just

68
00:08:30,814 --> 00:08:38,063
simply call this URL and then I can upload
images here. I've created these files,

69
00:08:38,063 --> 00:08:45,000
which are valid PostScript files, but you
see at the end there is a full command.

70
00:08:45,000 --> 00:08:50,957
And here, because there's a curl in the
container, I can download a version of

71
00:08:50,957 --> 00:08:58,374
kubectl. Effectively, the containers,
especially the Red Hat containers, are not as

72
00:08:58,374 --> 00:09:07,578
vulnerable as others, but you have always
writable temp, which is enough to deploy

73
00:09:07,578 --> 00:09:16,724
some software. So, we curl kubectl from
the internet, put it into temp, and then

74
00:09:16,724 --> 00:09:26,042
we use a simple chmod command to activate
kubectl. So now we can call kubectl

75
00:09:26,042 --> 00:09:39,680
commands from inside an image. It's a
death bells, more or less so. Exactly at

76
00:09:39,680 --> 00:09:46,400
the right place. We have a working exploit
now and warning, it might also already

77
00:09:46,400 --> 00:09:52,640
work in older versions of Kubernetes.
Because in newer versions we will need some

78
00:09:54,320 --> 00:10:01,600
pill of poison, additionally, and this is
exactly this cluster role binding to the

79
00:10:01,600 --> 00:10:06,880
cluster admin, which needs to be done,
that we have full access from the outside

80
00:10:06,880 --> 00:10:15,840
and if we do this, and expose our cluster
admin account to the same account, which

81
00:10:15,840 --> 00:10:21,760
is already exposed inside the container,
we can execute commands with this kubectl

82
00:10:21,760 --> 00:10:29,200
so we can create deployments by uploading
pictures. Which is exactly what you never

83
00:10:29,200 --> 00:10:34,800
want, but an attacker now has full access
to your cluster by simply uploading

84
00:10:35,920 --> 00:10:46,178
prepared malicious pictures. Can do this.
So this is an example here: just create

85
00:10:46,178 --> 00:10:50,732
and delete containers and deployments
this way; you can effectively do

86
00:10:50,732 --> 00:11:10,560
everything. And again, this is the problem
here from the application side. If you

87
00:11:10,560 --> 00:11:20,560
have a vulnerable version of ImageMagick,
you can include commands, and you can

88
00:11:20,560 --> 00:11:27,440
definitely install software on the
Kubernetes server side. There are several

89
00:11:27,440 --> 00:11:34,800
attempts to fix this. For example, you can use
better images like Red Hat does, so this

90
00:11:34,800 --> 00:11:39,760
is a Red Hat health index, which is quite
good, but effectively these images have

91
00:11:40,320 --> 00:11:48,480
only the advantage that you do not run
anything as root. But you run the same as

92
00:11:49,440 --> 00:11:54,960
another user ID, and if the same user
is allowed to write to the temp directory,

93
00:11:54,960 --> 00:12:04,080
effectively, yeah, you don't need root for
installing software. So, the container

94
00:12:04,080 --> 00:12:12,000
also was good practice, no root inside, it
has an immutable root file system, but the

95
00:12:12,000 --> 00:12:16,960
curl, which is completely unnecessary, was
also deployed, we had write access to

96
00:12:16,960 --> 00:12:23,200
temp. We had a chmod. And the first thing
to prevent all the stuff I'm doing

97
00:12:23,200 --> 00:12:28,800
here is, and if you don't
learn anything else from this talk, please go

98
00:12:30,320 --> 00:12:36,000
look into your service accounts and try to
disable the automountServiceAccountToken

99
00:12:36,720 --> 00:12:41,520
feature, so all of the service accounts
which are not running operators don't need

100
00:12:42,160 --> 00:12:49,200
this service account open. If you have an
operator, it might be broken now and it

101
00:12:49,200 --> 00:12:58,480
can be, um, overwritten by the Pod
definition, but effectively this entire

102
00:12:58,480 --> 00:13:04,560
example would not work without this
service account token. So, we have fixed

103
00:13:04,560 --> 00:13:10,240
that. We cannot fix the application
because this is something, uh, somebody

104
00:13:10,240 --> 00:13:14,960
else is creating for us, and we might even
have a flaw which is not yet detected, so

105
00:13:14,960 --> 00:13:19,840
there might be a zero-day. The next thing
we must prevent is the installation of

106
00:13:19,840 --> 00:13:29,760
software. Fix the images, so use really
immutable images. Temp only if you need

107
00:13:29,760 --> 00:13:39,760
it. PID is 1, anyway. Uh, OK, you might
have some variable data, but you should

108
00:13:39,760 --> 00:13:47,920
use containers from scratch, no curl, no
wget, and this also affects Red Hat UBIs.

109
00:13:47,920 --> 00:13:52,880
And most of the standard images have this
flaw, so you have a full operating system

110
00:13:52,880 --> 00:14:02,320
inside with all the tools you like. But
this is not your territory. It's just,

111
00:14:02,320 --> 00:14:08,240
yeah, it's a tool for the attacker. So
please run only trusted images, build your

112
00:14:08,240 --> 00:14:16,560
own images and build them from scratch.
This is my example I also have uploaded to

113
00:14:16,560 --> 00:14:22,800
GitHub, how to harden the container, which
is based on nginx alpine. nginx alpine

114
00:14:22,800 --> 00:14:27,520
normally is a very small container, but
you can do more. You can use the script,

115
00:14:28,400 --> 00:14:33,920
which is in this repository, just to get
only the tools you need. So this is not

116
00:14:33,920 --> 00:14:39,557
statically linked because the original
nginx is not statically linked. But it's

117
00:14:39,557 --> 00:14:55,905
very close. This means you only positively
install the software you need. This is

118
00:14:55,905 --> 00:15:02,048
dynamically linked, therefore the -d, so we
use ldd, extract all the dynamically linked

119
00:15:02,048 --> 00:15:09,735
libraries and then all the configuration
files which are necessary. It is the

120
00:15:09,765 --> 00:15:17,743
password registry group. OK. Some licenses
and share. Need some directories for

121
00:15:17,743 --> 00:15:23,154
logging and then you can install it from
scratch because this script installs it in

122
00:15:23,154 --> 00:15:32,397
a directory \temp\harden, and with
this multi-stage build you can install

123
00:15:32,397 --> 00:15:41,578
everything you need from \temp\harden. And
then the next container is based on

124
00:15:41,578 --> 00:15:48,908
scratch and you can use nginx the same way
you would use, more or less, an

125
00:15:48,908 --> 00:15:57,423
application which is statically linked. So
now we have created a hardened image

126
00:15:57,423 --> 00:16:04,328
without kubectl or curl inside. So, we are
much closer to a secure application. The

127
00:16:04,328 --> 00:16:10,699
next thing is, yeah, role binding to
cluster admin role. Don't do this. If

128
00:16:10,699 --> 00:16:17,954
something in your application goes wrong,
you have additional measures, which you

129
00:16:17,954 --> 00:16:24,675
can take just to prevent the application
from breaking out of the container. So, you

130
00:16:24,675 --> 00:16:30,432
can separate the internet exposure of
services or ingresses in Kubernetes from

131
00:16:30,432 --> 00:16:36,640
privileged operations. So you have node
settings. ElasticSearch is doing a lot of

132
00:16:36,640 --> 00:16:44,200
these things, so a lot is really not true
so, doing a sysctl. Some applications have

133
00:16:44,200 --> 00:16:51,851
hostPaths on or have connection to the
host inter-process communication, which is

134
00:16:51,851 --> 00:16:58,477
not necessary if you have exposed it and
then separate the applications which need

135
00:16:58,477 --> 00:17:04,135
this from the applications which don't
need it. So, cluster admin should be more

136
00:17:04,135 --> 00:17:09,463
or less restricted to very privileged
operators. And by the way, Argo is also a

137
00:17:09,463 --> 00:17:14,852
very privileged operator. Don't run an
Argo on a Kubernetes cluster in a security

138
00:17:14,852 --> 00:17:21,193
critical environment because I've seen
Argo also is binding to cluster admin. It

139
00:17:21,193 --> 00:17:26,798
doesn't mean that Argo by default is
unsafe, but it's a very complex

140
00:17:26,798 --> 00:17:33,800
application and I would definitely run it
in a separate cluster, not in the critical

141
00:17:33,800 --> 00:17:40,958
cluster. And what does an architecture fix
look like? Here you have the lifecycle of

142
00:17:40,958 --> 00:17:47,536
a Pod, so time is going from left
to right. Here you see if the container is

143
00:17:47,536 --> 00:17:55,931
ready, it can be accessed from the
internet. And if you do something from the

144
00:17:55,931 --> 00:18:04,064
init system, like a sysctl, please do it
inside a container which is not connected

145
00:18:04,064 --> 00:18:09,722
to the internet, just to use the pause
container, as a pause container to limit

146
00:18:09,722 --> 00:18:15,721
it and restrict it and that is not really
connected to the network. So, this is

147
00:18:15,721 --> 00:18:22,786
something which covers the architecture.
Additionally, I already mentioned here the

148
00:18:22,786 --> 00:18:28,357
network policy which will come later, so
this is our threat matrix. We have exposed

149
00:18:28,357 --> 00:18:32,908
and not exposed services. You have
unprivileged and privileged things. The

150
00:18:32,908 --> 00:18:39,385
dangerous ones are the privileged ones
which are exposed, but normally you only

151
00:18:39,385 --> 00:18:46,295
have an exposed privileged application if
you have an IDE running in Kubernetes,

152
00:18:46,295 --> 00:18:50,502
which is not what I would like to see in
critical infrastructure, something like

153
00:18:50,502 --> 00:18:57,680
rstudio or have a web ui to a gitops
framework. And normally you only have a

154
00:18:57,680 --> 00:19:03,682
web application. And what should not be
exposed under normal conditions is an

155
00:19:03,682 --> 00:19:11,657
operator, sysctl, build systems, host
operators and so on. If you do this, it's

156
00:19:11,657 --> 00:19:20,702
virtually not possible to own the cluster.
You should do all three, because if you

157
00:19:20,702 --> 00:19:26,480
have security in depth, you can make a
mistake on one of these levels and the

158
00:19:26,480 --> 00:19:32,149
other levels keep you from
being exploited. You can even do more

159
00:19:32,149 --> 00:19:38,640
isolation on the network side, you have
network policies for egress on the node

160
00:19:38,640 --> 00:19:44,482
side, you can activate seccomp, gVisor,
and the common frameworks, SELinux,

161
00:19:44,482 --> 00:19:49,646
AppArmor. You can use PodSecurity
policies, or in the future, the Open

162
00:19:49,646 --> 00:19:56,070
Policy Agent to prevent the node from
being hacked. For the identity and access

163
00:19:56,070 --> 00:20:02,801
management, you should use individual
service accounts for all your tasks. So

164
00:20:02,801 --> 00:20:08,557
you have a lot of roles. You
should use role-based access control to

165
00:20:08,557 --> 00:20:18,300
check this. OK, but I promise, yes, we can
go even deeper, and this needs a little

166
00:20:18,300 --> 00:20:27,559
help from your cloud administrator and
here, the example from Nico Meisenzahl,

167
00:20:27,559 --> 00:20:34,867
who does a very similar example on
hijacking Kubernetes, and he's doing it,

168
00:20:34,867 --> 00:20:42,982
obviously in one of the clouds. And what
he has found out is you can get access to

169
00:20:42,982 --> 00:20:49,258
the azure.json file, which has user
assigned identities. This is not the

170
00:20:49,258 --> 00:20:55,186
Kubernetes identities. This is the Azure
identity. You can get a token, you can get

171
00:20:55,186 --> 00:21:01,670
a subscription, you can get a resource
group and then you can use a curl command,

172
00:21:01,670 --> 00:21:06,928
with this token, to change things on the
API version of this resource group with

173
00:21:06,928 --> 00:21:11,820
this subscription. So, you might be able
to hack your node with the privileged

174
00:21:11,820 --> 00:21:17,403
container and then take over your cloud
account. And he told me that this is also

175
00:21:17,403 --> 00:21:25,415
true for the other clouds, so it might
even work something similar in AWS and

176
00:21:25,415 --> 00:21:33,045
GCP. So please, also protect your cloud
account. Understand your identity and

177
00:21:33,045 --> 00:21:38,200
access management in the cloud. So, at
least, someone in the team should

178
00:21:38,200 --> 00:21:43,837
understand it. And limit also the
underlying account to the bare minimum. It

179
00:21:43,837 --> 00:21:50,515
might even be a good idea to block access to
addresses like 169.254.something. And the

180
00:21:50,515 --> 00:21:57,200
other clouds, as I already mentioned, also
might be affected. And my call to the

181
00:21:57,200 --> 00:22:03,451
cloud providers, is don't deliver account
data in containers or nodes. This is not

182
00:22:03,451 --> 00:22:07,571
necessary. Yes, it's very
comfortable, as the service account

183
00:22:07,571 --> 00:22:13,499
token is very comfortable for running
operators, but it's a major security flaw

184
00:22:13,499 --> 00:22:22,640
and it might be that you lose all your
accounts and data. Conclusion: We have a

185
00:22:22,640 --> 00:22:29,310
full attack chain from the application to
the cloud account. And it's your task to

186
00:22:29,310 --> 00:22:36,005
prevent it and fix it. This is called
shared responsibility, so the cloud

187
00:22:36,005 --> 00:22:40,631
providers effectively only care for the
infrastructure, but not really for the

188
00:22:40,631 --> 00:22:46,037
security in your cloud. This is your
task. OK. Thank you for your attention, I

189
00:22:46,037 --> 00:22:52,109
hope it was interesting. Please ask your
questions. And now I'm open for the Q&A.

190
00:22:52,109 --> 00:22:58,369
*Applause*

191
00:22:58,369 --> 00:23:04,720
Herald: Thank you for the talk. This is
working? Yeah. So do we have any questions

192
00:23:04,720 --> 00:23:12,480
from the internet? I don't see any coming
in so far, but we are, I think, usually a bit

193
00:23:12,480 --> 00:23:17,200
ahead so I'll ask one:
Q: What do you think? So who's in the

194
00:23:17,200 --> 00:23:21,280
responsibility mainly to fix these
insecurities? Do you think this can be

195
00:23:21,280 --> 00:23:26,880
fixed by better defaults in these
infrastructures and configuration files?

196
00:23:26,880 --> 00:23:31,840
Is this to be fixed by better tutorials
and better education for the DevOps

197
00:23:31,840 --> 00:23:35,200
engineers? What is the main point of
responsibility?

198
00:23:35,200 --> 00:23:44,800
A: I definitely would prefer to have
secure default installations. But then you

199
00:23:44,800 --> 00:23:51,360
have this shared responsibility in the
contracts: From a certain point, you are

200
00:23:51,360 --> 00:23:57,440
responsible for the security of the
account, and we have seen this complexity

201
00:23:58,400 --> 00:24:08,732
because this might be 20 steps. Every step
is very simple and every step is looking

202
00:24:08,732 --> 00:24:15,999
very harmless, but all the steps together
might create a full exploit of a cloud. So

203
00:24:15,999 --> 00:24:23,979
this must be overseen, and it's very hard
for developers who are cloud native and

204
00:24:23,979 --> 00:24:30,417
are focusing on the application to have an
overview of the security. Developers now

205
00:24:30,417 --> 00:24:36,727
have 10 or 100 times more code on the hard
disk than ten years ago. And this means

206
00:24:36,727 --> 00:24:43,360
developers are not really able to have a
full judgment about what is going on in

207
00:24:43,360 --> 00:24:49,319
terms of security. When
developers talk about security, either

208
00:24:49,319 --> 00:24:55,694
they are specialized in it or they have
not seen things like this. What I normally

209
00:24:55,694 --> 00:24:59,960
notice: The developers are not aware of
these problems.

210
00:24:59,960 --> 00:25:06,685
Q: OK. And what do you think, what can we
do about the complexity? So do you think

211
00:25:06,685 --> 00:25:10,374
we need better education for people to
actually understand the systems? Or is

212
00:25:10,374 --> 00:25:14,274
there a way in cloud infrastructures to
reduce the complexity?

213
00:25:14,274 --> 00:25:24,160
A: Better education. And do all the simple
fixes. These are five steps, and the fixes

214
00:25:24,160 --> 00:25:29,840
are also very simple. And you have to
check them and then you need a tool

215
00:25:29,840 --> 00:25:35,520
because you might have 20 clusters like
this. Every cluster has 20 applications,

216
00:25:35,520 --> 00:25:40,800
so this might be quite complicated. So you
need tools for an overview and in the

217
00:25:40,800 --> 00:25:46,720
training material, you see examples of how
you can check your Kubernetes clusters for

218
00:25:46,720 --> 00:25:51,920
exploits like this.
Herald: OK, thank you very much. Thanks

219
00:25:51,920 --> 00:25:57,840
for being here. We will continue in about
half an hour with the next talk, then

220
00:25:57,840 --> 00:26:04,123
again in German. Thanks.
Thomas: Thank you very much. *Applause*

221
00:26:04,123 --> 00:26:12,552
Outro: Everything is licensed under CC BY
4.0. And it is all for the community, to

222
00:26:12,552 --> 00:26:13,791
the unknown and for everyone.

223
00:26:13,791 --> 00:26:15,000
Subtitles created by c3subtitles.de
in the year 2022. Join, and help us!