Discussion: kubernetes-plugin could use Jobs instead of bare Pods
d***@gebit.de
2018-11-22 15:03:22 UTC
Hi all,

I am working on a Kubernetes cluster for Jenkins. I quickly found Carlos
Sanchez' kubernetes-plugin and started using and adapting it.
The plugin creates "bare" Pods directly in Kubernetes, watches them and in
the end deletes them after the agent process inside the Pod has terminated
(a rough sketch of that lifecycle is below). I think there are several
issues with this:
a) bare Pods do not get rescheduled in failover cases, for example after a
node crash
b) the kubernetes-plugin is responsible for deleting such a bare Pod
(because nothing in Kubernetes will ever clean up a bare Pod on its own).
If Jenkins (or the node it's running on) crashes while terminating the
agent, it might never actually delete the Pod.
c) the kubernetes-plugin deletes the Pod immediately after the build inside
the container is finished. If something else is still happening inside the
Pod after the build (e.g. cleanup or syncing), it has no chance of
completing.
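
For illustration only, here is roughly what that create/watch/delete
lifecycle looks like when sketched with the official Python kubernetes
client (the plugin itself is written in Java; the Pod name, namespace and
image below are placeholders of mine, not what the plugin actually uses):

    from kubernetes import client, config, watch

    config.load_kube_config()
    core = client.CoreV1Api()

    # 1. create a bare Pod, i.e. one that no controller owns
    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="jenkins-agent-123"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[client.V1Container(name="jnlp",
                                           image="jenkins/jnlp-slave")],
        ),
    )
    core.create_namespaced_pod(namespace="jenkins", body=pod)

    # 2. watch the Pod until the agent process inside it has terminated
    w = watch.Watch()
    for event in w.stream(core.list_namespaced_pod, namespace="jenkins",
                          field_selector="metadata.name=jenkins-agent-123"):
        if event["object"].status.phase in ("Succeeded", "Failed"):
            w.stop()

    # 3. the watcher itself has to delete the Pod; if Jenkins dies before
    #    this line runs, nothing else will ever remove the Pod
    core.delete_namespaced_pod(name="jenkins-agent-123", namespace="jenkins",
                               body=client.V1DeleteOptions())

The point is step 3: the Pod only goes away if the process that created it
survives long enough to delete it.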

While researching, I found out that Kubernetes provides a "Job" object that
runs Pods until a specified number of them have completed successfully.
Using a Kubernetes Job to schedule builds onto the cluster seems to solve
all of my aforementioned issues (see the sketch after this list):
a) a Job ensures that the specified number of completed runs is achieved,
even in failover cases
b) and c) a Job automatically deletes all associated Pods when it gets
deleted (and finished Jobs themselves can be auto-deleted with the new
alpha feature TTLAfterFinished in k8s v1.12). This frees the
kubernetes-plugin from having to care about the Pods after build completion.
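
To make b) and c) concrete, a minimal Job spec could look roughly like
this, again sketched with the Python client rather than the plugin's Java
code (the name, namespace, image, TTL and retry values are assumptions of
mine, and ttl_seconds_after_finished only takes effect on clusters with the
alpha TTLAfterFinished feature gate enabled):

    from kubernetes import client, config

    config.load_kube_config()
    batch = client.BatchV1Api()

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="jenkins-agent-job-123"),
        spec=client.V1JobSpec(
            completions=1,                    # one successful agent run
            backoff_limit=2,                  # retry a crashed Pod a few times
            ttl_seconds_after_finished=300,   # auto-delete the finished Job
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(name="jnlp",
                                                   image="jenkins/jnlp-slave")],
                ),
            ),
        ),
    )
    batch.create_namespaced_job(namespace="jenkins", body=job)

Deleting the Job (manually or via the TTL) cascades to its Pods, which is
what would relieve the plugin of the cleanup duty described in b) and c).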

Does anyone have experience with Kubernetes Jobs? Am I missing something obvious?

@Carlos Sanchez: If not, I would try to implement this in your
kubernetes-plugin. Would you be interested in getting this back into your
mainline? Then I would fork the GitHub repo and start directly off master.

Regards,
Daniel
Carlos Sanchez
2018-11-22 15:41:45 UTC
It's not using Jobs because they did not exist when the plugin was created,
and yes, it would be a good addition to the plugin.

Note that I don't think it would fix a), as Jenkins expects the agent to
keep state. I.e. if the agent clones the git repo and then crashes later on,
Jenkins expects the agent to already have the git repo cloned after the
agent restarts.
d***@gebit.de
2018-11-22 16:49:53 UTC
Nice, I will start off your master then.

And yes, a) might actually not be fixed by using Jobs. I will continue
researching and report back.

Regards,
Daniel
Olblak
2018-11-26 08:58:05 UTC
Post by Carlos Sanchez
Note that I don't think it would fix a), as Jenkins expects the agent
to keep state. I.e. if the agent clones the git repo and then crashes later
on, Jenkins expects the agent to already have the git repo cloned after
the agent restarts.
Why not use a persistent volume to keep the state? Like an emptyDir?
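
For reference, these are the two volume types under discussion, sketched
with the Python kubernetes client (the volume name, claim name and mount
path are just examples I picked):

    from kubernetes import client

    # emptyDir: node-local scratch space, gone as soon as the Pod is deleted
    scratch = client.V1Volume(name="workspace",
                              empty_dir=client.V1EmptyDirVolumeSource())

    # PVC-backed volume: survives Pod deletion for as long as the claim exists
    durable = client.V1Volume(
        name="workspace",
        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
            claim_name="jenkins-agent-workspace"),
    )

    # either one would be mounted into the agent container like this
    mount = client.V1VolumeMount(name="workspace",
                                 mount_path="/home/jenkins")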

Carlos Sanchez
2018-11-26 09:14:46 UTC
Post by Olblak
Why not use a persistent volume to keep the state? Like an emptyDir?
If the Pod terminates, the emptyDir is no longer available. And using a
persistent volume backed by block storage brings other problems, like
attachment and rescheduling across different AZs, ...
d***@gebit.de
2018-11-27 08:48:45 UTC
Actually, I am using a persistent volume to keep the state of the working
directory of each agent (and rsync it before and after each build). But if
Kubernetes automatically rescheduled an agent Pod (after a crashed node,
for example), runtime state such as tokens and identity would be missing,
and the Jenkins master wouldn't recognize the agent trying to connect. So
using a Job would actually complicate things in this case.
I also found an easier way to achieve my points b) and c) from the original
post, without rewriting half the plugin for Jobs. If I configure the
kubernetes-plugin's PodRetention to "Always", the plugin will no longer
bother with the Pods after starting them. Then I only need an automatic way
for Kubernetes to clean up completed Pods (not Jobs, as suggested before).
There is a kube-controller-manager flag called
"terminated-pod-gc-threshold" (and it's not alpha, unlike the feature
mentioned before for Jobs). Setting this flag to a reasonably small number
gives me exactly the behavior I want (a rough sketch of what this garbage
collection does is below).
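
For anyone curious, this is roughly what that garbage collection amounts
to; the real work is done inside the kube-controller-manager once
--terminated-pod-gc-threshold is set, so the following Python-client sketch
is only an illustration (the threshold value and the "jenkins" namespace
are made up, and the actual controller works cluster-wide rather than per
namespace):

    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    THRESHOLD = 50   # stands in for --terminated-pod-gc-threshold

    # collect Pods that have already finished (build done, agent exited)
    finished = [p for p in core.list_namespaced_pod(namespace="jenkins").items
                if p.status.phase in ("Succeeded", "Failed")]

    # once more than THRESHOLD terminated Pods exist, delete the oldest ones
    finished.sort(key=lambda p: p.metadata.creation_timestamp)
    for pod in finished[:max(0, len(finished) - THRESHOLD)]:
        core.delete_namespaced_pod(name=pod.metadata.name,
                                   namespace="jenkins",
                                   body=client.V1DeleteOptions())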