Jenkins CI/CD (8/11): Workspace Cleanup, Timeouts, and Retries
Summary: You harden the pipeline with a global timeout that kills hung builds, a cleanWs() step that wipes the workspace after every run, and targeted retry blocks that absorb transient Docker failures. These three additions turn a pipeline that works today into one that stays reliable over weeks and months of continuous use.
Example Values Used in This Tutorial
| Key | Value |
|---|---|
| Timeout | 15 minutes |
| Workspace cleanup | cleanWs() |
| Retry count | 2 (integration stage) |
| Pipeline option | timeout(time: 15, unit: 'MINUTES') |
| Previous parts | Parts 1-7 completed |
0. Prerequisites
- A working Jenkins controller at http://localhost:8080 (Part 1).
- The helloci Python package with unit tests, lint, integration tests, and artifact archiving in the pipeline (Parts 2-7).
- The Workspace Cleanup plugin (ws-cleanup) installed in Jenkins. Go to Manage Jenkins > Plugins > Available plugins and search for “Workspace Cleanup” if it is not already installed.
- Familiarity with the current Jenkinsfile structure, including the post { always { } } blocks from Parts 4 and 7.
Note: The cleanWs() step requires the Workspace Cleanup plugin. Jenkins does not include it by default. If the step fails with “No such DSL method”, install the plugin and restart Jenkins.
1. Why Pipelines Rot
A pipeline that works on day one can break on day thirty without a single code change. This happens because CI agents are shared, mutable environments. Every build leaves traces behind, and those traces accumulate.
Here are the three most common ways a pipeline degrades over time:
- State leaks between builds. A previous build leaves behind a .venv directory, a stale docker-compose project, or leftover test result files. The next build picks up that stale state and behaves differently than it would on a clean machine.
- Hung builds waste resources. A Docker pull hangs on a slow registry. A pg_isready loop never exits. A test deadlocks. The build sits there consuming an executor slot until someone notices and kills it manually — sometimes hours later.
- Transient failures cause false alarms. A Docker image pull times out. A container takes a few extra seconds to start. The build fails, the team investigates, and the answer is “just run it again.” After enough false alarms, the team starts ignoring real failures.
Each of these problems has a straightforward fix. You add a timeout to kill hung builds, workspace cleanup to prevent state leaks, and retry logic to absorb transient failures. Together, these three changes are the difference between a pipeline that needs babysitting and one that runs unattended.
2. Add a Global Timeout
A global timeout sets a hard limit on how long any single build can run. If the build exceeds the limit, Jenkins aborts it automatically.
Add the options block immediately after agent any in your Jenkinsfile:
pipeline {
agent any
options {
timeout(time: 15, unit: 'MINUTES')
}
stages {
...
}
}
The options block applies to the entire pipeline. If the total build time exceeds 15 minutes — across all stages combined — Jenkins kills the build and marks it as ABORTED.
Why 15 minutes? The helloci pipeline creates a venv, installs dependencies, runs lint, runs unit tests, starts Docker Compose, and runs integration tests. On a reasonably fast machine, all of this finishes in under 5 minutes. Fifteen minutes gives plenty of headroom for slow networks or cold Docker caches, while still catching genuine hangs.
Tip: Set the timeout to roughly three times your typical build duration. Too tight and you get false aborts on slow days. Too loose and hung builds waste executor time.
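If your builds legitimately vary a lot in total duration, the timeout step also supports an activity-based mode that aborts only after a period with no new log output. A sketch of this optional variant (not used in the rest of this tutorial):

```groovy
// Variant of the global timeout: abort only after 15 minutes with NO new
// console output, rather than 15 minutes of total runtime. Useful when
// total duration varies but a silent build always means a hang.
options {
    timeout(time: 15, unit: 'MINUTES', activity: true)
}
```

The tradeoff: a build that keeps printing output slowly can run far longer than 15 minutes under this mode, so the fixed total-time limit remains the simpler safety net.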
3. Add Workspace Cleanup
The cleanWs() step deletes the entire workspace directory after the build finishes. Add it to the post { always { } } block:
post {
always {
sh 'docker compose logs > results/docker-logs.txt 2>&1 || true'
sh 'docker compose down -v || true'
junit 'results/*.xml'
archiveArtifacts artifacts: 'results/**', allowEmptyArchive: true
cleanWs()
}
}
The order matters. cleanWs() must be the last step in the post block. If you put it before junit or archiveArtifacts, those steps would find an empty workspace and fail.
After cleanWs() runs, the workspace directory is empty. The next build starts from a completely clean slate — no leftover .venv, no stale results/ directory, no orphaned Docker volumes.
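For finer control, cleanWs() accepts parameters from the Workspace Cleanup plugin — for example, deleting directories and excluding a path you want to keep across builds. A sketch; the .pip-cache path is a hypothetical example, not part of the helloci pipeline:

```groovy
// cleanWs() with explicit parameters: deleteDirs removes directories as
// well as files, and an EXCLUDE pattern preserves a matching path.
// '.pip-cache' is an illustrative name, not used elsewhere in this series.
cleanWs(
    deleteDirs: true,
    patterns: [[pattern: '.pip-cache/**', type: 'EXCLUDE']]
)
```

For this tutorial, the parameterless cleanWs() is all you need.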
4. The Cleanup Tradeoff
Cleaning the workspace after every build means the next build has to recreate everything from scratch. For this pipeline, that means:
- python3 -m venv .venv creates a fresh virtual environment.
- pip install -e ".[test]" downloads and installs all dependencies again.
- Docker images that were previously cached may need to be pulled again.
On a fast network, this adds 30-60 seconds to the build. On a slow network, it can add several minutes.
Is the tradeoff worth it? For most teams, yes. A clean workspace guarantees that your build result reflects the current state of the code, not some combination of current code and leftover artifacts from a previous build. The certainty is worth the extra time.
Note: If build speed becomes a real problem, you can explore caching strategies — pip cache directories, Docker layer caching, or persistent venv directories outside the workspace. Those optimizations are valid but add complexity. Start with cleanWs() and optimize later only if the clean build time becomes a bottleneck.
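One low-effort middle ground is to point pip's cache at a directory outside the workspace, so downloaded wheels survive cleanWs() even though the venv itself is rebuilt. A sketch — the cache path is an assumption; any persistent, agent-writable directory works:

```groovy
// PIP_CACHE_DIR is honored by pip itself. Placing it outside the
// workspace means cleanWs() never touches it; only the venv is
// recreated on each build, while wheels are reused from the cache.
pipeline {
    agent any
    environment {
        PIP_CACHE_DIR = '/var/lib/jenkins/.cache/pip'  // hypothetical persistent path
    }
    // ... stages as before ...
}
```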
5. Add Retry Logic for Flaky Steps
Some steps fail for reasons that have nothing to do with your code. Docker image pulls time out. Container startup races with readiness checks. Network blips cause transient errors. Retrying these steps is a legitimate hardening strategy.
Wrap the docker compose up command in a retry block inside the Integration Tests stage:
stage('Integration Tests') {
steps {
retry(2) {
sh 'docker compose up -d'
}
sh 'docker compose exec -T postgres pg_isready -U testuser -d testdb --timeout=30'
sh '.venv/bin/pytest tests/test_integration.py --junitxml=results/junit-integration.xml -v'
}
}
The retry(2) block runs the enclosed steps up to two times. If docker compose up -d fails on the first attempt, Jenkins retries it once. If the second attempt also fails, the stage fails normally.
Notice that the retry wraps only the docker compose up command — not the entire stage. This is intentional. If the pytest command fails, that is a real test failure, not a transient infrastructure problem. Retrying a genuine test failure would mask bugs.
Warning: retry is not a substitute for fixing flaky infrastructure. If Docker Compose fails consistently, the fix is to debug the compose file or the Docker daemon — not to add more retries. Use retry to absorb rare, transient failures that you cannot prevent.
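To make the semantics concrete, here is a plain-shell sketch of what retry(2) does. The try() function is a hypothetical stand-in for docker compose up -d that fails on its first call and succeeds on the second:

```shell
#!/bin/sh
# Shell sketch of Jenkins' retry(2): attempt the step, and on failure
# run it one more time before giving up.
flag=".first_try_done"
rm -f "$flag"

try() {
  # Stand-in for 'docker compose up -d': fails once, then succeeds.
  if [ ! -f "$flag" ]; then
    touch "$flag"   # remember that the first attempt happened
    return 1        # simulate a transient failure
  fi
  return 0
}

attempt=0
until try; do
  attempt=$((attempt + 1))
  if [ "$attempt" -ge 2 ]; then
    echo "giving up after attempt $attempt"
    exit 1
  fi
  echo "attempt $attempt failed, retrying"
done
echo "succeeded on attempt $((attempt + 1))"
```

Running this prints one "failed, retrying" line followed by the success message — exactly the behavior you want from the Docker startup step: one automatic second chance, then a normal failure.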
6. The Full Hardened Jenkinsfile
Here is the complete Jenkinsfile with all three hardening additions — timeout, retry, and workspace cleanup. Replace your existing Jenkinsfile with this version:
pipeline {
agent any
options {
timeout(time: 15, unit: 'MINUTES')
}
stages {
stage('Setup Python') {
steps {
sh 'python3 -m venv .venv'
sh '.venv/bin/pip install --upgrade pip'
}
}
stage('Install Dependencies') {
steps {
sh '.venv/bin/pip install -e ".[test]"'
}
}
stage('Lint') {
steps {
sh '.venv/bin/ruff check src/ tests/'
}
}
stage('Unit Tests') {
steps {
sh 'mkdir -p results'
sh '.venv/bin/pytest tests/test_greet.py --junitxml=results/junit.xml'
}
}
stage('Integration Tests') {
steps {
retry(2) {
sh 'docker compose up -d'
}
sh 'docker compose exec -T postgres pg_isready -U testuser -d testdb --timeout=30'
sh '.venv/bin/pytest tests/test_integration.py --junitxml=results/junit-integration.xml -v'
}
}
}
post {
always {
sh 'docker compose logs > results/docker-logs.txt 2>&1 || true'
sh 'docker compose down -v || true'
junit 'results/*.xml'
archiveArtifacts artifacts: 'results/**', allowEmptyArchive: true
cleanWs()
}
}
}
Compare this to the Jenkinsfile from Part 7. Three things changed:
- The options block adds a 15-minute global timeout.
- The retry(2) block wraps docker compose up -d in the Integration Tests stage.
- The cleanWs() step at the end of the post block wipes the workspace after every build.
Commit and push the updated file:
cd ~/projects/helloci
git add Jenkinsfile
git commit -m "Add timeout, retry, and workspace cleanup to pipeline"
git push origin main
Trigger a build in Jenkins and confirm it completes successfully.
7. Test the Timeout
You do not need to wait 15 minutes to understand how the timeout works. Here is what happens when a build exceeds the limit:
- Jenkins monitors the total elapsed time from the moment the build starts.
- When the clock hits 15 minutes, Jenkins sends an interrupt to the running step.
- The build is marked as ABORTED (not FAILURE).
- The post { always { } } block still runs, so cleanWs() and artifact archiving still happen.
The console output for a timed-out build looks like this:
Cancelling nested steps due to timeout
...
Finished: ABORTED
You can test this behavior by temporarily lowering the timeout to a value you know the build will exceed:
options {
timeout(time: 1, unit: 'SECONDS')
}
Run the build, confirm it aborts, then change the timeout back to 15 minutes. This is a safe experiment — the post block still runs and cleans up.
Tip: You can also set per-step timeouts by wrapping individual commands in a timeout block inside steps. The global pipeline timeout is a safety net; per-step timeouts give finer-grained control.
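A sketch of a per-step timeout nested inside the global one (the 5-minute value is illustrative):

```groovy
// A tighter timeout around just the Docker startup step. If this step
// alone exceeds 5 minutes, it is aborted even though the 15-minute
// global budget has not been exhausted.
stage('Integration Tests') {
    steps {
        timeout(time: 5, unit: 'MINUTES') {
            sh 'docker compose up -d'
        }
    }
}
```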
8. Test Workspace Cleanup
After a build completes with cleanWs() in the post block, verify that the workspace is actually empty.
Find the workspace directory on the Jenkins agent. For a job named helloci, the default path is:
ls -la /var/lib/jenkins/workspace/helloci/
total 0
The directory exists but is empty. No .venv, no results/, no Jenkinsfile — everything was wiped by cleanWs().
Now trigger another build and watch the console output. You will see Jenkins clone the repository fresh, create a new .venv, and install all dependencies from scratch. This confirms that every build starts from a clean state.
If you do not see an empty workspace after the build, check two things:
- Verify cleanWs() appears in the console output. Search for “Deleting workspace” in the build log.
- Verify the Workspace Cleanup plugin is installed. Go to Manage Jenkins > Plugins > Installed plugins and search for ws-cleanup.
9. When NOT to Use Retry
The retry directive is powerful, but it is also dangerous if misused. Here are the rules:
Use retry for infrastructure flakiness:
- Docker image pulls that occasionally time out.
- Container startup that sometimes loses a race condition.
- Network calls to external services that have transient failures.
Do NOT use retry for test failures:
- If pytest fails, the test found a bug. Retrying the test hides the bug.
- If a linter reports violations, retrying will not fix the code.
- If pip install fails because of a missing dependency, retrying will not make the dependency appear.
The distinction is simple: retry when the same command would succeed if you ran it again immediately. Do not retry when the failure is deterministic — those need a code fix, not a re-run.
Warning: A pipeline that retries too aggressively trains the team to distrust failures. “Oh, it probably just needs another run” becomes the default response, and real bugs slip through. Keep retries targeted and minimal.
10. Hardening Reference
Here is a summary of all the hardening options covered in this tutorial, plus a few additional options you may want to explore later.
| Option | Scope | Syntax | Purpose |
|---|---|---|---|
| timeout | Pipeline | options { timeout(time: 15, unit: 'MINUTES') } | Kill builds that exceed the time limit |
| timeout | Step | timeout(time: 5, unit: 'MINUTES') { sh '...' } | Kill a single step that takes too long |
| retry | Step | retry(2) { sh '...' } | Re-run a step on transient failure |
| cleanWs | Post | post { always { cleanWs() } } | Delete workspace after every build |
| skipDefaultCheckout | Pipeline | options { skipDefaultCheckout() } | Disable automatic Git checkout (use checkout scm manually) |
| disableConcurrentBuilds | Pipeline | options { disableConcurrentBuilds() } | Prevent parallel builds of the same job |
The first four are covered in this tutorial. The last two are covered in Part 9 (concurrency and port conflicts).
Summary
You added three hardening measures to the helloci pipeline, turning it from a pipeline that works into one that stays working.
- A global timeout(time: 15, unit: 'MINUTES') in the options block kills builds that hang, freeing up executor slots automatically.
- A cleanWs() step at the end of the post { always { } } block wipes the workspace after every build, preventing stale state from leaking between runs.
- A retry(2) block around docker compose up -d absorbs transient Docker failures without masking real bugs.
- The cleanup tradeoff (re-creating the venv every build) is worth the certainty of a clean slate. Optimize with caching later if needed.
- Retry should only wrap infrastructure steps that can fail transiently — never wrap test execution or linting.
These changes require no new tools, no external services, and no changes to the application code. They are pure pipeline hygiene — small additions that prevent the slow decay of a CI system over time.
Next up in Part 9: you tackle concurrency and port conflicts with disableConcurrentBuilds and unique Docker Compose project names, so two builds of the same job never collide.