๐ŸชตCatching Python subprocess logs

Rafael
Rafael
๐ŸชตCatching Python subprocess logs
Table of Contents
Table of Contents

๐Ÿ“– Setting up subprocess.Popen for streaming child process messages to stdout ๐Ÿง‘โ€๐Ÿ’ป and a to a log file ๐Ÿ“„

One day we needed to run a Python CLI application that loaded test data to a database. We wanted to run this script as part of the unittests set up that we were building for automating a section of our application end to end tests. This CLI program had the following characteristics:

  1. Printed logs to stdout and stderr.
  2. Lots of logs were printed.
  3. It took about 5 minutes to complete on a standard developer laptop (2022).

Here we have a bare-bones version of the CLI application (child_process.py):

# child_process.py

import contextlib
import logging
import sys
from time import sleep
from typing import Generator, Optional, TextIO


logger = logging.getLogger(__name__)


LOG_FORMAT = "%(asctime)s %(levelname)s:%(name)s:%(lineno)d:%(message)s"


@contextlib.contextmanager
def set_logging(
    console_stream: Optional[TextIO] = None,
) -> Generator[None, None, None]:
    if console_stream is None:
        console_stream = sys.stderr

    console_handler = logging.StreamHandler(console_stream)
    console_handler.setLevel(logging.DEBUG)
    console_handler.setFormatter(logging.Formatter(LOG_FORMAT))

    root = logging.getLogger()
    root.addHandler(console_handler)
    old_level = root.level
    root.setLevel(logging.DEBUG)

    try:
        yield
    finally:
        root.setLevel(old_level)
        root.removeHandler(console_handler)
        console_handler.flush()
        console_handler.close()

if __name__ == "__main__":
    with set_logging():
        for i in range(5):
            logger.debug(f"Stderr: Spam! {i}")
            # Sleep to simulate some other work
            sleep(1)
            if i % 2:
                print(f"Stdout: Ham! {i}")

Note that the log set up sends the logs to stderr and that there are some messages going to stdout via the print function.

Since it took a while for the process to complete, we wanted to see the application logs while the process was executing in case runtime problems occurred at this set up step, for example, misuse of the CLI application parameters, or missing environment variables.

Next, we have a version of test code that we were building to automate parts of our end to end tests (tests/test_child_process.py):

# tests/test_child_process.py

import subprocess
import sys
from typing import List
import unittest


def run_process(args: List[str]) -> subprocess.CompletedProcess:
    return subprocess.run(
        args,
        stdout=sys.stdout,
        stderr=sys.stdout,
        encoding="utf-8",
        check=True,
    )


class TestLogCapture(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        cmd = ["python", "child_process.py"]
        completed_proc = run_process(cmd)

        print("Stream done")
        print(completed_proc)

    def test_smoke(self):
        print("Test 1")

Note that we have asked our helper run_process function to route the child_process.py stdout and stderr to the system's stdout. In this way, we will be able to see the logs as they occur, without having to wait for the program to complete (or fail). Running this test we get:

$ python -m unittest -v tests/test_child_process.py 
2024-04-29 18:24:45,710 DEBUG:__main__:44:Stderr: Spam! 0
2024-04-29 18:24:46,711 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-04-29 18:24:47,713 DEBUG:__main__:44:Stderr: Spam! 2
2024-04-29 18:24:48,714 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-04-29 18:24:49,715 DEBUG:__main__:44:Stderr: Spam! 4
Stream done
CompletedProcess(args=['python', 'child_process.py'], returncode=0)
test_smoke (tests.test_child_process.TestLogCapture) ... Test 1
ok

----------------------------------------------------------------------
Ran 1 test in 5.044s

OK

Great! With this subprocess.run configuration we can see the logs as they occur. Note that the lines with "Stderr: Spam! 1" and "Stdout: Ham! 1" print one after the other, meaning that stdout and stderr messages from the child process are sent to the main process stdout as soon as they are received (line buffered).

๐Ÿ’พ Persisting the logs

Monitoring the process logs as they occur is nice, but what if we want to route those same logs to a file and to stdout at the same time?

This turned out to be a bit more tricky than I anticipated, and required us to reach for subprocess.Popen ๐Ÿซฃ

However, I'll also list here the alternative ideas I had which didn't work, in case I run into a similar problem in the future hoping to discard them more quickly or improve them if I've had acquired the missing knowledge:

Passing a file handle created with open

๐Ÿ’กIdea: subprocess.run stdout argument accepts a file handle. When the text mode parameters are passed in, the open function returns a TextIO file handle. Because we also want to send the message to stdout we could monkey patch TextIO.write method to write a message to a file and to stdout.

๐Ÿ“Sticking point: Looking at the underlying write method code, it seems to be written in C ๐Ÿ’ชwith a lot of low level dependencies relying on it. Where do I even begin? Where is the right place to do this monkey patching? I can't patch a monkey if I don't even know where it is ๐Ÿ™ˆ.

Passing the sys.stdout stream

๐Ÿ’กIdea: subprocess.run stdout argument accepts a stream. This is what we are using in the bare-bones test module. I learned from the wikiversity that sys.stdout is backed by an actual file (os.ttyname(0)). Could we copy the content of this file into our target test file after the process runs?

๐Ÿ“Sticking point: I tried copying the file after the child process executed in the set up method, but the copying step hangs ๐Ÿ˜‘ It might have to do with Python's buffering, or that the source file is locked, I honestly don't know, so we will have to try something else.

Using subprocess.Popen ๐Ÿซฃ

Well, with great power comes great responsibility ๐Ÿ•ธ๏ธ Initially, I mostly tip toed my way around this option owing to my previous experience with it mainly hanging while waiting for Popen.communicate to do its work, which is not well suited for our purpose here because we want our solution to work with a lot of log messages. However, since that past experience I came across Luca's blog post that gave me a clear sight to a potential solution.

๐ŸŽฏ
Luca's blog post goes deep into more fancy popen stuff, I suggest you read it too. If you are facing these kinds of problems, it is sure to inspire you and perhaps lead you to a solution.

Where were we?... Oh yes, the code. Back to our testing module, I made the following changes

# tests/test_child_process.py

import os
from pathlib import Path
import subprocess
import sys
import tempfile
from typing import List
import unittest


def run_process(args: List[str], log_file_path) -> subprocess.CompletedProcess:
    env = os.environ
    env["PYTHONUNBUFFERED"] = "True"
    with open(log_file_path, "at", encoding="utf-8") as fh:
        with subprocess.Popen(
            args,
            bufsize=1,
            text=True,
            encoding="utf-8",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            env=env,
        ) as proc:
            while (ret_code:=proc.poll()) is None:
                line = proc.stdout.readline()
                sys.stdout.write(line)
                fh.write(line)

    return ret_code


class TestLogCapture(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        cls.tmpdir = tempfile.TemporaryDirectory()
        cls.addClassCleanup(cls.tmpdir.cleanup)
        cls.test_child_log_path = Path(cls.tmpdir.name) / "test_child-process.log"

        cmd = ["python", "child_process.py"]

        completed_proc = run_process(cmd, cls.test_child_log_path)

        print("Stream done")
        print(f"Process return code: {completed_proc}")

    def test_smoke(self):
        print("Test 1")

    def test_child_process_written_to_file(self):
        print("\n")
        print("Printing from test_child_process_written_to_file")
        with open(self.test_child_log_path, "r", encoding="utf-8") as fh:
            for line in fh:
                print(line.rstrip())

Note that our helper run_process function now accepts a file handle and uses subproces.Popen ๐Ÿซฃ, as promised. The keys to get the desired behavior are:

  1. Setting bufsize to 1, so the stdout and stderr pipe file objects are opened in line buffered mode (passed behind the scenes to our friend the open function).
  2. Setting text to True, so the stdout and stderr pipe file objects are text streams and not byte streams.
  3. Setting stderr to subprocess.STDOUT, so the child process stderr is captured on the same file as its stdout.
  4. Setting the PYTHONUNBUFFERED environment variable so the print statements of the child process are written (flushed) as they occur in the code. This Real Python blog post was the source of inspiration here.

In addition to these changes, I also altered the test class to include a temporary directory and log file in the setUpClass, and added a second "test" that prints the content of the log file.

Running this test module we get:

$ python -m unittest tests.test_child_process -v
2024-05-10 17:00:14,293 DEBUG:__main__:44:Stderr: Spam! 0
2024-05-10 17:00:15,294 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-05-10 17:00:16,295 DEBUG:__main__:44:Stderr: Spam! 2
2024-05-10 17:00:17,296 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-05-10 17:00:18,298 DEBUG:__main__:44:Stderr: Spam! 4
Stream done
Process return code: 0
test_child_process_written_to_file (tests.test_child_process.TestLogCapture) ... 

Printing from test_child_process_written_to_file
2024-05-10 17:00:14,293 DEBUG:__main__:44:Stderr: Spam! 0
2024-05-10 17:00:15,294 DEBUG:__main__:44:Stderr: Spam! 1
Stdout: Ham! 1
2024-05-10 17:00:16,295 DEBUG:__main__:44:Stderr: Spam! 2
2024-05-10 17:00:17,296 DEBUG:__main__:44:Stderr: Spam! 3
Stdout: Ham! 3
2024-05-10 17:00:18,298 DEBUG:__main__:44:Stderr: Spam! 4
ok
test_smoke (tests.test_child_process.TestLogCapture) ... Test 1
ok

----------------------------------------------------------------------
Ran 2 tests in 5.048s

OK

Great success! Note that the stdout and stderr are written in the correct order (intertwined) while the child process runs, and that the log file captured the messages in the same manner ๐ŸŽ‰

๐Ÿ“ฃ
Try the code on your own. Set the range on the child process to a large number. It should still work.
GitHub - RafaelPinto/subprocess-stream: Setting up subprocess.Popen for streaming child process messages to stdout and a log file.
Setting up subprocess.Popen for streaming child process messages to stdout and a log file. - RafaelPinto/subprocess-stream

The GitHub repository with the sample code.


๐Ÿ“š Resources


๐Ÿ“ Summary

In this post we learned:

  1. How to configure Python's subprocess.run to stream the child process stdout and stderr to the main process stdout as they occur.
  2. Configuring subprocess.run to stream the child messages to stdout and to a file can be deceptively challenging.
  3. Python's subprocess.Popen class gives great flexibility for accomplishing the previous task, but we must be aware of the limitation of its methods (e.g. popen.communicate) that might be blocked waiting for the stream of messages to flush.
  4. We can use the PYTHONUNBUFFERED environment variable to make the child process messages (logs and prints) flush every time a new line character is encountered.
Catching subprocess logs visual summary.


Great! Next, complete checkout for full access to pintonista
Welcome back! You've successfully signed in
You've successfully subscribed to pintonista
Success! Your account is fully activated, you now have access to all content
Success! Your billing info has been updated
Your billing was not updated