Understanding O_PATH File Descriptors in Linux
🔍 Introduction
Imagine you’re building a secure file upload and validation system for a continuous integration (CI) platform. After a user pushes a repository or artifact archive, your backend unpacks it into a temporary directory. Before any build or test can begin, you need to verify the contents to prevent abuse:
- Ensure all files are confined to the extracted directory (no path traversal tricks like ../../etc/passwd).
- Check metadata for auditing (e.g., file ownership, timestamps).
- Reject symlinks or hardlinks that point outside the extraction root (e.g., to /dev/null, or /home).
- Prevent tampering between inspection and usage (i.e., close TOCTOU holes).
- Detect and block references to disallowed paths like /proc, /sys, or /dev.
At first glance, this might seem doable with familiar system calls like stat(), lstat(), or realpath(). But these functions operate on entire path strings, and once resolution is complete, it’s too late to detect whether any part of the path traversed symlinks or escaped a restricted root. Worse, they’re prone to time-of-check vs. time-of-use (TOCTOU) attacks, where a malicious user can replace or modify a symlink between inspection and use.
This is where O_PATH, combined with openat2() or the older *at() family, shines.
With O_PATH, you can open a file just to refer to its path, without gaining permission to read or write it. Combined with openat2()’s strict resolution flags like RESOLVE_NO_SYMLINKS and RESOLVE_BENEATH, you can securely walk the directory tree one component at a time: validating structure, checking metadata, and enforcing sandbox boundaries, all without opening file contents.
This approach is used in production by container runtimes such as runc and by service managers such as systemd to safely traverse filesystems in privilege-separated environments.
⚠️ Naive Approach: Vulnerable File Walker in C
Let’s say we want to recursively scan a user-uploaded directory, checking each file’s stat() info and rejecting any symlinks.
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <dirent.h>
#include <string.h>
#include <unistd.h>

void scan_dir(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return;
    }

    struct dirent *entry;
    char fullpath[4096];

    while ((entry = readdir(dir)) != NULL) {
        // Skip "." and ".."
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        snprintf(fullpath, sizeof(fullpath), "%s/%s", path, entry->d_name);

        struct stat st;
        if (lstat(fullpath, &st) == -1) {
            perror("lstat");
            continue;
        }

        // Check for symlinks
        if (S_ISLNK(st.st_mode)) {
            fprintf(stderr, "Rejected symlink: %s\n", fullpath);
            continue;
        }

        // Recursively scan directories
        if (S_ISDIR(st.st_mode)) {
            scan_dir(fullpath);
        } else {
            printf("Accepted: %s\n", fullpath);
        }
    }

    closedir(dir);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <path>\n", argv[0]);
        return 1;
    }
    scan_dir(argv[1]);
    return 0;
}
This looks like it checks for symlinks and avoids them. But it’s vulnerable.
🔓 Exploiting the Race Condition
Suppose your program is scanning /tmp/ci-job/uploads/abc/, and a malicious user includes a directory structure like this:
evil/
├── legit/
│   └── file.txt
└── legit -> /etc
Here’s how the attack works:
- The user uploads evil/legit/ as a normal directory.
- Right after scan_dir() calls lstat() on evil/legit, the attacker swaps it out (via a background thread or rapid timing) with a symlink:
rm -rf evil/legit && ln -s /etc evil/legit
- lstat() saw the original directory, so the symlink check passed. But the recursive scan_dir() call then runs opendir() on the same path string, which follows the new symlink, so it’s now actually walking /etc.
The result: the scanner accepts and prints arbitrary paths inside /etc.
This is a textbook TOCTOU (time-of-check vs. time-of-use) vulnerability.
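The race above exists because the check (lstat()) and the use (opendir()) each resolve the path string separately. A minimal sketch of the narrower fix available with the older *at() family, assuming a hypothetical helper name open_subdir_nofollow(): open a single component relative to the parent's fd with O_NOFOLLOW, then inspect the descriptor that was actually opened.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original article): open the single
 * path component `name` inside `parentfd` as a directory, refusing to
 * follow a symlink in that component.
 * Returns a readable directory fd, or -1 with errno set.
 */
int open_subdir_nofollow(int parentfd, const char *name) {
    int fd = openat(parentfd, name,
                    O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* Check the object we hold, not the name it had a moment ago. */
    struct stat st;
    if (fstat(fd, &st) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISDIR(st.st_mode)) {
        close(fd);
        errno = ENOTDIR;
        return -1;
    }
    return fd;
}
```

The returned fd can be handed to fdopendir(), so the listing and every later *at() call refer to the same directory that passed the check. O_NOFOLLOW only covers the final component, which is why the helper expects a single name rather than a multi-component path.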
✅ Secure File Walker with openat2() and O_PATH
This example:
- Walks a directory tree starting from a trusted base fd.
- Uses openat2() with RESOLVE_BENEATH | RESOLVE_NO_SYMLINKS.
- Avoids ever working with full path strings.
🛠️ Requirements
This requires:
- Linux ≥ 5.6 (for openat2())
- A libc that exposes openat2(), or a raw syscall() invocation
For portability, we’ll use the raw syscall here.
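Because openat2() may be missing on older kernels, or blocked by a seccomp filter, a program may want to probe for it once at startup. The following is a minimal sketch under that assumption; probe_openat2() is a hypothetical name, and the three-way return value is an illustrative choice.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe (not from the original article).
 * Returns  1 if openat2() succeeded on the current directory,
 *          0 if the kernel reports ENOSYS (syscall not implemented),
 *         -1 for any other failure, with errno left set
 *            (for example EPERM from a seccomp filter).
 */
int probe_openat2(void) {
    struct open_how how = {
        .flags = O_PATH | O_DIRECTORY | O_CLOEXEC,
    };
    int fd = (int)syscall(SYS_openat2, AT_FDCWD, ".", &how, sizeof(how));
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return (errno == ENOSYS) ? 0 : -1;
}
```

A caller that gets 0 could fall back to a component-by-component walk with openat() and O_NOFOLLOW, or refuse to run.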
📄 Code
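#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <linux/openat2.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

// Raw syscall wrapper, for libcs that do not expose openat2()
int openat2_wrapper(int dirfd, const char *pathname, struct open_how *how, size_t size) {
    return (int)syscall(SYS_openat2, dirfd, pathname, how, size);
}

// Open `name` relative to dirfd and walk it. `display` is the path from the
// base directory; it is used only in messages, never for resolution.
static void scan_dir_at(int dirfd, const char *name, const char *display) {
    // Set up secure openat2() parameters
    struct open_how how = {
        .flags = O_PATH | O_NOFOLLOW | O_DIRECTORY | O_CLOEXEC,
        .resolve = RESOLVE_BENEATH | RESOLVE_NO_SYMLINKS
    };

    int newfd = openat2_wrapper(dirfd, name, &how, sizeof(how));
    if (newfd == -1) {
        perror("openat2");
        return;
    }

    // Re-open for listing (an O_PATH fd cannot be read from directly)
    int dfd = openat(newfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dfd == -1) {
        perror("open directory");
        close(newfd);
        return;
    }

    DIR *dir = fdopendir(dfd); // takes ownership of dfd on success
    if (!dir) {
        perror("fdopendir");
        close(dfd);
        close(newfd);
        return;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char sub_path[PATH_MAX];
        int n = snprintf(sub_path, sizeof(sub_path), "%s/%s", display, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof(sub_path)) {
            fprintf(stderr, "Path too long: %s/%s\n", display, entry->d_name);
            continue;
        }

        // d_type is only a hint (it may be DT_UNKNOWN on some filesystems)
        if (entry->d_type == DT_LNK) {
            fprintf(stderr, "Rejected symlink: %s\n", sub_path);
            continue;
        }
        printf("Found: %s\n", sub_path);

        // Recurse with the bare entry name, resolved relative to newfd.
        // openat2() re-checks the type and refuses symlinks at open time.
        if (entry->d_type == DT_DIR)
            scan_dir_at(newfd, entry->d_name, sub_path);
    }

    closedir(dir); // also closes dfd
    close(newfd);
}

void scan_dir(int dirfd, const char *rel_path) {
    scan_dir_at(dirfd, rel_path, rel_path);
}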
🧪 Example Usage
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <base-path>\n", argv[0]);
        return 1;
    }

    int basefd = open(argv[1], O_PATH | O_DIRECTORY);
    if (basefd == -1) {
        perror("open base dir");
        return 1;
    }

    scan_dir(basefd, ".");
    close(basefd);
    return 0;
}
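The introduction also asks for metadata checks and for rejecting symlinks and hardlinks. One way to do that without reintroducing a race is to pin each entry with an O_PATH descriptor and run fstat() on the descriptor itself. This is a sketch only: classify_entry() and the enum entry_verdict values are hypothetical names, and treating st_nlink > 1 as "hardlinked" is an assumed policy (the link count says another name exists, not where it lives).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical verdicts for one directory entry. */
enum entry_verdict {
    ENTRY_OK,         /* directory, or regular file with a single link */
    ENTRY_SYMLINK,    /* symbolic link */
    ENTRY_HARDLINKED, /* regular file with more than one name */
    ENTRY_SPECIAL,    /* device, FIFO, socket, ... */
    ENTRY_ERROR       /* could not be opened or inspected */
};

/*
 * `name` must be a single path component inside `dirfd`.
 * On ENTRY_OK, *out_fd holds an O_PATH fd for the inspected object,
 * so later operations can target exactly what was checked.
 * Otherwise *out_fd is -1.
 */
enum entry_verdict classify_entry(int dirfd, const char *name, int *out_fd) {
    *out_fd = -1;

    /* O_PATH | O_NOFOLLOW pins whatever `name` is right now,
     * including a symlink itself rather than its target. */
    int fd = openat(dirfd, name, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (fd == -1)
        return ENTRY_ERROR;

    struct stat st;
    enum entry_verdict v;
    if (fstat(fd, &st) == -1)
        v = ENTRY_ERROR;
    else if (S_ISLNK(st.st_mode))
        v = ENTRY_SYMLINK;
    else if (S_ISDIR(st.st_mode))
        v = ENTRY_OK;
    else if (!S_ISREG(st.st_mode))
        v = ENTRY_SPECIAL;
    else if (st.st_nlink > 1)
        v = ENTRY_HARDLINKED;
    else
        v = ENTRY_OK;

    if (v == ENTRY_OK)
        *out_fd = fd;
    else
        close(fd);
    return v;
}
```

The same struct stat also carries st_uid, st_gid, and the timestamps mentioned for auditing. It also avoids relying on d_type, which some filesystems report as DT_UNKNOWN.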
🔐 What This Protects Against
- ✅ Escaping the base directory (../../etc) → blocked by RESOLVE_BENEATH
- ✅ Symlink tricks → blocked by RESOLVE_NO_SYMLINKS
- ✅ Race conditions (TOCTOU) → avoided by resolving each component via fd
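Each of these protections surfaces as a distinct errno value from openat2(), which makes rejections easy to log. The sketch below assumes two hypothetical helpers, open_beneath() and why_rejected(); the errno meanings are those documented for openat2().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/openat2.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original article): resolve `path`
 * strictly beneath `rootfd`. Returns an O_PATH fd on success, or a
 * negative errno value on failure.
 */
int open_beneath(int rootfd, const char *path) {
    struct open_how how = {
        .flags = O_PATH | O_NOFOLLOW | O_CLOEXEC,
        .resolve = RESOLVE_BENEATH | RESOLVE_NO_SYMLINKS,
    };
    int fd = (int)syscall(SYS_openat2, rootfd, path, &how, sizeof(how));
    return (fd >= 0) ? fd : -errno;
}

/* Maps a negative errno from open_beneath() to the rule that fired. */
const char *why_rejected(int neg_errno) {
    switch (-neg_errno) {
    case EXDEV:  return "escaped the root (RESOLVE_BENEATH)";
    case ELOOP:  return "symlink in path (RESOLVE_NO_SYMLINKS)";
    case EAGAIN: return "raced with a rename; safe to retry";
    case ENOSYS: return "openat2() not available on this kernel";
    default:     return "other error";
    }
}
```

On a kernel with openat2(), open_beneath(rootfd, "../etc/passwd") and an absolute path such as "/etc/passwd" are expected to fail with EXDEV, while a path that passes through a symlinked directory fails with ELOOP. Note that with O_PATH | O_NOFOLLOW, a symlink as the final component is opened as a descriptor for the link itself, so callers still need to fstat() the result.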
🧠 Understanding O_PATH File Descriptors
O_PATH lets you open a file only for path resolution, not access. It’s like saying: “I don’t want to read or write, I just want a reference.”
✅ What You Can Do with an O_PATH fd:
- fstat(), fstatat()
- openat() (for children)
- readlinkat()
- renameat()
- Inspect via /proc/self/fd/
❌ What You Cannot Do:
- read(), write(), mmap() → fail with EBADF
- fchmod(), fchown(), ioctl() → also fail with EBADF
Example:
char buf[100];
int fd = open("/etc/passwd", O_PATH);
ssize_t n = read(fd, buf, sizeof(buf)); // returns -1 with errno set to EBADF
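But you can still ask procfs where the descriptor points. Assuming the descriptor is fd 3 in a process with PID <pid>:
readlink /proc/<pid>/fd/3 # shows the real path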
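The split between allowed and forbidden operations can be checked directly. The following sketch uses two hypothetical helpers, opath_can_stat() and opath_read_errno(), to show that fstat() works on an O_PATH descriptor while read() is refused.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if fstat() succeeds on an O_PATH fd for `path`, else -1. */
int opath_can_stat(const char *path) {
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd == -1)
        return -1;
    struct stat st;
    int rc = fstat(fd, &st); /* metadata only; no read permission needed */
    close(fd);
    return rc;
}

/*
 * Returns the errno produced by read() on an O_PATH fd for `path`,
 * 0 if read() unexpectedly succeeded, or -1 if open() itself failed.
 */
int opath_read_errno(const char *path) {
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd == -1)
        return -1;
    char buf[16];
    ssize_t n = read(fd, buf, sizeof(buf));
    int err = (n == -1) ? errno : 0;
    close(fd);
    return err;
}
```

For example, opath_read_errno("/etc/passwd") is expected to return EBADF even though the file is world-readable, because the descriptor was never opened for I/O.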
🏗️ O_PATH in the Real World: Tools That Rely on It
The O_PATH file descriptor may feel esoteric, but it plays a critical role in modern Linux software, especially in areas like container runtimes, service managers, and sandboxing. Here are some concrete examples:
🐳 runc and Container Runtimes
runc, the reference implementation of the Open Container Initiative (OCI), uses O_PATH heavily when setting up container mount namespaces. Why?
- It uses O_PATH to safely traverse and bind-mount host files and directories into the container’s rootfs.
- By opening components with O_PATH and resolving paths only relative to trusted descriptors, runc ensures that container path traversal cannot escape sandbox boundaries or follow unexpected symlinks.
- This guards against malicious symlink or race-based filesystem exploits during container setup.
Relevant code in libcontainer that uses O_PATH.
⚙️ systemd: Secure Unit File Handling
systemd uses O_PATH in several contexts, particularly when:
- Tracking units that reference files which may not need to be opened or read, but must be monitored.
- Walking paths in restricted mount namespaces, where certain files should not be opened in the traditional sense.
- Handling mount propagation and PrivateMounts options in unit files, which benefit from resolving mount targets safely via O_PATH.
It allows systemd to manage filesystem objects without exposing unnecessary capabilities.
🛡️ bubblewrap: Sandboxing for Flatpak and More
bubblewrap, a minimalistic userspace sandboxing tool (used by Flatpak), leverages O_PATH to:
- Securely resolve and isolate mount points.
- Prevent symlink tricks during sandbox environment construction.
- Traverse and manipulate filesystem state without touching file contents.
This is critical for sandboxing untrusted applications and ensuring that even during setup, they can’t influence host filesystem state.
🔎 Why All These Tools Use O_PATH
- They operate in security-sensitive environments (e.g., containers, service boundaries).
- They manipulate filesystem state without requiring I/O access.
- They must walk user-controlled paths safely — a classic source of bugs.
- They want to minimize privilege while retaining full control over where operations happen.
📞 Conclusion
O_PATH gives you a capability-style reference to a file path, without opening the file. With openat2() and flags like RESOLVE_NO_SYMLINKS, you get precise, secure path traversal:
- No symlink tricks
- No path escapes
- No race conditions
It’s how modern container runtimes and sandboxing tools handle filesystem traversal securely.
If you care about safe path handling in untrusted environments, you should care about O_PATH.