close
Skip to content

Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239

Open
asder8215 wants to merge 2 commits intorust-lang:mainfrom
asder8215:reverse_ancestor_acp
Open

Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239
asder8215 wants to merge 2 commits intorust-lang:mainfrom
asder8215:reverse_ancestor_acp

Conversation

@asder8215
Copy link
Copy Markdown
Contributor

@asder8215 asder8215 commented Mar 1, 2026

This PR revolves around the tracking issue for implementing DoubleEndedIterator for Ancestors<'_>.

I tried to minimize the information needed in Ancestors<'_> struct to effectively produce the correct backward and forward paths using a front and back index; I did opt to use a &'a [u8] over &'a Path because it would reduce code in .next()/.next_back() where I wouldn't need to constantly convert the Path into a u8 slice. Due to how relative paths worked previously for Ancestors<'_>, I had to introduce logic that allowed for returning an empty string at the start/end of a relative path for .next_back()/.next(); I'm open to a lot of feedback on how to make the code neater to address these cases.

I tested this pretty extensively on my Linux machine (comparing it with the output of what Ancestors<'_> previously returned); I do have to do more testing for Windows (been struggling a bit to get my Windows VM to run my shared folder containing the rust project through ./x though, so I may need some help on this), which there are some windows-specific test in tests/path_ancestors.rs. As far as I'm aware as well, the create_dir_all tests, which does use Ancestors<'_> underneath the hood, all passes.

I am aware of one breaking change in this code though. I realized for a path like:

let mut path = Path::new("/foo/bar/baz/").ancestors();

Its Ancestor<'_> iterator contains:

"/foo/bar/baz/"
"/foo/bar"
"/foo"
"/"
 (Then it's None afterward)

The previous implementation doesn't strip trailing slashes for the first path, but in subsequent paths it does eliminate trailing slashes. You can see this here.

With how I use ascii separators to determine that we've reached the next component, this does make it a bit difficult to find a neat way to handle this logic (and do this symmetrically as well for .next_back()). For example, consider the two paths:

"/foo/bar/baz/"
"/foo/bar/baz"

With how I'm using a back index approach, I save what the current back idx is in a variable (curr_back), then iterate through the u8 slice until I hit an ascii separator, advance more through potential ascii separators, and then stop reading from there if there are no more adjacent ascii separators, returning what path[..curr_back] is (trimming separators). Doing this on the first path would return "/foo/bar/baz" in the first call (since I trim separators, but otherwise this would be "/foo/bar/baz/") and then return "/foo/bar/baz" again in the next call (this is because curr_back is set to the idx of where the last "/" is at in "/foo/bar/baz/"). Doing this for the second path would return "/foo/bar/baz" in the first call and then "/foo/bar" in the next call. I currently just trim separators from the original path before giving that to the Ancestors<'_> struct when .ancestors() is called.

I'm hoping this change isn't room for big concerns because removing trailing separators for the original path does not break logic in accessing that path. However, if someone were to use Ancestors<'_> for printing paths recursively up, this would cause changes to the first path, which I do not see it being a huge issue if they want to print the original path unchanged since they can opt to consume the first item in the iterator and print the original path. If I do, however, need to account for this logic, then I will try and see what I can do; I'm open to feedback in handling this case as well.

UPDATE

I added a bit more information into Ancestors<'_> (path_len and trailing_seps) that provides more information to the iterator so that we can print out the original path on the first call to .next() or when we reach the last component for .next_back(). Unfortunately, this does mean ancestors() becomes a O(N) function if we just have a usize::MAX trailing root path, but that's really irregular and rare to see.

The way Ancestors<'_> is implemented now operates the same as the previous version (at least on Linux/Unix, I still have to test on Windows to see if Prefix components are okay).

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 1, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Mar 1, 2026

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: @ChrisDenton, libs
  • @ChrisDenton, libs expanded to 8 candidates
  • Random selection from Mark-Simulacrum, joboet

@rust-log-analyzer

This comment has been minimized.

@asder8215 asder8215 force-pushed the reverse_ancestor_acp branch from 79234be to eb5b1fc Compare March 1, 2026 03:32
@rust-log-analyzer

This comment has been minimized.

@asder8215 asder8215 force-pushed the reverse_ancestor_acp branch from eb5b1fc to 26917f2 Compare March 1, 2026 03:43
@rust-log-analyzer

This comment has been minimized.

@asder8215 asder8215 force-pushed the reverse_ancestor_acp branch 2 times, most recently from b436c61 to 094669b Compare March 1, 2026 18:01
Comment thread library/std/src/path.rs Outdated
let os_str_path = self.as_os_str();
let path_bytes = os_str_path.as_encoded_bytes();
let path_len = path_bytes.len();
let trailing_seps = if self.has_trailing_sep() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a partial implementation of trim_trailing_sep, except without the handling for !self.has_root()...

I added an unresolved question to the tracking issue (#142503) on whether that's the right behavior for trim_trailing_sep/has_trailing_sep, but I think this is probably not what we want here...

@@ -0,0 +1,369 @@
use std::ffi::OsStr;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend writing a fuzz test (probably easiest to do out of tree) to find bugs, and check in any tests that find issues with the impl here.

With something like this, running with the basic cargo fuzz, I find that Path::new("/.") returns ["/.", "/"] with the impl in this PR while on mainline it returns the expected ["/."] (https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=51b3d00562e464e135446c314b939a6b).

Note that only covers the platform you run on (in particular, Windows isn't as easily covered if you're on Linux), but it should still build some confidence that your code is correct.

#![no_main]

use libfuzzer_sys::fuzz_target;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

fuzz_target!(|data: &[u8]| {
    let path = Path::new(OsStr::from_bytes(data));
    let forward = path.ancestors().collect::<Vec<_>>();
    let mut backward = path.ancestors().rev().collect::<Vec<_>>();
    backward.reverse();
    assert_eq!(forward, backward, "{:?}", path);

    let mut by_parents = vec![];
    let mut parent = path;
    loop {
        by_parents.push(parent);
        parent = if let Some(p) = parent.parent() {
            p
        } else {
            break;
        };
    }

    assert_eq!(forward, by_parents, "{:?}", path);
});

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 21, 2026
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Mar 21, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@asder8215
Copy link
Copy Markdown
Contributor Author

I ran the fuzz script you made after making some changes to the code, and it's been running without errors for ~20 minutes. I'm pretty confident that the code for all the Ancestors<'_> iterator next and next_back captures the behavior of the original Ancestors<'_> next and shows symmetrical behavior for next_back.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 25, 2026
@Mark-Simulacrum
Copy link
Copy Markdown
Member

Re-running the fuzz script for me with what I believe is latest on this branch fails with:

thread '<unnamed>' (8934365) panicked at fuzz_targets/fuzz_target_1.rs:13:5:
assertion `left == right` failed: "/.."
  left: ["/..", "/"]
 right: ["/.."]

i.e., forward returns ["/..", "/"] while a reversed backwards iteration returns ["/.."].

Can you confirm whether that's the case and perhaps spend some more time fuzzing? The failure is near-instant for me, though I did modify the fuzzing target a bit:

#![no_main]

use libfuzzer_sys::fuzz_target;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

fuzz_target!(|path: PathBuf| {
    let path: &Path = &path;
    let forward = path.ancestors().collect::<Vec<_>>();
    let mut backward = path.ancestors().rev().collect::<Vec<_>>();
    backward.reverse();
    assert_eq!(forward, backward, "{:?}", path);

    let mut by_parents = vec![];
    let mut parent = path;
    loop {
        by_parents.push(parent);
        parent = if let Some(p) = parent.parent() {
            p
        } else {
            break;
        };
    }

    assert_eq!(forward, by_parents, "{:?}", path);
});

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 27, 2026
@asder8215
Copy link
Copy Markdown
Contributor Author

i.e., forward returns ["/..", "/"] while a reversed backwards iteration returns ["/.."].

Ah shoot. I'm noticing that I forgot to copy over updating the rebound index in the b'.' comparison conditional (!curr_dir_present branch) within advance_through_trailing_sep_front function.

I copied over the necessary Ancestor code in a separate folder, cargo fuzzed it, and updated the code as I saw issues pop up from cargo fuzz until I didn't see anything for a while. I guess I missed a couple updates I made from the fuzzed code to this PR code; I'll fix this later tonight once I'm back on my PC.

@asder8215 asder8215 force-pushed the reverse_ancestor_acp branch from ad2dc06 to 1a0f437 Compare March 27, 2026 23:15
@rustbot
Copy link
Copy Markdown
Collaborator

rustbot commented Mar 27, 2026

⚠️ Warning ⚠️

@asder8215
Copy link
Copy Markdown
Contributor Author

asder8215 commented Mar 27, 2026

Okay, so I copied over the Ancestors code I wrote in the fuzz script properly into this PR and diffed it. I don't see any issues running cargo fuzz; I made sure to run it longer, but it didn't produce any errors as far as I can see. Let me know if there are any errors you see on your end running cargo fuzz.

I did try to see if I could get rid of trailing_path_length() within Path::ancestors, so I could make that function effectively O(1) instead of possibly O(N) (though this worst case scenario is very unlikely to occur unless someone for some reason does trailing slashes, i.e. //////, only or trailing current directory path only, i.e. /././././.), but it's proving to be difficult using a front and back index approach as I mentioned earlier in my PR description.

I'm okay with the existing code here for now, and I wouldn't mind coming back to it later to see if I could remove that function (if you are okay with that).

On a separate note, there technically is duplicate code between Ancestors::advance_through_trailing_sep_back and trailing_path_length (and technically Ancestors::advance_through_trailing_sep_front, but that uses front index) on traversing backward in the path to strip away trailing seps + curr dir components. Do you want me to put that in a common function or is it alright as is?

Lastly, I still haven't set up my Windows 10 VM to run the rust test suite, so I may need some assistance testing for that. I tried installing cargo-fuzz to see if I could just extract the Ancestors rust code and run the fuzz script you made, but it's telling me that it had an error calling dlltool.exe in which it couldn't find the program (hence couldn't compile window-sys).

@asder8215
Copy link
Copy Markdown
Contributor Author

Forgot to do this:
@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants