Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239
Implemented DoubledEndedIterator for Ancestors<'_> + added tests and updated doc comments#153239asder8215 wants to merge 2 commits intorust-lang:mainfrom
Conversation
|
rustbot has assigned @Mark-Simulacrum. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
79234be to
eb5b1fc
Compare
This comment has been minimized.
This comment has been minimized.
eb5b1fc to
26917f2
Compare
This comment has been minimized.
This comment has been minimized.
b436c61 to
094669b
Compare
…updated doc comments
094669b to
ef86174
Compare
| let os_str_path = self.as_os_str(); | ||
| let path_bytes = os_str_path.as_encoded_bytes(); | ||
| let path_len = path_bytes.len(); | ||
| let trailing_seps = if self.has_trailing_sep() { |
There was a problem hiding this comment.
This looks like a partial implementation of trim_trailing_sep, except without the handling for !self.has_root()...
I added an unresolved question to the tracking issue (#142503) on whether that's the right behavior for trim_trailing_sep/has_trailing_sep, but I think this is probably not what we want here...
| @@ -0,0 +1,369 @@ | |||
| use std::ffi::OsStr; | |||
There was a problem hiding this comment.
I would recommend writing a fuzz test (probably easiest to do out of tree) to find bugs, and check in any tests that find issues with the impl here.
With something like this, running with the basic cargo fuzz, I find that Path::new("/.") returns ["/.", "/"] with the impl in this PR while on mainline it returns the expected ["/."] (https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=51b3d00562e464e135446c314b939a6b).
Note that only covers the platform you run on (in particular, Windows isn't as easily covered if you're on Linux), but it should still build some confidence that your code is correct.
#![no_main]
use libfuzzer_sys::fuzz_target;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
fuzz_target!(|data: &[u8]| {
let path = Path::new(OsStr::from_bytes(data));
let forward = path.ancestors().collect::<Vec<_>>();
let mut backward = path.ancestors().rev().collect::<Vec<_>>();
backward.reverse();
assert_eq!(forward, backward, "{:?}", path);
let mut by_parents = vec![];
let mut parent = path;
loop {
by_parents.push(parent);
parent = if let Some(p) = parent.parent() {
p
} else {
break;
};
}
assert_eq!(forward, by_parents, "{:?}", path);
});
|
Reminder, once the PR becomes ready for a review, use |
|
I ran the fuzz script you made after making some changes to the code, and it's been running without errors for ~20 minutes. I'm pretty confident that the code for all the |
|
Re-running the fuzz script for me with what I believe is latest on this branch fails with: i.e., forward returns Can you confirm whether that's the case and perhaps spend some more time fuzzing? The failure is near-instant for me, though I did modify the fuzzing target a bit: #![no_main]
use libfuzzer_sys::fuzz_target;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};
fuzz_target!(|path: PathBuf| {
let path: &Path = &path;
let forward = path.ancestors().collect::<Vec<_>>();
let mut backward = path.ancestors().rev().collect::<Vec<_>>();
backward.reverse();
assert_eq!(forward, backward, "{:?}", path);
let mut by_parents = vec![];
let mut parent = path;
loop {
by_parents.push(parent);
parent = if let Some(p) = parent.parent() {
p
} else {
break;
};
}
assert_eq!(forward, by_parents, "{:?}", path);
}); |
Ah shoot. I'm noticing that I forgot to copy over updating the rebound index in the b'.' comparison conditional (!curr_dir_present branch) within I copied over the necessary |
…trailing) in Ancestors<'_>
ad2dc06 to
1a0f437
Compare
|
|
Okay, so I copied over the I did try to see if I could get rid of I'm okay with the existing code here for now, and I wouldn't mind coming back to it later to see if I could remove that function (if you are okay with that). On a separate note, there technically is duplicate code between Lastly, I still haven't set up my Windows 10 VM to run the rust test suite, so I may need some assistance testing for that. I tried installing |
|
Forgot to do this: |
This PR revolves around the tracking issue for implementing
DoubleEndedIteratorforAncestors<'_>.I tried to minimize the information needed in
Ancestors<'_>struct to effectively produce the correct backward and forward paths using a front and back index; I did opt to use a&'a [u8]over&'a Pathbecause it would reduce code in.next()/.next_back()where I wouldn't need to constantly convert the Path into a u8 slice. Due to how relative paths worked previously forAncestors<'_>, I had to introduce logic that allowed for returning an empty string at the start/end of a relative path for.next_back()/.next(); I'm open to a lot of feedback on how to make the code neater to address these cases.I tested this pretty extensively on my Linux machine (comparing it with the output of what
Ancestors<'_>previously returned); I do have to do more testing for Windows (been struggling a bit to get my Windows VM to run my shared folder containing the rust project through./xthough, so I may need some help on this), which there are some windows-specific test intests/path_ancestors.rs. As far as I'm aware as well, the create_dir_all tests, which does useAncestors<'_>underneath the hood, all passes.I am aware of one breaking change in this code though. I realized for a path like:
Its
Ancestor<'_>iterator contains:The previous implementation doesn't strip trailing slashes for the first path, but in subsequent paths it does eliminate trailing slashes. You can see this here.
With how I use ascii separators to determine that we've reached the next component, this does make it a bit difficult to find a neat way to handle this logic (and do this symmetrically as well for
.next_back()). For example, consider the two paths:With how I'm using a back index approach, I save what the current back idx is in a variable (curr_back), then iterate through the u8 slice until I hit an ascii separator, advance more through potential ascii separators, and then stop reading from there if there are no more adjacent ascii separators, returning what
path[..curr_back]is (trimming separators). Doing this on the first path would return "/foo/bar/baz" in the first call (since I trim separators, but otherwise this would be "/foo/bar/baz/") and then return "/foo/bar/baz" again in the next call (this is because curr_back is set to the idx of where the last "/" is at in "/foo/bar/baz/"). Doing this for the second path would return "/foo/bar/baz" in the first call and then "/foo/bar" in the next call. I currently just trim separators from the original path before giving that to theAncestors<'_>struct when.ancestors()is called.I'm hoping this change isn't room for big concerns because removing trailing separators for the original path does not break logic in accessing that path. However, if someone were to use
Ancestors<'_>for printing paths recursively up, this would cause changes to the first path, which I do not see it being a huge issue if they want to print the original path unchanged since they can opt to consume the first item in the iterator and print the original path. If I do, however, need to account for this logic, then I will try and see what I can do; I'm open to feedback in handling this case as well.UPDATE
I added a bit more information into
Ancestors<'_>(path_lenandtrailing_seps) that provides more information to the iterator so that we can print out the original path on the first call to.next()or when we reach the last component for.next_back(). Unfortunately, this does meanancestors()becomes aO(N)function if we just have ausize::MAXtrailing root path, but that's really irregular and rare to see.The way
Ancestors<'_>is implemented now operates the same as the previous version (at least on Linux/Unix, I still have to test on Windows to see if Prefix components are okay).