close
Skip to content

Add support for source() directive in symbol resolution#1157

Open
lionel- wants to merge 10 commits intooak/namespacefrom
oak/source
Open

Add support for source() directive in symbol resolution#1157
lionel- wants to merge 10 commits intooak/namespacefrom
oak/source

Conversation

@lionel-
Copy link
Copy Markdown
Contributor

@lionel- lionel- commented Apr 16, 2026

Branched from #1154

Closes #1148
Progress towards posit-dev/positron#11112

Adds support for source() directives in scripts, allowing for cross-file symbol resolution. Like in RStudio, sourced files are resolved from the workspace directory. Effects on the LSP:

  • Top-level definitions in the sourced files are made available below the source() call.
  • Side effects from top-level library() and require() calls are also made available after the source() call.
  • Sourced files can transitively source other files.

For now this only affects goto-definition, but should later also affect diagnostics such as unused symbols.

Both source() and library() are allowed in lazy nested scopes such as function bodies. In that case the side effects are available inside that scope (and nested children), but not outside, even though the effects are global. That's because outside that scope, code can't make assumptions about the availability of the symbols.

With a helpers.R file containing:

foo <- function() {}
library(dplyr)

We have at top-level:

# Both undefined
print(foo)
print(mutate)

source("helpers.R")

# Now defined
print(foo)
print(mutate)

In nested scopes:

# Both undefined
print(foo)
print(mutate)

function() {
  source("helpers.R")
  # Now defined
  print(foo)
  print(mutate)
}

# Still undefined
print(foo)
print(mutate)

At top-level, it doesn't matter whether local is true or false, the sourced definitions overwrite any local ones. We have a new DefinitionKind::Sourced variant for these definitions:

foo <- "local"
source("helpers.R")
print(foo) # Resolves to helper's definition

In nested scopes, if local = FALSE, the sourced definitions are made available via cross-file directives instead, that are only used for unbound symbols. This allows proper resolution of local symbols:

function() {
  foo <- 1
  source("helpers.R")
  print(foo) # Still refers to the local definition
}

If local is true, sourced definitions are instead added to the local scope.

Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something feels a little off here but I can't quite put my finger on it. Like, concretely I am weirded out by the fact that our definitions table can include a range into a different file. That somehow feels like we aren't quite modeling that correctly.

It seems like one other way you could model this is:

  • SemanticIndex has a dependencies: Vec<Dependency> field
  • Dependency { range: Range, index_id: SemanticIndexId }

The idea being that when you encounter a source("bar.R") call while building the semantic index for foo.R, you build the semantic index for bar.R and then store it somewhere (im hand waving here about where it is stored, maybe a Salsa db?) without modification or simplification. And then you reference it from foo.R's semantic index by ID as being included at some Range in the file (the location of the source("bar.R") call).

Then your methods on SemanticIndex would have to know how to layer in dependencies when you ask for things like file_all_definitions(), and maybe even also for things as simple as SemanticIndex::symbols(), BUT I think that Salsa would probably help with that, because it would be computed once and then as long as the file underlying SemanticIndexId doesn't change, then symbols() just returns a cached result.

It also means it would be quite easy for us to limit symbols() to only return foo.R's direct symbols if we wanted to. Right now it gets blurred with the ones from bar.R.


I think the biggest source of weirdness in my head right now is that the model for SemanticIndex gets a bit blurry with this PR. It used to be that SemanticIndex modeled symbols and definitions found in exactly one file. Now SemanticIndex::symbol_tables and SemanticIndex::definitions can cross files and that feels quite weird to me, and breaks my mental model of how to use this thing.

If SemanticIndex instead just referenced another semantic index at some Range, then that feels more correct again, because its own internal symbol_tables and definitions would still be only about itself.

I think that would also make it much easier to update if, say, bar.R is updated by the user. You get to change ABSOLUTELY NOTHING about foo.R's SemanticIndex object. All of the foo.R fields are still completely valid because nothing in foo.R changed, and the fields only model foo.R. And then the reference to bar.R's SemanticIndex would pull in the updated symbols/definitions from bar.R post-change, but Salsa would help us know when it gets invalidated.

Comment on lines +102 to +109
fn workspace_document(&self, uri: &Url) -> Option<Document> {
if let Some(doc) = self.documents.get(uri) {
return Some(doc.clone());
}
let path = uri.to_file_path().log_err()?;
let contents = std::fs::read_to_string(&path).log_err()?;
Some(Document::new(&contents, None))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about getting in too deep with something that clones an already open document on every retrieval. That sounds like a nasty performance hit.

It definitely feels like we want to return a reference here, we rarely if ever want to own the Document at the use site

I realize that makes things difficult due to the create-the-doc-on-the-fly path. I guess we really need the VFS as a place to "put" the document that we can then return a ref from.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh can we Cow the Document in the meantime?

Comment on lines +181 to 183
let Some(doc) = self.workspace_document(&uri) else {
continue;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea see we are trading a ref for an owned Document here, which doesn't feel great to me

Comment on lines +205 to +208
let file_dir = file
.to_file_path()
.ok()
.and_then(|p| p.parent().map(|d| d.to_path_buf()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is eagerly computing file_dir when it isn't required most of the time

pub struct SourceResolution {
/// Definitions to inject as synthetic bindings in the calling scope.
/// Each entry is (name, file_url, range_in_source_file).
pub definitions: Vec<(String, Url, TextRange)>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SourceResolutionDefinition might be a reasonable struct here (if i feel like i need to document the fields, i feel like i just need a struct)

Comment on lines +247 to +251
if !stack.insert(url.clone()) {
return None;
}

let sourced_doc = self.workspace_document(&url)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early exit feels like a problem

  • You add to stack
  • Doc doesn't exist, so workspace_document bails early
  • You never remove from the stack

In general it feels like we need to ensure the stack is popped on any exit from this function

Comment on lines +515 to +519
/// Injected from a `source()` call. The definition lives in an external
/// file; `range` on the `Definition` gives the name's range in that file.
Sourced {
file: Url,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somehowwww feels wrong to me. Like it needed to be a ExternalDefinition struct that allowed for this

I have somewhat strong gut feelings that a Definition should not be allowed to jump across files, but I can't quite explain why

/// `source(file)`: brings exports from another file into scope.
Source {
file: Url,
exports: HashMap<String, TextRange>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somewhat interesting that the package exports are "lazy" here but the source exports are not

Comment on lines 732 to 733

// For now, only recognise exactly one unnamed argument. We'll do
// argument matching later (`character.only` unquoting is another
// complication).
let Some(Ok(first_arg)) = items.next() else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably keep this

Comment on lines +910 to +917
/// Call the source resolver for `path`, temporarily taking it out of
/// `self` to avoid borrow conflicts.
fn resolve_source(&mut self, path: &str) -> Option<SourceResolution> {
let mut resolver = self.source_resolver.take()?;
let result = resolver(path);
self.source_resolver = Some(resolver);
result
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Call the source resolver for `path`, temporarily taking it out of
/// `self` to avoid borrow conflicts.
fn resolve_source(&mut self, path: &str) -> Option<SourceResolution> {
let mut resolver = self.source_resolver.take()?;
let result = resolver(path);
self.source_resolver = Some(resolver);
result
}
fn resolve_source(&mut self, path: &str) -> Option<SourceResolution> {
let source_resolver = self.source_resolver.as_mut()?;
source_resolver(path)
}

Comment on lines +870 to +875
for (name, file, range) in resolution.definitions {
self.add_definition(
&name,
SymbolFlags::IS_BOUND,
DefinitionKind::Sourced { file },
range,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something that throws me off is that when you collect a source() directive, the range doesn't correspond to the range of source(), which feels wrong to me for ordering things like

source("helpers.R") # has fn()
fn <- function() {}

Like, IIUC the "definition" underlying source() no longer knows where it originated from in the file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants