C++: Model classes in StdString.qll. #4750
Does it also not show any difference in the tuple counts? On Slack you mentioned having to do an |
|
Fixed merge (conflict with #4822). |
I think the two options are equivalent in performance, and I think I prefer the |
|
To move this PR forward, I'd like to see some tuple counts before/after for the predicates being changed. |
Yep, I've been meaning to produce some but it's not quite trivial enough to fit in while I'm doing other work. |
cpp/ql/src/semmle/code/cpp/models/implementations/StdString.qll
|
I agree with @geoffw0 that |
|
It looks like the class models have the same The optimizer is using magic sets and sharing with I'd advocate defining it centrally rather than adding a separate rootdef in each class model. Uses seem to be optimized correctly: With this definition of
Class getClassForMember(string name) {
  (
    result = this.getDeclaringType()
    or
    result.(TemplateClass).getAnInstantiation() = this.getDeclaringType()
  ) and
  this.getName() = name
}
…es or 'any'. The classes for string objects now match instantiations directly rather than the template.
|
I've just pushed a significant change, so that it's now the instantiations of library classes which are modelled, rather than the templates. This gives us much more natural code in the function models, without any weird custom predicates ( |
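For readers less familiar with the template/instantiation distinction this change relies on, here is a minimal C++ sketch (not taken from the PR). It shows that `std::string` is one instantiation of the `std::basic_string` template and that a call such as `push_back` is made on the instantiation, which is the kind of entity the models now match. The helper name `appendExclamation` is hypothetical and exists only for illustration.

```cpp
#include <string>
#include <type_traits>

// std::string and std::wstring are typedefs for two different
// instantiations of the same class template, std::basic_string.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// Distinct instantiations are distinct classes, each with its own members.
static_assert(!std::is_same<std::string, std::wstring>::value,
              "different instantiations are different types");

// Hypothetical helper: the push_back called here is a member of the
// instantiation basic_string<char>, not of the uninstantiated template.
std::string appendExclamation(std::string s) {
    s.push_back('!');
    return s;
}
```

Calling `appendExclamation("hi")` yields a string equal to `"hi!"`; the call target inside it belongs to `std::basic_string<char>`.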
|
There are a few predicates that are still doing a full scan of the function table. All of them are joining single strings against the function name, but not all such predicates do a full scan. I haven't noticed any other common factors. There's a few cases in And some cases that look scary, but are just using the Cartesian product to join |
|
Both |
… well, for consistency.
|
I've just pushed five commits containing two performance improvements, designed by myself and @jbj: (1) (2) even with the first change we still get a mediocre evaluation strategy, where we narrow down to all of the functions in the modelled class before joining on name. This is the 'wrong' order, producing a fair number of rows we don't need. The Notes:
Query used for most of the testing: Looking at With the first commit only ( and after: There are similar results in DataFlow, e.g. for |
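To make the join-order point concrete, here is a toy C++ sketch. It is not the CodeQL evaluator and the data is invented: it only counts how many rows survive the first condition when a small function table is filtered by class first versus by name first. All names (`FunctionRow`, `JoinStats`, `classThenName`, `nameThenClass`, `makeToyTable`) are hypothetical, and the sketch ignores indexing entirely.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one row of a function table: (declaring class, name).
struct FunctionRow {
    std::string declaringClass;
    std::string name;
};

struct JoinStats {
    std::size_t intermediateRows;  // rows that pass the first condition
    std::size_t resultRows;        // rows that pass both conditions
};

// Narrow down to the class first, then check the name.
JoinStats classThenName(const std::vector<FunctionRow>& table,
                        const std::string& cls, const std::string& name) {
    JoinStats stats{0, 0};
    for (const FunctionRow& row : table) {
        if (row.declaringClass != cls) continue;
        ++stats.intermediateRows;
        if (row.name == name) ++stats.resultRows;
    }
    return stats;
}

// Narrow down to the name first, then check the class.
JoinStats nameThenClass(const std::vector<FunctionRow>& table,
                        const std::string& cls, const std::string& name) {
    JoinStats stats{0, 0};
    for (const FunctionRow& row : table) {
        if (row.name != name) continue;
        ++stats.intermediateRows;
        if (row.declaringClass == cls) ++stats.resultRows;
    }
    return stats;
}

// Invented data: one class with many members, and a name shared by two classes.
std::vector<FunctionRow> makeToyTable() {
    std::vector<FunctionRow> table;
    for (int i = 0; i < 50; ++i) {
        table.push_back({"std::basic_string<char>", "m" + std::to_string(i)});
    }
    table.push_back({"std::basic_string<char>", "push_back"});
    table.push_back({"std::vector<int>", "push_back"});
    table.push_back({"std::vector<int>", "size"});
    return table;
}
```

On this invented table both orders return the same single result row, but the class-first order carries 51 intermediate rows while the name-first order carries 2. Real tuple counts depend on the database and on the optimizer's choices.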
jbj
left a comment
All these changes LGTM, and I witnessed most of the tuple-count improvements in our screen sharing session yesterday. Even if we should run into performance problems when expanding these changes to the rest of the library, I think the getClassAndName approach is flexible enough that most performance problems could be addressed centrally, without changing every single model.
MathiasVP
left a comment
LGTM as well now!
|
Thanks for the close collaboration, reviews and merge everyone! |


This PR represents a prototype for what I'm thinking of doing for https://github.com/github/codeql-c-analysis-team/issues/167. The problem with it is that the changes I've made so far don't add up to a measurable difference in performance in my testing, and I'm not aware of a specific concern in the query log to focus on. So I'm left a little unconvinced that making these changes more widely will be worthwhile.
@jbj has previously expressed a preference, where I've written this = any(StdBasicString s).getAnInstMemberNamed("push_back"), for this.getClassForMember("push_back") instanceof StdBasicString instead. I think that's for performance reasons but I don't currently understand why - and I find it significantly more difficult to read. getClassForMember really means getClassAndAlsoIAmAMemberCalled(name).