C++: Improve SARIF severity level reporting of extractor diagnostics #6830
I think so, yes. |
|
I don't claim to fully understand all the moving parts here, but this looks like a serious buildup of tech debt. With this PR the distinction between errors and warnings becomes really blurred and confusing - extractors distinguish these two categories, but then they're suddenly mixed and matched in the QL. It feels like there are some underlying questions that need to be answered:
|
|
Hi @aschackmull, thank you for your comment. I agree with your concerns. I think most of your comments are best addressed by the language teams. However, for the following:
I'll note that it would be beneficial to stick with the severity levels that are part of the SARIF spec, so that consumers of our results don't need to introduce custom support to process the severities of diagnostic messages. This change is something of a stopgap to address customer concerns. I think the kind of changes that you're proposing to extractors make sense conceptually, but they need to be owned by the language teams, and they will take more time to implement. A proposal that could address customer concerns quickly, while also addressing some of the problems you identified with the current state of the PR, is to reduce the SARIF level of each extractor diagnostic. For example, extractor errors would be mapped to the @aschackmull @yo-h @turbo @AlonaHlobina @adityasharad @calumgrant @jbj What are your thoughts? |
|
For @aschackmull's questions, I'm guessing this is highly language-specific. For C++, I'd say that the extractor errors reported by ExtractionErrors.ql should be seen as warnings from a user perspective, and so the proposed change looks good to me. Just yesterday we had a support escalation caused by a customer misunderstanding the severity. We have another diagnostic query, FailedExtractorInvocations.ql, for the cases where the extractor aborts completely. It turns out this query has no severity column, and I think "error" would be appropriate for this query. |
|
Today, the diagnostic summary for C/C++ looks like this: I'd like the three severities to instead be |
Then at the very least shouldn't some files/queries/predicates be renamed as well? Having a query named |
From a user perspective I think it could make sense to treat extractor errors as warnings for Python (they can't really do anything about it if the problem is in our parser). However, I would like to discuss this with the rest of the Python team, which we will do on Monday afternoon. Leaving a blocked review until then.
|
Do you mean you want @jbj? I'm not sure how much having only 1, rather than 2, instances of "error" would have helped. |
|
Together with renaming files, predicates and metadata as suggested in #6830 (comment), I think the table should become Maybe |
|
Copying from the SARIF spec:
Given the quantity of results it produces, I would weakly suggest that the "Successfully extracted files" diagnostic seems like debug output and therefore should have severity |
I tend to agree with @henrymercer. From the customer's perspective, it will be less confusing. We do not necessarily want them to fix these errors. In many cases, it is not even possible for customers to do something about them. Reducing the |
|
Thanks @jbj @aschackmull @RasmusWL @igfoo @AlonaHlobina for your input. There's a clear way forward for C++, so I'm going to retarget this PR to just C++. For the other languages, it's great to see that the discussion has started on this. I'll hand over making any adjustments to the severity of the extractor diagnostics to you, and let this PR serve as an example for how we made these adjustments for one language. |
f600f56 to d5c8b50
|
I'm looking for some databases I can use to test the changes I've made to the C++ extractor diagnostics, since there don't appear to be any QL tests for these queries. @criemen I believe you implemented the bulk of these queries in #5414 — do you have any databases you used for testing that you could send me? Thanks! |
d5c8b50 to 5b26d41
Try a DCA run on the default repos there? |
|
@henrymercer There are integration tests for these queries I believe? I don't have any DBs handy, sorry. |
|
I had just noticed the CI run and was about to comment — thanks! I'll look into this next week. |
This PR no longer changes Python
|
Checks failure is unrelated. |
jbj left a comment
Thanks! LGTM.
|
I'd like to merge this PR (and its corresponding internal PR), but I'm not allowed to click the button when Checks doesn't pass. I can't see what the error is. I'll try to re-run it. |
|
@adityasharad (or another admin) please could you merge this along with the corresponding internal PR? I'm not able to due to the Checks failure, which is unrelated. Thanks! |
|
@jbj It's a bug which occurs when we trigger the checks job via Qlucie so it uses a custom branch of our internal code. I've linked the internal issue with the bug report above. I don't think the rerun will help, but will be glad to be proven wrong :) |
adityasharad left a comment
Internal PR has passed all checks.


The SARIF spec defines errors and warnings as follows:
The goal of this PR is to report extraction errors that in most cases won't break the analysis in a significant way as warnings rather than errors. This helps set the right expectations when these messages appear in the diagnostic data output by the CodeQL Action and CLI.
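To make the intent concrete, here is a minimal Python sketch of the kind of mapping described above. It is not the code in this PR. The function name `sarif_level`, the severity strings, and the `analysis_blocked` flag are all hypothetical, chosen only for illustration. The four level values (`none`, `note`, `warning`, `error`) are the ones SARIF defines for the `level` property.

```python
from enum import Enum


class SarifLevel(Enum):
    """The four values SARIF allows for a result or notification `level`."""
    NONE = "none"
    NOTE = "note"
    WARNING = "warning"
    ERROR = "error"


def sarif_level(extractor_severity: str, analysis_blocked: bool) -> SarifLevel:
    """Hypothetical mapping from an extractor diagnostic to a SARIF level.

    Assumption: `extractor_severity` is "error", "warning", or anything else,
    and `analysis_blocked` says whether the problem stopped the analysis
    (for example, the extractor aborted completely).
    """
    if extractor_severity == "error":
        # Only report "error" when the analysis is significantly broken;
        # recoverable extraction errors are downgraded to "warning".
        return SarifLevel.ERROR if analysis_blocked else SarifLevel.WARNING
    if extractor_severity == "warning":
        return SarifLevel.WARNING
    # Anything else is treated as purely informational in this sketch.
    return SarifLevel.NOTE


if __name__ == "__main__":
    print(sarif_level("error", analysis_blocked=False).value)
    print(sarif_level("error", analysis_blocked=True).value)
```

The point of the sketch is that the extractor's own category and the level shown to the user are decided separately: a recoverable extraction error stays an "error" inside the extractor but is presented as a warning.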