Replace tibbles with data frames to improve performance by IndrajeetPatil · Pull Request #1007 · r-lib/styler

IndrajeetPatil · 2022-09-26T11:08:26Z

~~Need to use continuous benchmarking, so not converting this to a draft.~~

codecov-commenter · 2022-09-26T11:12:49Z

Codecov Report

Merging #1007 (fa98f9c) into main (1f4437b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head fa98f9c differs from the pull request's most recent head 94e30f8. Consider uploading reports for the commit 94e30f8 to get more accurate results.

```diff
@@           Coverage Diff           @@
##             main    #1007   +/-   ##
=======================================
  Coverage   91.14%   91.14%           
=======================================
  Files          46       46           
  Lines        2664     2665    +1     
=======================================
+ Hits         2428     2429    +1     
  Misses        236      236           
```

| Impacted Files | Coverage Δ |
|---|---|
| R/nested-to-tree.R | `92.85% <ø> (ø)` |
| R/style-guides.R | `99.43% <ø> (ø)` |
| R/stylerignore.R | `100.00% <ø> (ø)` |
| R/token-define.R | `66.66% <ø> (ø)` |
| R/ui-styling.R | `100.00% <ø> (ø)` |
| R/compat-dplyr.R | `92.85% <100.00%> (ø)` |
| R/compat-tidyr.R | `100.00% <100.00%> (ø)` |
| R/nest.R | `100.00% <100.00%> (ø)` |
| R/parse.R | `88.09% <100.00%> (ø)` |
| R/token-create.R | `96.92% <100.00%> (ø)` |
| ... and 3 more | |

github-actions · 2022-09-26T11:42:16Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 7c691c9 is merged into main:

❗🐌cache_applying: 27.1ms -> 31.6ms [+15.61%, +16.95%]
🚀cache_recording: 1.26s -> 875ms [-31.19%, -30%]
🚀without_cache: 3.33s -> 2.2s [-34.28%, -33.43%]

Further explanation regarding interpretation and methodology can be found in the documentation.
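As a rough reading aid, assume each reported figure is the change in run time relative to main. The point estimates for the first run (commit 7c691c9) can then be recomputed by hand from the listed timings, and each lands inside its reported 95% interval. The intervals themselves come from the bot's methodology referenced above and are not reproduced here.

$$
\Delta_{\text{rel}} = \frac{t_{\text{PR}} - t_{\text{main}}}{t_{\text{main}}}
$$

$$
\begin{aligned}
\texttt{cache\_applying}&: \ \frac{31.6 - 27.1}{27.1} \approx +16.6\% \\
\texttt{cache\_recording}&: \ \frac{0.875 - 1.26}{1.26} \approx -30.6\% \\
\texttt{without\_cache}&: \ \frac{2.2 - 3.33}{3.33} \approx -33.9\%
\end{aligned}
$$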

IndrajeetPatil · 2022-09-26T11:47:25Z

@lorenzwalthert, @krlmlr That's quite the bump in performance when switching to data frames instead of tibbles as our data structure of choice! 😮

lorenzwalthert · 2022-09-26T16:04:18Z

Wow, yes @IndrajeetPatil, and without much code change, even! Well done. Now the only hurdle is making it pass on old releases...

MichaelChirico · 2022-09-26T16:29:03Z

wow, quite impressive speed improvement per LoC change!!

krlmlr

Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add, e.g., `new_styler_df()` and `styler_df()`, which use `vctrs::new_data_frame()` and `vctrs::data_frame()` under the hood. That way, we could later change the underlying data structure with less effort.

Do we still need to import tibble?
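A minimal sketch of the two suggested helpers, assuming they forward directly to {vctrs}. The function names come from the comment above; the bodies are illustrative and not necessarily what was merged. `vctrs::new_data_frame()` is the cheap constructor that trusts its input, while `vctrs::data_frame()` checks column names and recycles size-1 inputs.

```r
# Hypothetical wrapper bodies (names from the review comment); requires {vctrs}.

new_styler_df <- function(x) {
  # Low-level constructor: assumes `x` is a named list of equal-length
  # columns and only adds the data.frame class and row count (no validation).
  vctrs::new_data_frame(x)
}

styler_df <- function(...) {
  # User-facing constructor: checks that names are unique and non-empty,
  # recycles length-1 inputs, and returns a plain data.frame (not a tibble).
  vctrs::data_frame(...)
}

# Example: one column of length 2 plus a recycled scalar column
tokens <- styler_df(token = c("SYMBOL", "EQ_ASSIGN"), text = c("x", "="), pos = 1L)
```

Because call sites only see these two functions, switching to another structure later (for example back to tibbles via `tibble::new_tibble()` and `tibble::tibble()`) would mean editing only the wrapper bodies.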

github-actions · 2022-09-26T16:41:49Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if 63a27d0 is merged into main:

❗🐌cache_applying: 37.9ms -> 41.6ms [+6.86%, +12.7%]
🚀cache_recording: 1.91s -> 1.28s [-34.03%, -31.47%]
🚀without_cache: 5.24s -> 3.33s [-37.36%, -35.55%]

IndrajeetPatil · 2022-09-27T05:44:04Z

> Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. `new_styler_df()` and `styler_df()` that use `vctrs::new_data_frame()` and `vctrs::data_frame()` under the hood. This means that we could later change the underlying data structure with lesser effort.

Good idea. Done!

> Do we still need to import tibble?

Only for `tibble::tribble()`, which we use in a few places. If we want to drop {tibble} from Imports, removing the usages of this function should be easy. Should I do that?
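For reference, a hedged sketch of what replacing `tibble::tribble()` could look like if {tibble} were dropped later. `tribble_df()` is a hypothetical base-R helper, not part of styler or {tibble}: like `tribble()`, it takes `~name` column headers followed by cell values written row by row, and it returns a plain data frame.

```r
# Hypothetical base-R stand-in for tibble::tribble().
# Column headers are one-sided formulas (~name) and must come first;
# the remaining arguments are cell values, filled in row by row.
tribble_df <- function(...) {
  args <- list(...)
  is_header <- vapply(args, inherits, logical(1), what = "formula")
  n_col <- sum(is_header)
  col_names <- vapply(args[is_header], function(f) all.vars(f), character(1))
  values <- args[!is_header]
  stopifnot(n_col > 0, length(values) %% n_col == 0)
  n_row <- length(values) %/% n_col
  # Cell i (row-major order) belongs to column ((i - 1) %% n_col) + 1
  cols <- lapply(seq_len(n_col), function(j) {
    unlist(values[seq.int(j, by = n_col, length.out = n_row)])
  })
  names(cols) <- col_names
  data.frame(cols, stringsAsFactors = FALSE)
}

# Row-wise literal, read the same way as a tribble() call
tribble_df(
  ~token,      ~text,
  "SYMBOL",    "x",
  "EQ_ASSIGN", "="
)
```

Writing the columns directly, e.g. `data.frame(token = c("SYMBOL", "EQ_ASSIGN"), text = c("x", "="))`, avoids any helper but loses the row-wise layout that makes `tribble()` literals easy to read.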

krlmlr · 2022-09-27T05:56:23Z

Let's keep the tribble() call for now.

github-actions · 2022-09-27T05:58:09Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if a0d6c20 is merged into main:

❗🐌cache_applying: 26.8ms -> 30.8ms [+13.87%, +15.99%]
🚀cache_recording: 1.25s -> 824ms [-34.47%, -33.66%]
🚀without_cache: 3.32s -> 2.08s [-37.6%, -36.8%]

github-actions · 2022-09-27T06:54:56Z

This is how benchmark results would change (along with a 95% confidence interval in relative change) if fa98f9c is merged into main:

❗🐌cache_applying: 33.8ms -> 38ms [+8.2%, +16.44%]
🚀cache_recording: 1.92s -> 1.19s [-41.8%, -34.25%]
🚀without_cache: 4.31s -> 2.58s [-40.78%, -39.48%]

lorenzwalthert · 2022-09-27T07:18:05Z

Also, it seems that removing {tibble} from Imports does not reduce the set of recursive dependencies:

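```r
library(magrittr)

# Direct dependencies listed under Imports in DESCRIPTION
deps <- desc::desc_get_deps() %>%
  dplyr::filter(type == 'Imports') %>%
  dplyr::pull(package)

# Recursive dependencies of all imports
# (renv:::renv_package_dependencies() is an unexported renv function)
recursive_deps_before <- purrr::map(deps, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()

# Same computation, pretending {tibble} is no longer imported
deps_without_tibble <- setdiff(deps, 'tibble')

recursive_deps_after <- purrr::map(deps_without_tibble, ~names(renv:::renv_package_dependencies(.x))) %>%
  unlist() %>%
  unique()

waldo::compare(recursive_deps_before, recursive_deps_after)
#> ✔ No differences
```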

Created on 2022-09-27 by the reprex package (v2.0.1)
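The reprex relies on the unexported `renv:::renv_package_dependencies()`. The same before/after comparison can be sketched with base R's exported `tools::package_dependencies()`. `recursive_deps()` is a hypothetical helper that counts only hard dependencies (Depends, Imports, LinkingTo), which may not match exactly what renv counts; because it defaults to `installed.packages()` as the package database, its answer reflects the local library (pass `db = available.packages()` to query CRAN instead).

```r
# Hypothetical helper: union of the recursive hard dependencies of `pkgs`.
recursive_deps <- function(pkgs, db = utils::installed.packages()) {
  deps <- tools::package_dependencies(
    pkgs,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  # package_dependencies() returns one character vector per package;
  # as.character() turns an empty result into character(0)
  sort(unique(as.character(unlist(deps, use.names = FALSE))))
}

# Mirroring the reprex (requires {desc}; run from the package root).
# An empty result means dropping {tibble} removes no recursive dependency.
# imports <- desc::desc_get_deps()
# imports <- imports$package[imports$type == "Imports"]
# setdiff(recursive_deps(imports), recursive_deps(setdiff(imports, "tibble")))
```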

This is because we use {rematch2} (in one place only; this can probably be worked around somehow), which in turn depends on {tibble}. Removing that {tibble} dependency was proposed in r-lib/rematch2#14, but @krlmlr was not fully on board with the suggested implementation. Given the development over the last two years and the recursive dependencies added to {tibble} since then, I think removing that dependency would be even more beneficial now.
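A hedged sketch of how the single {rematch2} call might be worked around in base R. `re_match_base()` is hypothetical and only approximates `rematch2::re_match()`: it returns one column per capture group plus `.text` and `.match`, as a plain data frame. It assumes a Perl-compatible pattern with at least one (ideally named) capture group, and it is not the approach styler actually took.

```r
# Hypothetical base-R approximation of rematch2::re_match().
re_match_base <- function(text, pattern) {
  m <- regexpr(pattern, text, perl = TRUE)
  start <- attr(m, "capture.start")
  len <- attr(m, "capture.length")
  if (is.null(start)) stop("`pattern` must contain at least one capture group")
  no_match <- as.integer(m) == -1L
  # substring() recycles `text` down the columns of `start`, so cell [i, j]
  # holds capture group j extracted from text[i]
  groups <- matrix(
    substring(text, start, start + len - 1L),
    nrow = length(text), ncol = ncol(start)
  )
  groups[no_match, ] <- NA_character_
  colnames(groups) <- attr(m, "capture.names")
  out <- as.data.frame(groups, stringsAsFactors = FALSE)
  out$.text <- text
  out$.match <- substring(text, m, m + attr(m, "match.length") - 1L)
  out$.match[no_match] <- NA_character_
  out
}

re_match_base(c("a=1", "nope"), "(?<key>\\w+)=(?<value>\\d+)")
```

Non-matching inputs yield `NA` in every group column and in `.match`, mirroring how `re_match()` reports failed matches.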

lorenzwalthert

Great job.

IndrajeetPatil · 2022-09-27T07:45:37Z

I think this is a big enough improvement to consider creating a new CRAN release?

IndrajeetPatil · 2022-10-11T04:36:09Z

> I think this is a big enough improvement to consider creating a new CRAN release?

Any thoughts, @lorenzwalthert and @krlmlr?

We also need to get rid of the NOTEs in the CRAN checks: https://cran.r-project.org/web/checks/check_results_styler.html
Let's not wait to get an email about this 😬

lorenzwalthert · 2022-10-11T07:25:26Z

Yes, I agree. Do you want to make a PR to main similar to #930, and also use fledge? If not, I can do it, but not this week. Once all checks are green, I can submit it.

lorenzwalthert · 2022-10-11T07:26:01Z

I already bumped the version recently and tried to organise the news items a bit.

IndrajeetPatil · 2022-10-12T16:01:52Z

> If not, I can do it, but not this week. Once all checks green, I can submit it.

@lorenzwalthert I can wait! :)

IndrajeetPatil added 2 commits September 26, 2022 13:04

- as_tibble -> as.data.frame (91b7086)
- new_tibble -> data.frame (916a421)

github-actions Bot and others added 2 commits September 26, 2022 11:14

- pre-commit (922ce62)
- Update utils.R (63ad83b)
- Update ui-caching.R (17a66db)

IndrajeetPatil requested a review from lorenzwalthert September 26, 2022 16:19

krlmlr reviewed Sep 26, 2022

krlmlr mentioned this pull request Sep 27, 2022

New C callables to support tibble r-lib/vctrs#1679 (Open, 10 tasks)

- encapsulate in wrappers around vctrs functions (e81acb6)

IndrajeetPatil changed the title ~~Check for performance improvements with data.frame~~ Replace tibbles with data frames to improve performance Sep 27, 2022

IndrajeetPatil and others added 2 commits September 27, 2022 07:40

- Add vctrs to DESCRIPTION (035de78)
- pre-commit (f0de7b6)

IndrajeetPatil added 2 commits September 27, 2022 07:58

- Update utils.R (1d40618)
- Update compat-dplyr.R (6313b71)

IndrajeetPatil requested a review from krlmlr September 27, 2022 06:09

lorenzwalthert reviewed Sep 27, 2022

Comment thread R/token-define.R

lorenzwalthert reviewed Sep 27, 2022

Comment thread R/utils.R

IndrajeetPatil added 2 commits September 27, 2022 08:25

- Update detect-alignment.Rmd (60ff313)
- Don't import entire tibble package (94e30f8)

IndrajeetPatil requested a review from lorenzwalthert September 27, 2022 07:10

lorenzwalthert approved these changes Sep 27, 2022

IndrajeetPatil merged commit 1a8bab3 into r-lib:main Sep 27, 2022

IndrajeetPatil deleted the perf_dataframe branch September 27, 2022 07:23

IndrajeetPatil mentioned this pull request Sep 27, 2022

Simplify styler_df() signature #1009 (Merged)

krlmlr mentioned this pull request Sep 28, 2022

Remove rematch2 and tibble dependencies #1010 (Closed)

IndrajeetPatil added a commit that referenced this pull request Sep 28, 2022

Simplify styler_df() signature (#1009) (35519b9)

* Get rid of unnecessary `.name_repair` arg. This is generating warnings. Follow-up on #1007
* Make the wrapper even thinner

5 participants
