Replace tibbles with data frames to improve performance#1007
Replace tibbles with data frames to improve performance#1007IndrajeetPatil merged 12 commits intor-lib:mainfrom IndrajeetPatil:perf_dataframe
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1007 +/- ##
=======================================
Coverage 91.14% 91.14%
=======================================
Files 46 46
Lines 2664 2665 +1
=======================================
+ Hits 2428 2429 +1
Misses 236 236
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
|
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 7c691c9 is merged into main:
Further explanation regarding interpretation and methodology can be found in the documentation. |
|
@lorenzwalthert, @krlmlr That's quite the bump in performance when switching to data frames instead of tibbles as our data structure of choice! 😮 |
|
Wow yes @IndrajeetPatil and without much code change even! Well done. Now only hurdle is to make it pass on old releases... |
|
wow, quite impressive speed improvement per LoC change!! |
krlmlr
left a comment
There was a problem hiding this comment.
Nice speedup! Can we encapsulate the choice of data structure in helper functions? We could add e.g. new_styler_df() and styler_df() that use vctrs::new_data_frame() and vctrs::data_frame() under the hood. This means that we could later change the underlying data structure with lesser effort.
Do we still need to import tibble?
|
This is how benchmark results would change (along with a 95% confidence interval in relative change) if 63a27d0 is merged into main:
Further explanation regarding interpretation and methodology can be found in the documentation. |
Good idea. Done!
Only for |
|
Let's keep the |
|
This is how benchmark results would change (along with a 95% confidence interval in relative change) if a0d6c20 is merged into main:
Further explanation regarding interpretation and methodology can be found in the documentation. |
|
This is how benchmark results would change (along with a 95% confidence interval in relative change) if fa98f9c is merged into main:
Further explanation regarding interpretation and methodology can be found in the documentation. |
|
Also, it seems removing {tibble} does not reduce recursive dependencies: library(magrittr)
deps <- desc::desc_get_deps() %>%
dplyr::filter(type == 'Imports') %>%
dplyr::pull(package)
recursive_deps_before <- purrr::map(deps, ~names(renv:::renv_package_dependencies(.x))) %>%
unlist() %>%
unique()
deps_without_tibble <- setdiff(deps, 'tibble')
recursive_deps_after <- purrr::map(deps_without_tibble, ~names(renv:::renv_package_dependencies(.x))) %>%
unlist() %>%
unique()
waldo::compare(recursive_deps_before, recursive_deps_after)
#> ✔ No differencesCreated on 2022-09-27 by the reprex package (v2.0.1) This is because we use {rematch2} (in one place only, can be worked around probably some how), which in turn depends on {tibble}. That {tibble} dependency was suggested to be removed in r-lib/rematch2#14, where @krlmlr was not all in for the suggested implementation. With the additional development that happened over the last 2 years and more recursive dependencies added to tibble, I think it would be even more beneficial to remove that dependency. |
|
I think this is a big enough improvement to consider creating a new CRAN release? |
* Get rid of unnecessary `.name_repair` arg This is generating warnings. Follow-up on #1007 * make the wrapper even thinner
Any thoughts, @lorenzwalthert and @krlmlr? We also need to get rid of |
|
Yes, I agree. Do you want to m make a PR to |
|
I already bumped the version recently and tried to organise the news items a bit. |
@lorenzwalthert I can wait! :) |
#759 (comment)
Need to use continuous benchmarking, so not converting this to a draft.