gh-95004: specialize access to enums and fix scaling on free-threading #148184

Open · kumaraditya303 wants to merge 7 commits into python:main from kumaraditya303:enums

Conversation

@kumaraditya303 (Contributor) commented Apr 6, 2026
@Fidget-Spinner (Member) left a comment:
Thanks for doing this, I have just 3 comments.

@Fidget-Spinner (Member) left a comment:
Pretty close, just one question and one minor nit.

case MUTABLE:
    // special case for enums, which have Py_TYPE(descr) == cls,
    // so guarding on the type version is sufficient
    if (Py_TYPE(descr) != cls) {
@Fidget-Spinner (Member):
Just checking: Py_TYPE(descr) cannot change, right? Like, you can't change the __class__ of it later? Or does it not matter because we _GUARD_TYPE_VERSION, so it's protected by the version check?

@kumaraditya303 (Contributor, Author):
Yeah, Py_TYPE(descr) is protected by the version check of _GUARD_TYPE_VERSION
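The property being relied on here — that looking up a member on an enum class yields an object whose type is the class itself — can be checked from Python. A minimal illustration, not part of this PR:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

# The descriptor found on the class for "RED" is the member itself,
# and the member's type is the enum class. This is what the C code
# above expresses as Py_TYPE(descr) == cls.
assert Color.__dict__["RED"] is Color.RED
assert type(Color.RED) is Color
```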

@colesbury (Contributor) left a comment:
I just tried this on a 96 core AWS machine and the scaling is not great: ~2.9x faster.

  • Given that, I'm pretty ambivalent about the change.
  • I don't like to add benchmarks to Tools/ftscalingbench/ftscalingbench.py that don't scale well.

@kumaraditya303 (Contributor, Author):

> I just tried this on a 96 core AWS machine and the scaling is not great: ~2.9x faster.

On my mac I see 4.2x faster:

./python.exe Tools/ftscalingbench/ftscalingbench.py
Running benchmarks with 10 threads
object_cfunction           3.2x faster
cmodule_function           2.8x faster
object_lookup_special      4.3x faster
context_manager            4.9x faster
mult_constant              2.3x faster
generator                  2.3x faster
pymethod                   3.0x faster
pyfunction                 2.7x faster
module_function            2.7x faster
load_string_const          4.1x faster
load_tuple_const           3.6x faster
create_pyobject            4.4x faster
create_closure             4.4x faster
create_dict                3.8x faster
create_frozendict          4.0x faster
thread_local_read          3.5x faster
method_caller              3.3x faster
instantiate_dataclass      4.9x faster
instantiate_namedtuple     4.9x faster
instantiate_typing_namedtuple  4.9x faster
super_call                 4.6x faster
classmethod_call           3.8x faster
staticmethod_call          3.5x faster
deepcopy                   1.8x slower
setattr_non_interned       4.5x faster
enum_attr                  4.2x faster

> I don't like to add benchmarks to Tools/ftscalingbench/ftscalingbench.py that don't scale well

I can remove the benchmark if you prefer.
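For context, a benchmark of this shape can be sketched as follows. This is an illustrative reconstruction of the workload, not the actual enum_attr code from Tools/ftscalingbench/ftscalingbench.py; names and loop counts are assumptions:

```python
import threading
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

def enum_attr(loops: int = 100_000) -> None:
    # Each iteration performs LOAD_ATTR lookups on the enum class,
    # which is the access pattern this PR specializes.
    for _ in range(loops):
        Color.RED
        Color.GREEN
        Color.BLUE

# The real harness times a single-threaded run against N threads
# running the same workload and reports the speedup ratio.
threads = [threading.Thread(target=enum_attr) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```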

@Fidget-Spinner (Member) commented Apr 6, 2026

Could it be that the Enum itself is not using deferred refcounting? It's a LOAD_GLOBAL_MODULE, which increfs and then decrefs at each LOAD_ATTR.
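The access pattern being described can be seen in the bytecode: each `Color.RED` in a function body compiles to a LOAD_GLOBAL (which the adaptive interpreter specializes to LOAD_GLOBAL_MODULE) followed by a LOAD_ATTR. A minimal illustration with dis; the specialized opnames only appear after warm-up, so the base opnames are checked here:

```python
import dis
from enum import Enum

class Color(Enum):
    RED = 1

def f():
    return Color.RED

# Unspecialized instruction stream: LOAD_GLOBAL for "Color",
# then LOAD_ATTR for "RED".
opnames = [i.opname for i in dis.get_instructions(f)]
assert "LOAD_GLOBAL" in opnames
assert "LOAD_ATTR" in opnames
```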

@Fidget-Spinner (Member):

I found the problem:

It seems that this perpetually deopts at the first _GUARD_TYPE_VERSION. Then, that causes a re-specialization, which is obviously bottlenecked on a lot of things. So it seems the current specialization/deopt needs to be fixed.

Investigating now.

@Fidget-Spinner (Member) commented Apr 6, 2026

Before:

taskset -c 0,2,4,6,8,10 ./python Tools/ftscalingbench/ftscalingbench.py enum_attr -t 6 --scale 10000
Running benchmarks with 6 threads
enum_attr                  4.2x faster

After:

Running benchmarks with 6 threads
enum_attr                  6.0x faster

Seems the pre-existing specialization for METACLASS_CHECK was bugged. Diff to fix this:

diff --git a/Python/specialize.c b/Python/specialize.c
index 355b6eabdb7..bfa7b8148e4 100644
--- a/Python/specialize.c
+++ b/Python/specialize.c
@@ -1220,13 +1220,14 @@ specialize_class_load_attr(PyObject *owner, _Py_CODEUNIT *instr,
 #ifdef Py_GIL_DISABLED
             maybe_enable_deferred_ref_count(descr);
 #endif
-            write_u32(cache->type_version, tp_version);
             write_ptr(cache->descr, descr);
             if (metaclass_check) {
-                write_u32(cache->keys_version, meta_version);
+                write_u32(cache->keys_version, tp_version);
+                write_u32(cache->type_version, meta_version);
                 specialize(instr, LOAD_ATTR_CLASS_WITH_METACLASS_CHECK);
             }
             else {
+                write_u32(cache->type_version, tp_version);
                 specialize(instr, LOAD_ATTR_CLASS);
             }
             Py_XDECREF(descr);

This seems to be bugged in 3.14 as well: 5d3201f

@Fidget-Spinner (Member):

@colesbury can you please try this again? Thank you!
