Schema parsing and resolved schema rework#536
Draft
KitFieldhouse wants to merge 3 commits intoapache:mainfrom
Draft
Schema parsing and resolved schema rework#536KitFieldhouse wants to merge 3 commits intoapache:mainfrom
KitFieldhouse wants to merge 3 commits intoapache:mainfrom
Conversation
Member
|
There are many conflicts! |
Member
|
Also it would be good to address #368 if such big changes are going to be made. |
Author
Yeah sorry, my bad, I will get that done.
I'll take a look! |
fb8c6a3 to
23c056a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hey you all, this PR reworks schema parsing and
ResolvedSchema.I apologize for the size of this PR. I tried making it smaller, but I kept finding the changes made to parsing/
ResovledSchemawould trickle into many other sections of the crate.This PR addresses issues #531, #353, and #508.
Here is the gist of what this PR does:
The new stuff
Introduces
SchemaWithSymbols.Instead of representing a schema as a single recursive
Schemavariant,SchemaWithSymbolsrepresents a schema as a hash map of all defined named schemata and a hash set of all referenced named schemata from that schema. This, coupled with the root schema enum, is enough to represent the entire schema parse tree.The upshot of this representation is that when we go to check if a set of schemata form a complete context we don't need to recurse into each schema and can just compare the "symbol tables" directly.
In addition,
SchemaWithSymbolsis essentially just made up ofArc's andHashMapsofArc's. This should make cloning fairly quick compared toSchema.Introduces
ResolvedContext.ResovledContextwraps a list of schema definitions that have been checked for compatibility (no double naming of named schemata, all named references have a definition). This is closest to what the originalResolvedSchemalooked like.Completely reworks
ResolvedSchema.As can be seen by the diff,
ResolvedSchemahas been essentially replaced in its entirety.ResolvedSchemanow wraps a schema and all of the definitions needed to uniquely resolve all named schemata. In addition, we can convertResolvedSchemainto aResolvedNode.Introduces
ResolvedNode.The struct
ResolvedNodeallows other code to walk down a schema tree without having to resolveSchema::Refvariants themselves. Further, since aResolvedNodecan only be created from aResolvedSchema, we can make the type guarantee that this resolution will always find a unique named schema definition.ResolvedNodealso makes the promise that any default for a record field has been checked against the field's schema.Introduces a
Resolvertrait.This trait is designed to be used to hook up this crate's schema resolution logic to an external schema registry. Struct's implementing this trait can be provided to
ResolvedContexteither directly or throughResolvedSchema. If a local definition for a schema can not be found, theResolveris called asking for it to either provide that schema or to signal an error.API changes
This PR introduces several breaking API changes, both in the public API and to the private API. I will try to summarize the changes for both layers.
Semantic changes -- parsing
Parser now always generates a
SchemaWithSymbols. To keep compatibility with existingSchemabased methods, the original schema parsing methods likeSchema::parse_strwill use the parser to generate aSchemaWithSymbolswhich is then unraveled into a normalSchemaenum. This adds some overhead, but this is already a "slow" path so I don't think this introduces performance concerns.Schema::parse_listno longer checks if every named schema has a unique definition somewhere in the list. This is due to the conceptual shift mentioned in issue schema parsing/resolution improvements? #353. In this PR, parser methods do nothing more than parse each schema string and check that each one is internally consistent. Checking that a list of schemata is consistent (no duplicate named definitions, all references resolved) is pushed toResolvedSchema.API changes -- public facing
ResolvedSchemacompletely reworked. Any code that relied on the oldResolvedSchemawould have to be refactored (though I think the needed changes are rather minimal). The newResolvedSchemawraps a single schema and its resolved context, and is created viaResolvedSchema::builder(). See the new builder API:.additional()to provideextra schemata, then
.build_one(),.build_array(), or.build_list()as the terminal method depending on how many schemas you need resolved. Variants with_with_resolveraccept a customResolverimplementation for schema registry lookups.Schema::parse_str_with_list. This method parsed a schema string alongside a list of other schema strings and returned the parsed schema plus the resolved list. The same result can now be achieved by parsing each schema independently withSchemaWithSymbols::parse_strand then resolving them together viaResolvedSchema::builder().additional(others)?.build_one(target)?.Schema::denormalize. Denormalization (inlining named type definitions into references) is now handled viaResolvedSchema::unravel()orSchemaWithSymbols::unravel().Schema::independent_canonical_form. This is now accessed throughCompleteSchema::independent_canonical_form()which requires aResolvedSchema, making the resolved-context dependency explicit.resolve_namesandresolve_names_with_schemata(werepub(crate)) are removed. Their role is subsumed byResolvedContext.ResolvedOwnedSchemais removed.ResolvedSchemanow owns its data (viaArc), making the separate owned variant unnecessary.Schema::name()now returnsOption<&Arc<Name>>instead ofOption<&Name>.Value::validate_schemata(schemata)replaced byValue::validate_against_resolved(resolved_schema).find_schema_with_known_schematais removed. Finding a matching element in a union requires that you have aResolvedSchema. The method that performs union element matching is nowstructural_match_on_schemadefined onResolvedUnion.API changes -- private facing
decode_internalsignature changed from(schema: &Schema, names: &HashMap<Name, S>, enclosing_namespace: NamespaceRef, reader: &mut R)to(node: ResolvedNode, reader: &mut R). TheResolvedNodecarries the schema, its resolved definitions, and handles namespace tracking internallyencode_internalsimilarly changed from(value: &Value, schema: &Schema, names: &HashMap<Name, S>, enclosing_namespace: NamespaceRef, writer: &mut W)to
(value: &Value, node: ResolvedNode, writer: &mut W).decode_completeandencode_completeare added as the preferred entry points for decoding/encoding with aResolvedSchema.Blocknow takes aResolvedSchemainstead ofNames.validate_internalandresolve_internalonValuechanged from generic<S: Borrow<Schema> + Debug>with(schema, names, namespace)parameters to taking aResolvedNodedirectly.S: Borrow<Schema>pattern used throughout decode, encode, and types for referencing named schemas is entirely replaced byResolvedNode's transparent reference resolution.Schema::parse_with_names(waspub(crate)). Internal callers now useSchemaWithSymbolsandResolvedSchemainstead of passingNamesmaps through parsing.