The Wayback Machine - https://web.archive.org/web/20201108004644/https://github.com/graphql-java/java-dataloader/issues/54

composing dataloaders #54

Open
sheepdreamofandroids opened this issue Oct 10, 2019 · 10 comments
@sheepdreamofandroids commented Oct 10, 2019

Hi,

I have a datafetcher where I use two dataloaders in sequence: the first to translate from one ID to another, the second to fetch the data corresponding to the second ID.

loader1.load(id1).thenCompose(id2 -> loader2.load(id2))

This hangs because dispatchAll() is not called again after loader1 completes.
I can work around that by adding that call inside the thenCompose() lambda, but then it is called for every id2, which is ugly at the very least.

Is there a better way of doing this?
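For concreteness, the hang can be reproduced with a minimal stand-in for DataLoader (MiniLoader below is hypothetical, written against plain CompletableFuture rather than the real org.dataloader API): the chained load only reaches the second loader after the first dispatch completes, so a second dispatch is required before the composed future can finish.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical stand-in for a dataloader: load() queues a key,
// dispatch() resolves everything queued so far in one batch.
class MiniLoader<K, V> {
    private final Function<K, V> batchFn;
    private final Map<K, CompletableFuture<V>> queue = new LinkedHashMap<>();

    MiniLoader(Function<K, V> batchFn) { this.batchFn = batchFn; }

    CompletableFuture<V> load(K key) {
        return queue.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    void dispatch() {
        Map<K, CompletableFuture<V>> batch = new LinkedHashMap<>(queue);
        queue.clear();
        batch.forEach((k, f) -> f.complete(batchFn.apply(k)));
    }
}

public class ComposeDemo {
    public static void main(String[] args) {
        MiniLoader<Integer, Integer> loader1 = new MiniLoader<>(id -> id + 100);  // id1 -> id2
        MiniLoader<Integer, String>  loader2 = new MiniLoader<>(id -> "data:" + id);

        CompletableFuture<String> result =
                loader1.load(1).thenCompose(id2 -> loader2.load(id2));

        loader1.dispatch(); // resolves load(1); thenCompose now queues a key on loader2
        loader2.dispatch(); // without this second dispatch, result never completes

        System.out.println(result.join()); // prints data:101
    }
}
```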

@bbakerman (Member) commented Oct 15, 2019

Currently we don't have a way to do what you are after.

CompletableFutures give us no way to know how deeply they are nested, and hence how many times dispatch must be called.

This is currently an unsolved problem.

@sheepdreamofandroids (Author) commented Oct 24, 2019

CompletableFutures might do something else completely, so even if you had a way of inspecting their "nestedness", you still wouldn't know that their completion leads to another load(), or even multiple.
I see two possible solutions:

  1. Whenever load() is called, start a timer to call dispatchAll() after 1 ms. If the timer is already running, delay it a bit.
  2. Whenever a batchload completes, complete all the futures and then call dispatchAll().

Both can be made optional using an extra parameter. And of course dispatchAll() should finish quickly when nothing needs to be done.
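Option 1 above could be sketched roughly like this (DebouncedDispatcher and its method names are hypothetical, not part of java-dataloader): every load() re-arms a short timer, and a load arriving while the timer is pending pushes the deadline back, so a burst of loads still ends in a single dispatchAll().

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of option 1: debounce dispatchAll() behind a timer.
class DebouncedDispatcher {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable dispatchAll;
    private final long delayMillis;
    private ScheduledFuture<?> pending;

    DebouncedDispatcher(Runnable dispatchAll, long delayMillis) {
        this.dispatchAll = dispatchAll;
        this.delayMillis = delayMillis;
    }

    // Call this from load(): cancel any pending dispatch and schedule a new
    // one, i.e. "if the timer is already running, delay it a bit".
    synchronized void loadRequested() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(dispatchAll, delayMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() { scheduler.shutdown(); }
}
```

The trade-off is the one discussed later in this thread: the timer fires on wall-clock time, so a delay that is too short under-batches and one that is too long adds latency to every request.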

@vojtapol commented Nov 9, 2019

The solution to this problem is to modify loader1 so that nothing needs to be loaded in the .then() block. In a traditional relational database this would mean that loader1 would do some extra joins to obtain all the required data.
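Concretely, and under the assumption that both lookups live in the same database, this suggestion collapses the two-step chain into one batch function that resolves id1 directly to the data, so a single dispatch suffices. A small sketch (the class, the table names in the SQL comment, and the in-memory maps standing in for tables are all hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: instead of batching id1 -> id2 and then id2 -> data
// as two separate loaders, one batch function performs the join, e.g.
//   SELECT a.id1, b.data FROM mapping a JOIN details b ON b.id2 = a.id2
//   WHERE a.id1 IN (...)
// Here the two tables are simulated with in-memory maps.
class JoinedBatchLoader {
    static final Map<Integer, Integer> ID1_TO_ID2 = Map.of(1, 101, 2, 102);
    static final Map<Integer, String> ID2_TO_DATA = Map.of(101, "alpha", 102, "beta");

    static List<String> loadAll(List<Integer> id1s) {
        return id1s.stream()
                .map(id1 -> ID2_TO_DATA.get(ID1_TO_ID2.get(id1))) // the "join"
                .collect(Collectors.toList());
    }
}
```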

@kaqqao commented Nov 10, 2019

@vojtapol In all honesty, if you modify the DataLoader to eagerly optimize fetching, you can also modify the original resolver function in the exact same way and drop DataLoader completely.

@sheepdreamofandroids (Author) commented Nov 11, 2019

@vojtapol

The solution to this problem is to modify loader1 so that nothing needs to be loaded in the .then() block.

That would solve the immediate problem, but there might be other ways to obtain id2. That load would then have to do a similar join, not taking advantage of the already loaded data2, which would decrease the effectiveness of the cache and require more memory.

@sheepdreamofandroids (Author) commented Nov 11, 2019

I noticed #46 which will solve this transparently.

@hahooy commented Sep 7, 2020

One potential solution is to dispatch all pending data loaders, wait for the futures returned from the dispatched data loaders to complete and then repeat the process to dispatch new pending data loaders that come from the thenCompose chaining. This process can be repeated until all levels of pending data loaders are dispatched. A code example to do this would be something like:

public class Dispatcher {
    private final List<DataLoader<?, ?>> dataLoaders;

    Dispatcher(List<DataLoader<?, ?>> dataLoaders) {
        this.dataLoaders = dataLoaders;
    }

    private int depth() {
        return dataLoaders.stream()
                .mapToInt(DataLoader::dispatchDepth)
                .sum();
    }

    void dispatchAllAndJoin() {
        while (depth() > 0) {
            // Dispatch all data loaders. This kicks off all batched tasks.
            CompletableFuture<?>[] futures = dataLoaders.stream()
                    .filter(dataLoader -> dataLoader.dispatchDepth() > 0)
                    .map(DataLoader::dispatch)
                    .toArray(CompletableFuture[]::new);
            // Wait for the futures to complete.
            CompletableFuture.allOf(futures).join();
        }
    }
}

In every round of dispatch, Dispatcher#dispatchAllAndJoin will be able to batch all tasks whose dependencies have been resolved by previous dispatches. This logic could potentially live in DataLoaderRegistry but clients can also just implement their own dispatching logic without changing the dataloader library.

@sheepdreamofandroids (Author) commented Sep 8, 2020

@hahooy I know, I'm actually using a fully asynchronous version of this: #46 (comment)

@bbakerman (Member) commented Sep 8, 2020

For the record, just dispatching until the depth is <= 0 will work, however it will have the opposite effect. It will cause fields that COULD be batched together to be eagerly dispatched. So you have "UNDER BATCHING" in this situation.

The real trick is that you need to know WHEN a good time to dispatch is, and unlike, say, JavaScript, there is no "nextTick" time in a JVM.

Actually I have done testing on node.js and they can also UNDER BATCH, based on next tick firing before fields complete.

@sheepdreamofandroids (Author) commented Sep 9, 2020

For the record, just dispatching until the depth is <= 0 will work, however it will have the opposite effect. It will cause fields that COULD be batched together to be eagerly dispatched. So you have "UNDER BATCHING" in this situation.

I don't think that is true, since the code waits for all dispatched futures to complete before starting a new round. This guarantees that no new loads will be queued on the dataloaders. Unless of course multiple dispatchAllAndJoin() loops run in parallel...

Another inefficiency in this method is that a new round has to wait for the slowest dataloader. Some dispatches could have started earlier.

I imagine some heuristics where each dataloader that has received requests waits some time before dispatching. The closer the number of requests is to the maximum batch size, the earlier the dispatch. The optimal mapping from the number of waiting keys to the wait time could be hand tuned or "learned" automatically.
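The simplest form of that heuristic might be a linear mapping (AdaptiveDelay below is a hypothetical sketch, not an existing API): a full batch dispatches immediately, an empty queue waits the maximum time, and everything in between is interpolated.

```java
// Hypothetical sketch of the heuristic above: the fuller the pending
// batch, the shorter the wait before dispatching.
class AdaptiveDelay {
    private final long maxDelayMillis;
    private final int maxBatchSize;

    AdaptiveDelay(long maxDelayMillis, int maxBatchSize) {
        this.maxDelayMillis = maxDelayMillis;
        this.maxBatchSize = maxBatchSize;
    }

    // Linear mapping from pending-key count to wait time; a full batch
    // dispatches immediately. The curve could be hand tuned or learned.
    long delayFor(int pendingKeys) {
        if (pendingKeys >= maxBatchSize) {
            return 0;
        }
        return maxDelayMillis * (maxBatchSize - pendingKeys) / maxBatchSize;
    }
}
```

With maxDelayMillis = 10 and maxBatchSize = 5, an empty queue waits 10 ms, four pending keys wait 2 ms, and a full batch of five dispatches at once.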
