Why did we decouple the data plane and the control plane in the past design? There should be a good reason for that. Does this proposal violate that reason, and does that cause any problems?
Yichi Zhang
Nov 3, 2021
Approver
The current workflow generally works well; the proposal targets a very specific scenario: small and cheap bundles. For large bundles it is not feasible to assume that all the data can be transferred over the control plane in one gRPC message, due to memory constraints and efficiency concerns.
The internal Dataflow streaming service uses a single proto for input and output data, and this leads to bundles that can't be processed because they are too big.
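As a rough illustration of the runner-side decision this implies (a Python sketch; the function names, the embedded_inputs field, and the 1 MB cutoff are assumptions, not the actual Fn API surface):

    def start_bundle(control_client, data_client, descriptor_id, encoded_inputs,
                     embed_limit_bytes=1 << 20):
        # encoded_inputs: transform_id -> list of already-encoded element chunks.
        total = sum(len(c) for chunks in encoded_inputs.values() for c in chunks)
        if total <= embed_limit_bytes:
            # Tiny bundle: one control-plane round trip, no data-plane traffic.
            control_client.send_process_bundle_request(
                descriptor_id, embedded_inputs=encoded_inputs)
        else:
            # Large bundle: keep today's behavior and stream over the data
            # plane, which avoids holding everything in one gRPC message.
            control_client.send_process_bundle_request(descriptor_id)
            for transform_id, chunks in encoded_inputs.items():
                data_client.send(transform_id, chunks, is_last=True)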
Luke Cwik
Nov 3, 2021
Approver
Is there a concern that these larger messages over the control plane create bottlenecks for smaller messages like splitting and progress reports?
Any recommendations on the maximum amount of data to put in a ProcessBundleRequest/ProcessBundleResponse?
Rui Wang
Nov 3, 2021
Approver
+1. With some detailed numbers/macro benchmarks, we could better understand the problem.
Yichi Zhang
Nov 4, 2021
Approver
That's a good question. I think the maximum amount of data can be selected conservatively, for example 1 MB (needs some further experimenting), or maybe made configurable. If the larger messages on the control plane are hitting gRPC flow control and blocking splitting and progress reports, then I think the job's bottleneck would already have existed on data channel transmission, given the smaller volume and size of the splitting and progress report RPCs.
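A tiny Python sketch of the configurable cap idea (the option name and the 1 MB default are illustrative assumptions, not part of the proposal):

    # Hypothetical knob for how many encoded bytes may be embedded in a single
    # ProcessBundleRequest/ProcessBundleResponse.
    DEFAULT_EMBEDDED_ELEMENTS_LIMIT_BYTES = 1 << 20  # 1 MB, chosen conservatively

    def embedded_elements_limit(pipeline_options):
        # "embedded_elements_limit" is an illustrative option name.
        return pipeline_options.get("embedded_elements_limit",
                                    DEFAULT_EMBEDDED_ELEMENTS_LIMIT_BYTES)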
Brian Hulette
Nov 3, 2021
Approver
Could this waiting negatively affect the current workflow?
Rui Wang
Nov 3, 2021
Approver
I can see one issue: in the past design the SDK prepares the UDF while the runner side prepares the data, which hides the latency of the SDK's UDF preparation.
The worst-case scenario is that the SDK side can no longer prepare the UDF early enough to hide that latency, while the runner side still needs to spend some amount of time getting the data in place (and then send both in one request).
Yichi Zhang
Nov 3, 2021
Approver
I think today runners already use size-based buffering on the data plane (without data buffering we would end up sending one gRPC message per element). So the wait already existed; the proposal merely shifts when the runner sends the ProcessBundleRequest. And yes, this may delay SDK UDF preparation, but normally that preparation is quite cheap because the cached bundle handler and transform runner are reused. Regardless, the wait time should be short.
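For context on why the SDK-side preparation is cheap, here is a rough Python sketch of the caching pattern described above (names are illustrative, not the actual SDK harness API):

    # Processors are cached per bundle descriptor and reused, so only the
    # first bundle for a given descriptor pays the setup cost.
    _processor_cache = {}  # descriptor_id -> list of idle bundle processors

    def get_bundle_processor(descriptor_id, create_fn):
        idle = _processor_cache.setdefault(descriptor_id, [])
        return idle.pop() if idle else create_fn(descriptor_id)

    def release_bundle_processor(descriptor_id, processor):
        processor.reset()  # hypothetical: clear per-bundle state before reuse
        _processor_cache.setdefault(descriptor_id, []).append(processor)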
Robert Burke
Nov 3, 2021
Approver
Do we have an idea of what that size limit should be? 4MB? 64MB?
Yichi Zhang
Nov 3, 2021
Approver
That's yet to be benchmarked; I've experimented conservatively with the current data plane flush limit of 1 MB.
reuven lax
Nov 3, 2021
Approver
Why is it all or nothing? If the bundle is too large, can we just split it and include a smaller bundle in the response?
Robert Burke
Nov 3, 2021
Approver
Would that add much value vs the "need to now merge these streams" complexity the runner would need to deal with? And the similar complexity for the SDK? Knowing it's all or nothing feels like it would be much easier to debug.
reuven lax
Nov 3, 2021
Approver
Yeah, but it could lead to falling-off-the-cliff syndrome: one extra byte pushes you onto the slower path. When possible, graceful degradation is preferable.
Luke Cwik
Nov 3, 2021
Approver
Larger bundles spend most of their time outside of passing data between two threads, which is the largest reason for this performance improvement.
That being said, it would be pretty easy to say that the ProcessBundleRequest contains the "prefix" of the inbound logical streams that would otherwise exist on the data channel.
Similarly, the ProcessBundleResponse would represent the "suffix" of the outbound logical streams that would exist on the data channel.
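To make the prefix/suffix idea concrete, a minimal Python sketch of how an SDK could reassemble one inbound logical stream (field and method names such as embedded_inputs and data_channel.read are assumptions, not the actual Fn API):

    def inbound_elements(process_bundle_request, data_channel, transform_id):
        # Prefix: elements embedded directly in the control-plane request.
        for chunk in process_bundle_request.embedded_inputs.get(transform_id, []):
            yield chunk
        # Remainder (possibly empty): the usual data-plane stream, which still
        # ends with its normal is_last marker.
        for chunk in data_channel.read(transform_id):
            yield chunk

The outbound direction would mirror this, with the ProcessBundleResponse carrying the suffix of each outbound stream.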
Robert Bradshaw
Nov 3, 2021
Approver
Note that the runner typically starts consuming data from the data plane before the response comes back (e.g. to handle large bundles). This doesn't significantly change the proposal though, assuming it can stop waiting right away given an appropriate ProcessBundleResponse.
Luke Cwik
Nov 3, 2021
Approver
Yes, the runner seems to be stuck with the additional coordination on the SDK outbound channel.
From a performance perspective, if you split these two changes, how much did the ProcessBundleRequest change improve performance versus the ProcessBundleResponse change?
Robert Burke
Nov 3, 2021
Approver
+1 to splitting. The runner can always know whether it needs to send data over the data channel and can avoid doing so when possible, but it must always be prepared for data from the SDK, which may not arrive. It's not clear to me there's as much benefit on the Response side as there is on the Request side.
Robert Bradshaw
Nov 3, 2021
Approver
I think it needs to be split (e.g. the thread about capabilities below), but there should be similar benefits on each side (2 RPCs to 1, and possibly less flushing of a multiplexed data channel).
Yichi Zhang
Nov 3, 2021
Approver
Yet I think it should be possible for the runner to wait on the ProcessBundleResponse and then just go ahead and mark the data channel as completed, avoiding the wait for the terminal last-data message. That way we can avoid the additional coordination on the SDK outbound channel. I haven't really tested the performance improvement on the output path, though.
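A minimal Python sketch of that runner-side completion handling (method and field names are hypothetical):

    def finish_bundle(instruction_id, control_client, outbound_streams):
        # Block until the SDK's ProcessBundleResponse for this bundle arrives.
        response = control_client.wait_for_response(instruction_id)
        # Treat any embedded elements as the tail of the SDK's outbound streams.
        for transform_id, chunks in response.embedded_outputs.items():
            outbound_streams.append(transform_id, chunks)
        # The response is only sent after the SDK has produced everything for
        # this bundle, so remaining streams can be marked complete without
        # waiting for per-stream terminal (is_last) data messages.
        outbound_streams.mark_complete(instruction_id)
        return response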
Robert Bradshaw
Nov 3, 2021
Approver
Do we also need a runner capability to allow the SDK to send data back to the runner in the process bundle response?
Luke Cwik
Nov 3, 2021
Approver
A runner capability seems necessary.
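A sketch of what that handshake could look like on the SDK side (the capability string and method names are placeholders, not registered Beam URNs):

    # The SDK only embeds outbound elements in the ProcessBundleResponse when
    # the runner has advertised that it can consume them.
    RUNNER_ACCEPTS_EMBEDDED_RESPONSE_DATA = "example:runner:embedded_elements_in_response:v1"

    def emit_outputs(runner_capabilities, outputs, response_builder, data_client):
        if RUNNER_ACCEPTS_EMBEDDED_RESPONSE_DATA in runner_capabilities:
            response_builder.set_embedded_outputs(outputs)  # hypothetical setter
        else:
            # Fall back to the data channel for runners without the capability.
            for transform_id, chunks in outputs.items():
                data_client.send(transform_id, chunks, is_last=True)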
Yichi Zhang
Nov 3, 2021
Approver
Good point.
Outline
[Proposal] Optional elements embedding in Fn API process bundle request and response
    Overview
    Current inefficiency in tiny and cheap bundles
    Proposal
    Compatibility