Do you mind sharing the targeted use case of the added implementation? For example, do we expect TFX users who train PyTorch/scikit-learn models to use it someday? @jyzhao@google.com
Kerry Donny-Clark
Feb 11, 2022
Approver
The use case is to allow users that are developing non-TFX models to use those models for inference in Dataflow
J Z
Feb 11, 2022
Approver
Curious: do they share the same RunInference interfaces with Tensorflow? If so, it would be good if we could embed them into the TFX-BSL/BulkInfer component. TFX is not Tensorflow-only; it supports sklearn, JAX, etc.
J Z
Feb 11, 2022
Approver
go/run-inference-beam plans to extract the bulk inference part from TFX-BSL to Beam. Are we able to converge the interface so that we can use a single BulkInfer component (with maybe slightly different configs) to handle these use cases?
Kerry Donny-Clark
Feb 11, 2022
Approver
Beam users may want to run outside of tensorflow. We need to allow users to upload the model of their choice, and shouldn't force tight coupling with Tensorflow
Kerry Donny-Clark
Feb 11, 2022
Approver
It is good to know that running other models through TFX is an option. Where can we find documentation for that?
Jiayi Zhao
Feb 11, 2022
Approver
Yep, it shouldn't be tightly coupled with Tensorflow, but since it's all batch inference, I wonder if the interface (input/output format) can be aligned.
For TFX we currently have xgboost/sklearn/jax examples; our current bulk inference supports Tensorflow/JAX and cloud bulk inference but not sklearn/xgboost, so I wonder if, with the interface aligned, we could add them to the TFX BulkInfer component too.
The current design calls for input/output format to not be aligned (TF protos) but rather the same as what users of the framework are familiar with. This is done with the purpose of easing usage for developers that are mainly familiar with their framework.
J Z
Feb 11, 2022
Approver
Just wondering if we can utilize TFXIO/TFX-BSL (or a similar idea) so that we can have different formats but a unified in-memory representation and processing.
TFXIO operates on pyarrow.RecordBatches that represent batches of data columns that have the same length. If the intent is to do batched inference (e.g. for efficiency reasons), then it may also make sense to use this format as the intermediate representation (tfx_bsl has a number of utils to convert to/from different formats, including tensors and ndarrays -> RecordBatch, and pyarrow also supports pandas conversions).
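As a rough illustration of that intermediate-representation idea, here is a minimal sketch assuming plain numpy inputs (the tfx_bsl converters mentioned above offer richer conversions):

```python
import numpy as np
import pyarrow as pa

# Two equal-length columns become one RecordBatch.
features = pa.array(np.array([1.0, 2.0, 3.0]))
labels = pa.array(np.array([0, 1, 0]))
batch = pa.RecordBatch.from_arrays([features, labels], names=["feature", "label"])

# pyarrow also round-trips to pandas for downstream processing.
df = batch.to_pandas()
print(df)
```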
Ryan Thompson
Feb 15, 2022
Approver
I also wanted to say, thanks for the links. I think the relevant file that is interesting here is:
1. Users who need the whole of the input data
2. Users who need a reference to the input data only
The distinction between these two user requirements is significant:
An input signature of KV (also supported by TFX Inference) allows the user to provide the reference as the K, which can then be used in post-processing, with the <K, V> becoming <K, Predict<V>> on the output. For users in category 2 this means we do not need to replicate the input data into the output (making this optional would be ideal).
Concrete example:
I have a PCollection of images as KV: K=image_uuid, V=bytes()
Having the input bytes in the output will only be useful in a subset of cases; often it's the image_uuid and the prediction that are needed in the resulting PCollection.
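A minimal sketch of the KV pattern in the concrete example above (run_model here is a stand-in for framework-specific inference, not part of any proposed API):

```python
import apache_beam as beam

def run_model(image_bytes):
    # Stand-in for framework-specific inference.
    return {"score": len(image_bytes)}

with beam.Pipeline() as p:
    keyed_images = p | beam.Create([
        ("image-uuid-1", b"\x00\x01\x02"),
        ("image-uuid-2", b"\x03\x04"),
    ])
    # <K, V> becomes <K, prediction>; the input bytes are not replicated
    # into the output PCollection.
    predictions = keyed_images | beam.Map(lambda kv: (kv[0], run_model(kv[1])))
    predictions | beam.Map(print)
```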
Thanks Reza. I've modified the wording. Please see the Note below about the TFX implementation. Their output type contains information about the input data.
Reza Rokni
Feb 11, 2022
Approver
Thanks! Yes, in their implementation the input is always included, but they also have KV support; this is there to allow metadata to be passed in without using Tensorflow Signature tricks for metadata passthrough. With Tensorflow signatures you can have some data bypass the input_0 layer of the model and still be available in the output, but by having the KV option users don't need to mess with the Signatures for this.
For our implementation we will need KV for the same reason; however, I think an extra nice feature for us would be to allow the user to omit the input from the output to reduce the bytes processed in the resulting PCollection.
Ahmet Altay
Feb 11, 2022
Approver
How is it done in the current API?
Andy Ye
Feb 11, 2022
Approver
process() maps the key of the example to a prediction log proto, i.e. it returns Iterable[Tuple[_K, prediction_log_pb2.PredictionLog]].
+1 to a Python dataclass. It will be the easier on-ramp for our users. Although I understand that this has drawbacks in language portability etc., I believe the trade-off is worth it in this case.
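For illustration, a minimal sketch of what a dataclass-based output could look like (the field names here are hypothetical, not a committed API):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PredictionResult:
    example: Any                     # optionally, the input example
    inference: Any                   # framework-native prediction output
    model_id: Optional[str] = None   # e.g. useful if the model is updated via side input
```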
Ryan Thompson
Feb 11, 2022
Approver
Our assumption is that the language portability will not be an issue since the current beam coders should handle a data class easily.
Ahmet Altay
Feb 12, 2022
Approver
Dataclasses sound like a better option. Just to make sure that we have considered everything:
- Drawbacks in language portability -> what are those? Could you explicitly mention them here?
- Would this result in a split API between the existing RunInference and the new one?
(Side note: Consider adding an "alternatives considered" section for parts that could go different ways. This, and 1-or-n transforms etc. It would streamline the discussion and agreement process once reviewers can see that different options were considered and they can compare pros and cons easily.)
Brian Hulette
Feb 14, 2022
Approver
Note that just because existing Beam coders support a type doesn't mean it's portable. In Python we have a catch-all coder that just serializes data with pickle - this can't be interpreted by non-Python runners or SDKs. I think this would only be a problem when/if we want to expose this as an external transform though.
To work around this, we could add support for inferring Beam schemas from dataclasses, then use the row coder. We could also do this for protobuf-generated types. And we already have support for inferring schemas from NamedTuple types (perhaps less familiar to Python developers than dataclasses, though).
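For reference, a minimal sketch of the existing NamedTuple path mentioned above (the dataclass equivalent would be the new support):

```python
import typing
import apache_beam as beam

class Prediction(typing.NamedTuple):
    example_id: str
    score: float

# Registering RowCoder makes the type portable as a Beam schema.
beam.coders.registry.register_coder(Prediction, beam.coders.RowCoder)

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([Prediction("a", 0.9), Prediction("b", 0.1)])
         | beam.Map(print))
```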
Robert Bradshaw
Feb 14, 2022
Approver
+1 to supporting schema coder for known dataclasses (regardless of its use here).
Robert Bradshaw
Feb 14, 2022
Approver
Is this just for PyTorch?
Andy Ye
Feb 15, 2022
Approver
For illustrative purposes, we just created PytorchOutputData, but scikit-learn should have one too. Or, as discussed in the other comments, we could just have a unified dataclass that is agnostic of framework.
Robert Bradshaw
Feb 14, 2022
Approver
Are these 1:1? Is batching going to be explicit in the API?
Assigned to rezarokni@google.com
Andy Ye
Feb 11, 2022
Approver
@rezarokni@google.com Should we be replicating how TFX adds serialized examples to the output prediction log?
Reza Rokni
Feb 11, 2022
Approver
I would suggest thinking about this problem more generally.
Given n frameworks which may have m output types, do we have an output signature which:
- Changes by the framework being used,
- Or do we have a single output type to which we do the conversion,
- Or do we have a generic type where input<T> will be output<T>,
- Or some other variation?
I suspect us having to coerce the output into 'our' thing is too much tech debt and impractical. Again, I suspect that the easiest is to have each framework output its own format (which for Tensorflow will be tf.Example). This option, however, will need us to think about the RunInference(config) API marked in the future section below. If we want inference to be a single transform which is easily composable into a DAG then maybe we need a <bytes> output option.
@robertwb@google.com @bhulette@google.com's view of this is, I believe, for us to output using Schemas, which falls into the option of us coercing output into a known 'thing'.
But here I would need to defer to your good selves, as I may be thinking about this wrong. It feels like a very important design decision that needs to be done right so we don't have tech debt in the future.
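For illustration only, a minimal sketch of the "generic type where input<T> will be output<T>" variant from the list above (all names here are hypothetical, not a proposed API):

```python
from typing import Generic, Iterable, TypeVar

ExampleT = TypeVar("ExampleT")
PredictionT = TypeVar("PredictionT")

class ModelSpec(Generic[ExampleT, PredictionT]):
    """Hypothetical per-framework spec; each framework keeps its native types."""

    def run_inference(self, batch: Iterable[ExampleT]) -> Iterable[PredictionT]:
        raise NotImplementedError
```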
Robert Bradshaw
Feb 11, 2022
Approver
Note that in Java the notion of having a schema doesn't force a concrete representation, just the ability to convert to and from a row object (or, equivalently, the ability to enumerate and extract fields). This hasn't been fleshed out as much in Python (less need, due to language differences like the ease of duck typing), but we have given it some thought.
Yifan Mai
Feb 11, 2022
Approver
@rongliu@google.com could you review this portion? TLDR is that the TFX RunInference type signature is different from what a "canonical" inference signature would look like.
Reza Rokni
Feb 11, 2022
Approver
Good point by Robert. Maybe we need a signature per framework, and (once the Python bit is fleshed out for <Row>) we add a <Row> output, which solves for the single RunInference transform being easily composable into a DAG while still allowing the config to change the framework being used.
@rongliu@google.com I assume this is a solved problem for the TFX runinference as the output is standardized on Predictlog proto.
Rose Liu
Feb 11, 2022
Approver
Reza, you're correct that TFX considers the inference output to be PredictionLog for downstream jobs.
Brian Hulette
Feb 15, 2022
Approver
Note the schema concept doesn't have support for dynamic schemas - the schema must be known (or at least discoverable) at pipeline construction time. I don't think that's a blocker here though IIUC.
Robert Bradshaw
Feb 14, 2022
Approver
Can we standardize on a single type?
Yifan Mai
Feb 11, 2022
Approver
@iindyk@google.com could you review this portion re: TFX IO and Apache Arrow?
Ihor Indyk
Feb 12, 2022
Approver
TFX components already use the Arrow format on the input path, and some use it on the output path as well.
Ihor Indyk
Feb 12, 2022
Approver
And we had significant efficiency gains from doing so; one of the reasons is that it's columnar.
Having a single transform which calls our other transforms is actually a P0.
Some features that this provides:
- If we build inference Templates, where the configuration dictates the model+framework info and the DAG does not need to be changed at all, then when we support new frameworks the template does not change, just the config that needs to be sent to it.
- All docs/patterns and examples will make use of a single transform, which is changed simply by the configuration we send to it, which is a very powerful message.
- Any future additions to this framework from outside contributors are much more likely to follow the same example and launch as RunInference().
- There is potential for integration into KFP as an inference endpoint using this transform; just like templates, having a single transform that allows the config to indicate its implementation would simplify the work needed there.
Ahmet Altay
Feb 11, 2022
Approver
+1 to having a single transform as a P0, and it should be done in the initial work. Changing APIs is very hard. It is better to start where we already know we want to go.
Andy Ye
Feb 11, 2022
Approver
We can definitely pursue having a single wrapper around these individual transforms. It was what we had in mind initially, but we wondered if developers would feel more comfortable working with framework-specific transforms.
Here are the reasons I see for one transform per framework:
1) We will need to support TFX RunInference() separately, at least for some time. Legacy customers should not need to rebuild or refactor pipelines using TFX.
2) Different frameworks can take and return different data types, and require different parameters. This is very explicit if you have a transform matched to the framework. It allows users to express themselves using the native types and signatures of the framework, without learning Beam-specific configuration.
3) It is easier to add new frameworks when they are decoupled. A generic interface builds in assumptions, and a framework that violates those assumptions will be difficult to assimilate.
Overall, I see frameworks as akin to IOs. They are specific enough that a generic approach causes more issues than it solves.
Ahmet Altay
Feb 12, 2022
Approver
These are also very good reasons.
I would suggest getting a second opinion from klk@ / robertwb@, and committing to one option and moving on to the execution phase.
I see having a single transform as a P0. We may need to expose the TFX one for legacy reasons, but it may be acceptable to just leave that one where it is for those that expect the (exact) legacy behavior.
IMHO, the way it should be used is that every system should support conversion of its own data type to/from schemas (logically; there's not necessarily a need for physical transformation of records). RunInference(TypedModel) would accept any schema'd PCollection, as well as a PCollection of its own specific type, apply the model, and output a schema'd PCollection (again, possibly in its own specific type, but in such a way that it could be easily followed by further transforms in that framework, or any schema-understanding post-processing). All inference is of the form [features] -> [more features], ideally paired with the input features in a standard way.
The configuration and implementation may need to be quite specific, but that's more swappable (e.g. we should be able to swap out a PyTorch model for a TF one by swapping the designation of the model, not how it's used).
Note that even for IOs we are trying to move towards things like SchemaIO and IO pushdown and the ability to treat IOs in a unified way (e.g. you can just swap out one for another) rather than a disparate set of transforms. From what I've seen, the concerns here are even more similar than IOs (and I have yet to see a compelling reason any implementation differences force a difference in API here). This also fits in well with the relational-first push. Ideally this could even open up the possibility of mix-and-match models, e.g. using an online CloudVision API composed with a PyTorch-trained model to act on it.
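A hedged sketch of the "schema'd PCollection in, schema'd PCollection out" shape described above (the inference function here is a stand-in, not a proposed API):

```python
import apache_beam as beam

def fake_infer(row):
    # [features] -> [more features], paired with the input in a standard way.
    return beam.Row(image_id=row.image_id, score=sum(row.pixels))

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([beam.Row(image_id="a", pixels=[0.1, 0.2]),
                        beam.Row(image_id="b", pixels=[0.3, 0.4])])
         | beam.Map(fake_infer)
         | beam.Map(print))
```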
Kenneth Knowles
Feb 17, 2022
Approver
I'm still thinking about this one. I'm generally not automatically for unification of things just because they fit a category or have similarities. If the surrounding code cannot also be swappable, then there's also less benefit.
One example of something that isn't really swappable is SqlTransform with zetasql vs standard sql: there is not that much chance of a context where this swapping matters. But there is some benefit to unification anyhow. SchemaIO is another case where you have a certain generalized API but not for the purposes of swapping but rather the purpose of programmatic inspection/instantiation.
It isn't straightforward to me which way to go in the case of RunInference so I am not ready to weigh in.
Ahmet Altay
Feb 11, 2022
Approver
Could there be a single RunInference transform that could be parametrized (or, even better, auto-detect?) instead of N different transforms for N different frameworks?
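For illustration, a sketch of what the parametrized single-transform option might look like (all names here are hypothetical, not a committed API):

```python
from dataclasses import dataclass

# Hypothetical per-framework configuration objects for a single, parametrized
# RunInference transform; names are illustrative only.

@dataclass
class PyTorchModelConfig:
    state_dict_path: str
    model_class: type

@dataclass
class SklearnModelConfig:
    pickled_model_path: str

# Usage would then differ only in the config passed in, e.g.
#   examples | RunInference(PyTorchModelConfig(...))
#   examples | RunInference(SklearnModelConfig(...))
```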
Reza Rokni
Feb 11, 2022
Approver
+1
Kerry Donny-Clark
Feb 11, 2022
Approver
I would say that a framework is like an IO. We don't have a single very generic IO, with complex parameters that get parsed to make a specific connection to a given source.
Ryan Thompson
Feb 11, 2022
Approver
After having looked through the TFX RunInference code closely and trying to make generic sections of it, the idea of trying to combine all these frameworks into a single implementation scares me. There are a lot of very TFX-specific sections in that code.
I anticipate a lot of complexity and framework specific code entering into the individual runInference implementations.
Ahmet Altay
Feb 12, 2022
Approver
(It looks like I created a second comment thread on the same topic. Let's use the other comment thread as the single place to discuss this issue. Ryan, if you do not mind you can add your comment there as well. Both Reza and Kerry already commented there.)
Robert Bradshaw
Feb 14, 2022
Approver
I just realized I started a new thread on this as well. I think there are two considerations here: the API question (in which case there seems to be a clear preference for unification) and the implementation one (where I think a lot could be shared, but if it contorts things too much / there's a lot of platform-specific stuff we don't have to share everything).
We think this is worthwhile, but maybe for a future iteration.
Rose Liu
Feb 11, 2022
Approver
Keeping the same style of inputs as RunInference could make these two extensible for future changes. For example, we have a PR to enable streaming model input [1], where a collection of model parameters is consumed.
I agree. I think we are looking to find some sort of feature parity between what we can support with newer frameworks and what TFX does and is doing. I know TFX RunInference is being very actively developed and we are not as familiar with the roadmap.
Our feeling is that as long as we have an extensible object as the parameters for each framework we should be able to extend features going forward.
Fleshing out a concrete implementation of PyTorch and/or Sklearn would help validate this.
Andy Ye
Feb 24, 2022
Approver
Thanks for this sketch! Ryan and I will leverage this to guide our implementation.
FYI: the downside to the composition paradigm here is that we will have to do a bit more refactoring of TFX's logic to support Tensorflow that way. @rezarokni@google.com
Robert Bradshaw
Feb 24, 2022
Approver
That is true. Cleanly introducing a new extension point often requires a bit of refactoring, but IMHO that is generally preferable to a sub-optimal structuring of the code for historic reasons only. Fortunately the amount of code in question here is relatively small.
Reza Rokni
Feb 25, 2022
Approver
+1 to Robert's comment; if we need a hidden shim layer etc. as temporary tech debt, while painful, the end goal will be worth it.
Minor note (from reading the sketch):
Can we avoid the word Batch in any external areas of the code? This, for example, has led to confusion with TFX RunInference, where folks have assumed it's 'batch' due to the naming when digging into the code.
Robert Bradshaw
Feb 14, 2022
Approver
Prefer composition to inheritance.
Reza Rokni
Feb 10, 2022
Approver
Is this @Setup in the DoFn? If yes, then it might be worth mentioning that there would need to be a check in the process() function (I think) to see if the side input has changed version to a new model.
Andy Ye
Feb 14, 2022
Approver
Good point! Yes, it's in the DoFn. Just added that note.
Robert Bradshaw
Feb 23, 2022
Approver
Side inputs are not available from setup(). Above it looks like the model is specified by a URL at pipeline construction. Is the intent to be able to specify it in both ways?
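A hedged sketch of the pattern being discussed (names are illustrative; this is not the proposed implementation): since side inputs cannot be read in setup(), process() checks the side input's version and reloads the model lazily.

```python
import apache_beam as beam

class InferenceDoFn(beam.DoFn):
    """Sketch: the model cannot be loaded in setup() from a side input, so
    process() checks the version and reloads when a new model arrives."""

    def setup(self):
        self._model = None
        self._model_version = None

    def process(self, element, model_metadata):
        # model_metadata is a side input, e.g. a (version, model_path) tuple.
        version, model_path = model_metadata
        if version != self._model_version:
            self._model = self._load_model(model_path)
            self._model_version = version
        yield self._model(element)

    def _load_model(self, model_path):
        # Stand-in for framework-specific loading from model_path.
        return lambda x: {"input": x, "model_path": model_path}
```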
Robert Bradshaw
Feb 14, 2022
Approver
How hard would it be to coerce the data into the right format as needed? It'd be extremely useful to just read schema'd data (e.g. from a CSV, or another ML-agnostic transform) and have it automatically adapt to the framework-specific input (as well as handling intelligent batching as needed). This is the kind of thing that I think will add a lot of value for users.
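A rough sketch of that adaptation, assuming illustrative column names and using BatchElements for the batching:

```python
import numpy as np
import apache_beam as beam

def rows_to_matrix(rows):
    # Coerce a batch of schema'd rows into one framework-friendly ndarray.
    return np.array([[r.f0, r.f1] for r in rows], dtype=np.float32)

with beam.Pipeline() as p:
    _ = (p
         | beam.Create([beam.Row(f0=1.0, f1=2.0), beam.Row(f0=3.0, f1=4.0)])
         | beam.BatchElements(min_batch_size=2, max_batch_size=32)
         | beam.Map(rows_to_matrix)
         | beam.Map(print))
```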
Robert Bradshaw
Feb 23, 2022
Approver
This should be something that is in the generic code, and specific models don't have to worry about.
Andy Ye
Feb 24, 2022
Approver
Sounds good. Reworded!
Anand Inguva
Feb 10, 2022
Approver
Do we pre-process the data somewhere when the user passes numpy arrays to RunInference(Pytorch)?
Andy Ye
Feb 10, 2022
Approver
Yes, we can have a preprocessing step to check the type. If it's Tensor, then keep it. If it's numpy, then we can do a transform to Tensor.
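A minimal sketch of that check (assumed behavior, not the final implementation):

```python
import numpy as np
import torch

def to_tensor(example):
    # Keep torch.Tensor inputs as-is; convert numpy arrays; reject the rest.
    if isinstance(example, torch.Tensor):
        return example
    if isinstance(example, np.ndarray):
        return torch.from_numpy(example)
    raise TypeError(f"Unsupported input type: {type(example)}")
```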
Andy Ye
Feb 14, 2022
Approver
Added a note that for the first iteration, we will assume input is Tensor. In future iterations, we can add logic to do conversions.
Valentyn Tymofieiev
Feb 15, 2022
Approver
This sounds pipeline-specific. Would users need to implement this logic?
Robert Bradshaw
Feb 23, 2022
Approver
+1. I'm not quite following what the role of this function is either.
Andy Ye
Feb 24, 2022
Approver
Sorry for this misleading description here. I've updated it. Essentially, TFX does this to their data to make it conform to their protobufs, but other framework implementations may not need this.
Andy Ye
Feb 24, 2022
Approver
Marked as resolved
Robert Bradshaw
Feb 24, 2022
Approver
Re-opened
Does this have to be done as part of RunInference, or could it be a subsequent DoFn?
Andy Ye
Feb 24, 2022
Approver
TFX currently needs a very specific post-process object that matches the type of inference being done (regression vs classification vs multi-inference), so there needs to be this post-process step. But we don't plan to do any framework-level post-processing for Scikit-learn and Pytorch.
Robert Bradshaw
Feb 24, 2022
Approver
OK, in that case I would just call it part of the inference.
Replace:“num_model_versions ** new - Note: This is relevant to models changing via side input.” with “F”
Brian Hulette
Feb 23, 2022
Approver
It would be great if we could leverage s.apache.org/batched-dofns here (when it's available in Beam). Note that batching up Beam schema types into common "table" types (pyarrow RecordBatches, pandas DataFrames, etc..) is a primary goal of that effort.
Unfortunately that effort is going on in parallel with this one. I don't want my work on Batched DoFns to block you, but I do think this could be a valuable use case for it and I'd be happy to collaborate.
Andy Ye
Feb 23, 2022
Approver
Sounds good, looking forward to incorporating it! 👍
Anand Inguva
Feb 16, 2022
Approver
Making a note here
BatchElements returns a list. We need to convert the list into an array/tensor, and apparently converting a list to a tensor is extremely slow [1].
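An illustration of that concern (assuming torch): building a tensor directly from a Python list of arrays converts element by element, whereas stacking into one ndarray first avoids that.

```python
import numpy as np
import torch

batch = [np.random.rand(3, 8, 8).astype(np.float32) for _ in range(32)]

# Slow: torch converts the Python list of ndarrays element by element.
slow = torch.tensor(batch)

# Faster: stack into a single contiguous ndarray, then wrap it without a copy.
fast = torch.from_numpy(np.stack(batch))
```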
@anandinguva@google.com We started a section on testing here. Feel free to edit, and even link to a dedicated testing doc
Assigned to anandinguva@google.com
Yifan Mai
Feb 11, 2022
Approver
Beam or Dataflow benchmarks would be nice. In particular, TFX would need performance parity before migrating to the Beam implementation.
Anand Inguva
Feb 14, 2022
Approver
Do we have current TFX benchmarks on Beam or Dataflow? It would be helpful to see how TFX calculated the benchmarks.
Ahmet Altay
Feb 11, 2022
Approver
What platforms we can integrate with? Does CAIP support scikit and pytorch models?
Andy Ye
Feb 11, 2022
Approver
Initial scope is to integrate with Vertex AI. Yes, it supports TF, Pytorch, scikit-learn, XGB.
Ahmet Altay
Feb 12, 2022
Approver
Nice!
Ahmet Altay
Feb 12, 2022
Approver
(Let's explicitly add that information to the document.)
Andy Ye
Feb 14, 2022
Approver
👍
Ryan Thompson
Feb 14, 2022
Approver
I think iteration #1 doesn't call for support for remote inferences. Vertex AI is supported with the TFX implementation but we don't plan on supporting it for our first pass with sklearn or pytorch.