Jan
26
1 day ago
issue

dmitra79 issue comment tensorflow/tfx

dmitra79

Transforming date values

Hi.

I have a use case where I want to use date features as input values for a predictive model, and I need to transform them to make them useful. For example, I need the difference between two dates (say, the difference in days between 01-04-2019 and 16-04-2019, though the dates can also be months or years apart). Or I may just need the day of the month, the month, or the year (i.e. for 16-04-2019, getting 16, 4 and 2019 as separate values).
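To make the requested features concrete, here is a minimal plain-Python sketch of the two computations described above. It only illustrates the expected values; it does not run inside a TensorFlow graph and is not a TFX or TensorFlow Transform API. The helper names and the assumed day-month-year format are illustrative assumptions.

```python
from datetime import date, datetime
from typing import Tuple

# Assumption: dates are written day-month-year, as in "16-04-2019".
DATE_FORMAT = "%d-%m-%Y"


def parse_date(text: str) -> date:
    """Parses a day-month-year string into a date object."""
    return datetime.strptime(text, DATE_FORMAT).date()


def days_between(start: str, end: str) -> int:
    """Returns the number of days from start to end (negative if end is earlier)."""
    return (parse_date(end) - parse_date(start)).days


def date_parts(text: str) -> Tuple[int, int, int]:
    """Returns (day of month, month, year) as separate integers."""
    parsed = parse_date(text)
    return parsed.day, parsed.month, parsed.year


print(days_between("01-04-2019", "16-04-2019"))  # 15
print(date_parts("16-04-2019"))  # (16, 4, 2019)
```

An in-graph version would have to reproduce the same results with TensorFlow ops so that the values match at serving time.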

My question is whether this is possible within TFX and, if not, whether it is a planned feature. It matters for my use case because the transform needs to happen inside the graph, so that I can serve the model with the transformations included in the pipeline. Otherwise I would need to add something outside of TFX to do this for me.

Thanks in advance!

Martijn

dmitra79

Any update on these features? They really would be great to have!

pull request

copybara-service[bot] pull request tensorflow/tfx

copybara-service[bot]

Fixed the cluster spec error in CAIP Tuner on Vertex when `num_parallel_trials = 1`

created branch

copybara-service[bot] in tensorflow/tfx create branch test_424228346

createdAt 4 hours ago
push

copybara-service[bot] push tensorflow/tfx

copybara-service[bot]

Input resolution can return length > 1.

Previously, for type-hinting convenience, processor.run_resolver_steps only returned a single dict, so inputs_utils.resolve_input_artifacts_v2 could effectively only return a list of dicts of size 1.

This change accounts for the case where a resolver operator directly returns a list of dicts, and makes the corresponding type-hint and type-validation changes.
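The idea can be sketched roughly as follows. This is not the TFX implementation: `normalize_resolved` and the `ArtifactMultiMap` alias are hypothetical names, and plain Python values stand in for artifacts. The sketch only shows the pattern of accepting either a single dict or a list of dicts, validating the type, and always returning a list.

```python
from typing import Any, Dict, List, Union

# Hypothetical alias: maps an input key to a list of artifacts.
ArtifactMultiMap = Dict[str, List[Any]]


def normalize_resolved(
    result: Union[ArtifactMultiMap, List[ArtifactMultiMap]]
) -> List[ArtifactMultiMap]:
    """Returns the resolver result as a list of dicts, validating its type."""
    if isinstance(result, dict):
        # Single dict: wrap it so callers always get a list.
        return [result]
    if isinstance(result, list) and all(isinstance(item, dict) for item in result):
        # Already a list of dicts: it may have length > 1.
        return result
    raise TypeError(
        f"Expected a dict or a list of dicts, got {type(result).__name__}")


print(len(normalize_resolved({"examples": ["a"]})))
print(len(normalize_resolved([{"examples": ["a"]}, {"examples": ["b"]}])))
```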

PiperOrigin-RevId: 424227282

copybara-service[bot]

Optionally reuse artifacts that do not affect consistent execution of downstream nodes in partial run.

PiperOrigin-RevId: 424263054

copybara-service[bot]

support dynamic exec properties in TFX pipelines[part1]: DSL

This CL is the DSL part of allowing a node parameter in a TFX pipeline to take its value from the output of an upstream node. In detail: each dynamic exec property adds one implicit input channel, and that implicit input channel takes the upstream node's output.

Example of using a dynamic exec property:

class DownStreamSpec(types.ComponentSpec):
  PARAMETERS = {
      'input_num':
          component_spec.ExecutionParameter(type=int)
  }
  INPUTS = {}
  OUTPUTS = {}

class Executor(base_executor.BaseExecutor):
  def Do(self, input_dict: Dict[str, List[types.Artifact]],
         output_dict: Dict[str, List[types.Artifact]],
         exec_properties: Dict[str, Any]) -> None:
    executed_components.append('DownstreamComponent')

class DownstreamComponent(base_component.BaseComponent):
  SPEC_CLASS = DownStreamSpec
  EXECUTOR_SPEC = executor_spec.ExecutorClassSpec(Executor)

  def __init__(self,
               input_num: Optional[Union[int, ph.Placeholder]] = None):
    spec = DownStreamSpec(input_num=input_num)
    super().__init__(spec=spec)

downstream_component = DownstreamComponent(
      input_num=upstream_component.outputs['num'].future()[0].value)

PiperOrigin-RevId: 410287732

commit sha: 8410c8699a75a3cbb13d0edebd99fe66f8156651

push time in 6 hours ago
issue

davidxia issue comment tensorflow/tfx

davidxia

Exception telling users to install full TFX pkg is misleading in some cases

System information

  • Have I specified the code to reproduce the issue (Yes, No): Yes, see below
  • Environment in which the code is executed (e.g., Local(Linux/MacOS/Windows), Interactive Notebook, Google Cloud, etc): Local macOS 11.6
  • TensorFlow version: 2.7.0
  • TFX Version: 1.5.0
  • Python version: 3.7.10
  • Python dependencies (from pip freeze output):

See below. However, these came solely from the singular dependency tfx==1.5.0

absl-py==0.12.0
apache-beam==2.35.0
appnope==0.1.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
astunparse==1.6.3
attrs==20.3.0
backcall==0.2.0
bleach==4.1.0
cached-property==1.5.2
cachetools==4.2.4
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.10
clang==5.0
click==7.1.2
crcmod==1.7
debugpy==1.5.1
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.1.1
docker==4.4.4
docopt==0.6.2
entrypoints==0.3
fastavro==1.4.9
fasteners==0.16.3
flatbuffers==1.12
gast==0.4.0
google-api-core==1.31.5
google-api-python-client==1.12.10
google-apitools==0.5.31
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-aiplatform==1.9.0
google-cloud-bigquery==2.32.0
google-cloud-bigquery-storage==2.11.0
google-cloud-bigtable==1.7.0
google-cloud-core==1.7.2
google-cloud-datastore==1.15.3
google-cloud-dlp==3.4.0
google-cloud-language==1.3.0
google-cloud-pubsub==1.7.0
google-cloud-recommendations-ai==0.2.0
google-cloud-spanner==1.19.1
google-cloud-storage==1.44.0
google-cloud-videointelligence==1.16.1
google-cloud-vision==1.0.0
google-crc32c==1.3.0
google-pasta==0.2.0
google-resumable-media==2.1.0
googleapis-common-protos==1.54.0
grpc-google-iam-v1==0.12.3
grpcio==1.43.0
grpcio-gcp==0.2.2
h5py==3.1.0
hdfs==2.6.0
httplib2==0.19.1
idna==3.3
importlib-metadata==4.10.0
importlib-resources==5.4.0
ipykernel==6.7.0
ipython==7.31.0
ipython-genutils==0.2.0
ipywidgets==7.6.5
jedi==0.18.1
Jinja2==3.0.3
joblib==0.14.1
jsonschema==4.4.0
jupyter-client==7.1.0
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
keras==2.7.0
Keras-Preprocessing==1.1.2
kt-legacy==1.0.4
kubernetes==12.0.1
libclang==12.0.0
libcst==0.4.0
Markdown==3.3.6
MarkupSafe==2.0.1
matplotlib-inline==0.1.3
mistune==0.8.4
ml-metadata==1.5.0
ml-pipelines-sdk==1.5.0
mypy-extensions==0.4.3
nbclient==0.5.10
nbconvert==6.4.0
nbformat==5.1.3
nest-asyncio==1.5.4
notebook==6.4.7
numpy==1.19.5
oauth2client==4.1.3
oauthlib==3.1.1
opt-einsum==3.3.0
orjson==3.6.5
packaging==20.9
pandas==1.3.5
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
portpicker==1.5.0
prometheus-client==0.12.0
prompt-toolkit==3.0.24
proto-plus==1.19.8
protobuf==3.19.3
psutil==5.9.0
ptyprocess==0.7.0
pyarrow==2.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydot==1.4.2
Pygments==2.11.2
pymongo==3.12.3
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.3
PyYAML==5.4.1
pyzmq==22.3.0
requests==2.27.1
requests-oauthlib==1.3.0
rsa==4.8
scipy==1.7.3
Send2Trash==1.8.0
six==1.15.0
tensorboard==2.7.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.7.0
tensorflow-data-validation==1.5.0
tensorflow-estimator==2.7.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.23.1
tensorflow-metadata==1.5.0
tensorflow-model-analysis==0.36.0
tensorflow-serving-api==2.7.0
tensorflow-transform==1.5.0
termcolor==1.1.0
terminado==0.12.1
testpath==0.5.0
tfx==1.5.0
tfx-bsl==1.5.0
tornado==6.1
traitlets==5.1.1
typing-extensions==3.7.4.3
typing-inspect==0.7.1
uritemplate==3.0.1
urllib3==1.26.8
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.2.3
Werkzeug==2.0.2
widgetsnbextension==3.5.2
wrapt==1.12.1
zipp==3.7.0

Describe the current behavior

With the introduction of the lightweight ml-pipelines-sdk package came safeguards warning users of when they need to install the "full" TFX dependency.

The message is misleading and obfuscates potentially orthogonal underlying issues. For instance, the absence of keras_tuner -- a TFX dependency -- results in the red-herring message recommending users install the full tfx package, even when that is already the case.

Describe the expected behavior

The exception shows the true root cause in cases where the underlying issue is not the absence of the full tfx dependency.

Standalone code to reproduce the issue

To reproduce:

  • pip install -U tfx==1.5.0
  • pip uninstall keras-tuner
  • In Python:
>>> import keras_tuner # ensure that keras_tuner does not exist
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'keras_tuner'
>>> import tfx.components # expose the true issue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jjin/.pyenv/versions/tfx-module-not-found/lib/python3.7/site-packages/tfx/components/__init__.py", line 30, in <module>
    from tfx.components.tuner.component import Tuner
  File "/Users/jjin/.pyenv/versions/tfx-module-not-found/lib/python3.7/site-packages/tfx/components/tuner/component.py", line 18, in <module>
    from keras_tuner.engine import base_tuner
ModuleNotFoundError: No module named 'keras_tuner'
>>> from tfx.types.standard_artifacts import Examples
>>> Examples() # note that exception is a red herring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jjin/.pyenv/versions/tfx-module-not-found/lib/python3.7/site-packages/tfx/types/standard_artifacts.py", line 74, in __init__
    raise Exception('The full "tfx" package must be installed to use this '
Exception: The full "tfx" package must be installed to use this functionality.

Name of your Organization (Optional)

Spotify

issue

tanguycdls issue comment tensorflow/tfx

tanguycdls

Request cpu / memory for containers in Kubeflow

I am curious if folks are thinking about allowing users to set the cpu / memory requirements for containers in Kubeflow on TFX components?

We can get around needing larger machines by using ai-platform or Dataflow, but our custom components don't really fit into that paradigm. I'm curious if the inability to set compute requirements means TFX can't take advantage of ai-platform pipeline's ability to autoscale nodes if we kick off a ton of pipelines at once.

Anyways, please let me know if this is possible and I missed it! Maybe it is possible with the upcoming V2 runner.

tanguycdls

Hello. After some investigation, @axelborja discovered that it works if you manually edit the JSON and modify the spec:

import json

# ORIGINAL_PIPELINE_FILE is a placeholder for the path to the pipeline JSON.
with open(ORIGINAL_PIPELINE_FILE) as json_file:
    data = json.load(json_file)

# Add resource limits to every executor whose name matches the target component.
for component_name, component_executor_spec in data["pipelineSpec"]["deploymentSpec"]["executors"].items():
    if component_name.startswith("the_name_of_the_component_you_wish_to_extend"):
        component_executor_spec["container"]["resources"] = {
            "cpuLimit": 16.0,
            "memoryLimit": 64.0,
        }

Then give the modified JSON to Vertex AI Pipelines. When the pipeline starts, it uses an instance that respects those limits.
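For anyone who wants to try the same workaround, here is a self-contained sketch of the patching step as a function. The key names come from the snippet above; the function name, the sample executor names, and the sample image strings are made up for illustration, and whether Vertex AI officially supports editing the JSON this way is still unconfirmed.

```python
import copy
import json
from typing import Any, Dict


def set_executor_resources(pipeline_json: Dict[str, Any], name_prefix: str,
                           cpu_limit: float, memory_limit: float) -> Dict[str, Any]:
    """Returns a copy of the pipeline JSON with resource limits on matching executors."""
    patched = copy.deepcopy(pipeline_json)
    executors = patched["pipelineSpec"]["deploymentSpec"]["executors"]
    for name, executor_spec in executors.items():
        if name.startswith(name_prefix):
            executor_spec["container"]["resources"] = {
                "cpuLimit": cpu_limit,
                "memoryLimit": memory_limit,
            }
    return patched


# Hypothetical minimal pipeline JSON; a real file contains many more fields.
sample = {
    "pipelineSpec": {
        "deploymentSpec": {
            "executors": {
                "Trainer_executor": {"container": {"image": "example-image"}},
                "CsvExampleGen_executor": {"container": {"image": "example-image"}},
            }
        }
    }
}

patched = set_executor_resources(sample, "Trainer", cpu_limit=16.0, memory_limit=64.0)
print(json.dumps(patched, indent=2))
```

The patched dict can then be written back with json.dump and submitted in place of the original file.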

These limits could be set here: https://github.com/tensorflow/tfx/blob/7a0a2ce66d4025e643782529c828736c7b8ea770/tfx/orchestration/kubeflow/v2/step_builder.py#L403. The parameter names are defined here: https://github.com/kubeflow/pipelines/blob/7d5690a21cf8e8c464a6ddba520879bd30fd2ddc/api/v2alpha1/pipeline_spec.proto#L638

Could someone from Vertex AI confirm that this workaround is valid? Thanks.

issue

tanguycdls issue comment tensorflow/tfx

tanguycdls

TFX 1.6.0-rc0 Issues

Please comment or link any issues you find with TFX 1.6.0-rc0.

Thanks.

delete

copybara-service[bot] in tensorflow/tfx delete branch test_423370741

deleted time in 17 hours ago
pull request

copybara-service[bot] pull request tensorflow/tfx

copybara-service[bot]

Optionally reuse artifacts that do not affect the correct execution in partial run.

push

copybara-service[bot] push tensorflow/tfx

copybara-service[bot]

Optionally reuse artifacts that do not affect consistent execution of downstream nodes in partial run.

PiperOrigin-RevId: 424263054

commit sha: 41666c4e4a0cf02fa313afa0938d16222a654faa

push time in 17 hours ago