jackd

Deep Learning / Cybersecurity Researcher

Member Since 10 years ago

University of Melbourne, Melbourne

58 followers · 4 following · 27 stars · 113 repos

212 contributions in the last year

Pinned
⚡ Code for paper "Learning Free-Form Deformations for 3D Object Reconstruction"
⚡ Python loading/manipulation functions for ShapeNet object dataset.
⚡ Utility files for human pose estimation in python
⚡ GAN-based 3D human pose estimation model for 3DV'17 paper
⚡ Cython iterative farthest point sampling implementation
⚡ Efficient tensorflow nearest neighbour op
Activity
Nov 23 · 5 days ago · issue

jackd issue comment tensorflow/datasets

'Command not found' when trying to install tfds

I received the error message ‘command not found’ when trying to install tfds on a GCP VM. I have attempted: conda install -c anaconda tensorflow-dataset, pip install tensorflow-datasets, and pip install -q tfds-nightly.

None of the above is working. Any suggestions would be very helpful!

jackd

My tfds-nightly installed via pip install tfds-nightly doesn't have a tensorflow_datasets/scripts/utils directory, nor does the version installed by cloning this repo and running pip install . (non-editable). If I clone the repo and run pip install -e . it works fine.
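
A quick way to check which install is actually being picked up (a minimal sketch; the printed values are just examples):

import tensorflow_datasets as tfds
print(tfds.__version__)  # which release was actually resolved (nightly vs. stable vs. clone)
print(tfds.__file__)     # where it was imported from (site-packages vs. the editable checkout)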

Nov 19 · 1 week ago · issue

jackd issue comment tensorflow/gnn

Examples graphs

Please provide examples of training and usage with known academic graph datasets. It would be worth including the recently published TUDATASET benchmark for graph regression and classification, since it collects more than 100 graph datasets.

Thanks for this library, cannot wait to try it!

jackd

Seconded. Perhaps an example tensorflow-datasets implementation? I'd be happy to contribute tfds implementations for many common research datasets if a standardized FeatureConnector could be provided for the relevant graph types.
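
As a rough illustration of the kind of per-example structure such a connector would need to standardize, here is a sketch using only connectors that already exist (the feature names and sizes are hypothetical):

import tensorflow as tf
import tensorflow_datasets as tfds

graph_features = tfds.features.FeaturesDict({
    # variable number of nodes, each with a fixed-size feature vector
    'node_features': tfds.features.Sequence(tfds.features.Tensor(shape=(16,), dtype=tf.float32)),
    # variable number of edges, stored as (source, target) index pairs
    'edge_index': tfds.features.Sequence(tfds.features.Tensor(shape=(2,), dtype=tf.int64)),
    # per-graph classification label
    'label': tfds.features.ClassLabel(num_classes=2),
})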

Nov 10 · 2 weeks ago · push

jackd push jackd/grax

changed connected_components impl to scipy, removed networkx dep

commit sha: 99baaea786c59c1f5fe4314ba26d04b9a69499d6

pushed 2 weeks ago
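
For reference, the scipy routine the commit above switches to can be used roughly like this (a sketch only; grax's actual wrapper may differ):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import connected_components

# toy symmetric adjacency matrix with two components: {0, 1} and {2}
adj = sp.csr_matrix(np.array([[0, 1, 0],
                              [1, 0, 0],
                              [0, 0, 0]]))
n_components, labels = connected_components(adj, directed=False)
print(n_components, labels)  # 2 [0 0 1]
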
Nov 9 · 2 weeks ago · push

jackd push jackd/grax

commit sha: bce107d365468681d55525e0f9d584b83017f98a

pushed 2 weeks ago
push

jackd push jackd/grax

added appnp/gcn2/sgc, fixed pigcn data

commit sha: c6fcbbeb08fc1a051404d7aa66f6402095366beb

pushed 2 weeks ago
Nov 7 · 3 weeks ago · push

jackd push jackd/grax

commit sha: c77205a314870bf56ab23295fe0be5efeb745b25

pushed 3 weeks ago
Oct 12 · 1 month ago · created branch

jackd in jackd/tf-wrn create branch master

created 1 month ago
created repository

jackd in jackd/tf-wrn create repository

created 1 month ago
Oct 7 · 1 month ago · issue

jackd issue comment tensorflow/models

Trying to train ResNet50 from scratch, documentation is not clear

Prerequisites

Please answer the following question for yourself before submitting an issue.

  • I checked to make sure that this issue has not been filed already.

1. The entire URL of the documentation with the issue

https://github.com/tensorflow/models/tree/master/official/vision/image_classification

2. Describe the issue

I can't reproduce the examples provided in the documentation. These are the steps I'm following:

a) sudo docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu /bin/bash
b) python3 -m pip install --upgrade pip
c) pip install tf-models-official
d) download config files using curl (configs/examples/resnet/imagenet/gpu.yaml and configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml)
e) execute the code provided:

python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override='runtime.num_gpus=$NUM_GPUS'

As a result I'm getting:

2021-09-14 19:14:03.666015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:14:03.675435: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:14:03.676603: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I0914 19:14:03.682561 140080684554048 classifier_trainer.py:181] Base params: {'evaluation': {'epochs_between_evals': 1, 'skip_eval': False, 'steps': None},
 'export': {'checkpoint': None, 'destination': None},
 'mode': None,
 'model': {'learning_rate': {'boundaries': [30, 60, 80],
                             'decay_epochs': None,
                             'decay_rate': None,
                             'examples_per_epoch': 1281167,
                             'initial_lr': 0.1,
                             'multipliers': [0.000390625,
                                             3.90625e-05,
                                             3.90625e-06,
                                             3.90625e-07],
                             'name': 'stepwise',
                             'scale_by_batch_size': 0.00390625,
                             'staircase': None,
                             'warmup_epochs': 5},
           'loss': {'label_smoothing': None,
                    'name': 'sparse_categorical_crossentropy'},
           'model_params': {'batch_size': None,
                            'num_classes': 1000,
                            'rescale_inputs': False,
                            'use_l2_regularizer': True},
           'name': 'ResNet',
           'num_classes': 1000,
           'optimizer': {'beta_1': None,
                         'beta_2': None,
                         'decay': 0.9,
                         'epsilon': 0.001,
                         'lookahead': None,
                         'momentum': 0.9,
                         'moving_average_decay': None,
                         'name': 'momentum',
                         'nesterov': None}},
 'model_dir': None,
 'model_name': None,
 'runtime': {'all_reduce_alg': None,
             'batchnorm_spatial_persistent': False,
             'dataset_num_private_threads': None,
             'default_shard_dim': -1,
             'distribution_strategy': 'mirrored',
             'enable_xla': False,
             'gpu_thread_mode': None,
             'loss_scale': None,
             'mixed_precision_dtype': None,
             'num_cores_per_replica': 1,
             'num_gpus': 0,
             'num_packs': 1,
             'per_gpu_thread_count': 0,
             'run_eagerly': False,
             'task_index': -1,
             'tpu': None,
             'tpu_enable_xla_dynamic_padder': None,
             'worker_hosts': None},
 'train': {'callbacks': {'enable_backup_and_restore': False,
                         'enable_checkpoint_and_export': True,
                         'enable_tensorboard': True,
                         'enable_time_history': True},
           'epochs': 90,
           'metrics': ['accuracy', 'top_5'],
           'resume_checkpoint': True,
           'set_epoch_loop': False,
           'steps': None,
           'tensorboard': {'track_lr': True, 'write_model_weights': False},
           'time_history': {'log_steps': 100}},
 'train_dataset': {'augmenter': {'name': None, 'params': None},
                   'batch_size': 128,
                   'builder': 'records',
                   'cache': False,
                   'data_dir': None,
                   'download': False,
                   'dtype': 'float32',
                   'file_shuffle_buffer_size': 1024,
                   'filenames': None,
                   'image_size': 224,
                   'mean_subtract': True,
                   'name': 'imagenet2012',
                   'num_channels': 3,
                   'num_classes': 1000,
                   'num_devices': 1,
                   'num_examples': 1281167,
                   'one_hot': False,
                   'shuffle_buffer_size': 10000,
                   'skip_decoding': True,
                   'split': 'train',
                   'standardize': True,
                   'tf_data_service': None,
                   'use_per_replica_batch_size': True},
 'validation_dataset': {'augmenter': {'name': None, 'params': None},
                        'batch_size': 128,
                        'builder': 'records',
                        'cache': False,
                        'data_dir': None,
                        'download': False,
                        'dtype': 'float32',
                        'file_shuffle_buffer_size': 1024,
                        'filenames': None,
                        'image_size': 224,
                        'mean_subtract': True,
                        'name': 'imagenet2012',
                        'num_channels': 3,
                        'num_classes': 1000,
                        'num_devices': 1,
                        'num_examples': 1281167,
                        'one_hot': False,
                        'shuffle_buffer_size': 10000,
                        'skip_decoding': True,
                        'split': 'validation',
                        'standardize': True,
                        'tf_data_service': None,
                        'use_per_replica_batch_size': True}}
I0914 19:14:03.683624 140080684554048 classifier_trainer.py:184] Overriding params: configs/examples/resnet/imagenet/gpu.yaml
I0914 19:14:03.690618 140080684554048 classifier_trainer.py:184] Overriding params: runtime.num_gpus=$NUM_GPUS
I0914 19:14:03.691445 140080684554048 classifier_trainer.py:184] Overriding params: {'model_dir': '', 'mode': 'train_and_eval', 'model': {'name': 'resnet'}, 'runtime': {'run_eagerly': None, 'tpu': None}, 'train_dataset': {'data_dir': ''}, 'validation_dataset': {'data_dir': ''}, 'train': {'time_history': {'log_steps': 100}}}
I0914 19:14:03.693601 140080684554048 classifier_trainer.py:190] Final model parameters: {'evaluation': {'epochs_between_evals': 1, 'skip_eval': False, 'steps': None},
 'export': {'checkpoint': None, 'destination': None},
 'mode': 'train_and_eval',
 'model': {'learning_rate': {'boundaries': [30, 60, 80],
                             'decay_epochs': None,
                             'decay_rate': None,
                             'examples_per_epoch': 1281167,
                             'initial_lr': 0.1,
                             'multipliers': [0.000390625,
                                             3.90625e-05,
                                             3.90625e-06,
                                             3.90625e-07],
                             'name': 'stepwise',
                             'scale_by_batch_size': 0.00390625,
                             'staircase': None,
                             'warmup_epochs': 5},
           'loss': {'label_smoothing': 0.1,
                    'name': 'sparse_categorical_crossentropy'},
           'model_params': {'batch_size': None,
                            'num_classes': 1000,
                            'rescale_inputs': False,
                            'use_l2_regularizer': True},
           'name': 'resnet',
           'num_classes': 1000,
           'optimizer': {'beta_1': None,
                         'beta_2': None,
                         'decay': 0.9,
                         'epsilon': 0.001,
                         'lookahead': None,
                         'momentum': 0.9,
                         'moving_average_decay': None,
                         'name': 'momentum',
                         'nesterov': None}},
 'model_dir': '',
 'model_name': None,
 'runtime': {'all_reduce_alg': None,
             'batchnorm_spatial_persistent': True,
             'dataset_num_private_threads': None,
             'default_shard_dim': -1,
             'distribution_strategy': 'mirrored',
             'enable_xla': False,
             'gpu_thread_mode': None,
             'loss_scale': None,
             'mixed_precision_dtype': None,
             'num_cores_per_replica': 1,
             'num_gpus': '$NUM_GPUS',
             'num_packs': 1,
             'per_gpu_thread_count': 0,
             'run_eagerly': None,
             'task_index': -1,
             'tpu': None,
             'tpu_enable_xla_dynamic_padder': None,
             'worker_hosts': None},
 'train': {'callbacks': {'enable_backup_and_restore': False,
                         'enable_checkpoint_and_export': True,
                         'enable_tensorboard': True,
                         'enable_time_history': True},
           'epochs': 90,
           'metrics': ['accuracy', 'top_5'],
           'resume_checkpoint': True,
           'set_epoch_loop': False,
           'steps': None,
           'tensorboard': {'track_lr': True, 'write_model_weights': False},
           'time_history': {'log_steps': 100}},
 'train_dataset': {'augmenter': {'name': None, 'params': None},
                   'batch_size': 256,
                   'builder': 'tfds',
                   'cache': False,
                   'data_dir': '',
                   'download': False,
                   'dtype': 'float16',
                   'file_shuffle_buffer_size': 1024,
                   'filenames': None,
                   'image_size': 224,
                   'mean_subtract': True,
                   'name': 'imagenet2012',
                   'num_channels': 3,
                   'num_classes': 1000,
                   'num_devices': 1,
                   'num_examples': 1281167,
                   'one_hot': False,
                   'shuffle_buffer_size': 10000,
                   'skip_decoding': True,
                   'split': 'train',
                   'standardize': True,
                   'tf_data_service': None,
                   'use_per_replica_batch_size': True},
 'validation_dataset': {'augmenter': {'name': None, 'params': None},
                        'batch_size': 256,
                        'builder': 'tfds',
                        'cache': False,
                        'data_dir': '',
                        'download': False,
                        'dtype': 'float16',
                        'file_shuffle_buffer_size': 1024,
                        'filenames': None,
                        'image_size': 224,
                        'mean_subtract': True,
                        'name': 'imagenet2012',
                        'num_channels': 3,
                        'num_classes': 1000,
                        'num_devices': 1,
                        'num_examples': 50000,
                        'one_hot': False,
                        'shuffle_buffer_size': 10000,
                        'skip_decoding': True,
                        'split': 'validation',
                        'standardize': True,
                        'tf_data_service': None,
                        'use_per_replica_batch_size': True}}
I0914 19:14:03.694338 140080684554048 classifier_trainer.py:290] Running train and eval.
Traceback (most recent call last):
  File "classifier_trainer.py", line 456, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "classifier_trainer.py", line 443, in main
    stats = run(flags.FLAGS)
  File "classifier_trainer.py", line 435, in run
    return train_and_eval(params, strategy_override)
  File "classifier_trainer.py", line 300, in train_and_eval
    tpu_address=params.runtime.tpu)
  File "/usr/local/lib/python3.6/dist-packages/official/common/distribute_utils.py", line 129, in get_distribution_strategy
    if num_gpus < 0:
TypeError: '<' not supported between instances of 'str' and 'int'

Changing the command to:

python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override='runtime.num_gpus=1'

I'm getting:

2021-09-14 19:16:45.876311: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:45.877754: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I0914 19:16:45.886160 139841922975552 classifier_trainer.py:181] Base params: {'evaluation': {'epochs_between_evals': 1, 'skip_eval': False, 'steps': None},
 'export': {'checkpoint': None, 'destination': None},
 'mode': None,
 'model': {'learning_rate': {'boundaries': [30, 60, 80],
                             'decay_epochs': None,
                             'decay_rate': None,
                             'examples_per_epoch': 1281167,
                             'initial_lr': 0.1,
                             'multipliers': [0.000390625,
                                             3.90625e-05,
                                             3.90625e-06,
                                             3.90625e-07],
                             'name': 'stepwise',
                             'scale_by_batch_size': 0.00390625,
                             'staircase': None,
                             'warmup_epochs': 5},
           'loss': {'label_smoothing': None,
                    'name': 'sparse_categorical_crossentropy'},
           'model_params': {'batch_size': None,
                            'num_classes': 1000,
                            'rescale_inputs': False,
                            'use_l2_regularizer': True},
           'name': 'ResNet',
           'num_classes': 1000,
           'optimizer': {'beta_1': None,
                         'beta_2': None,
                         'decay': 0.9,
                         'epsilon': 0.001,
                         'lookahead': None,
                         'momentum': 0.9,
                         'moving_average_decay': None,
                         'name': 'momentum',
                         'nesterov': None}},
 'model_dir': None,
 'model_name': None,
 'runtime': {'all_reduce_alg': None,
             'batchnorm_spatial_persistent': False,
             'dataset_num_private_threads': None,
             'default_shard_dim': -1,
             'distribution_strategy': 'mirrored',
             'enable_xla': False,
             'gpu_thread_mode': None,
             'loss_scale': None,
             'mixed_precision_dtype': None,
             'num_cores_per_replica': 1,
             'num_gpus': 0,
             'num_packs': 1,
             'per_gpu_thread_count': 0,
             'run_eagerly': False,
             'task_index': -1,
             'tpu': None,
             'tpu_enable_xla_dynamic_padder': None,
             'worker_hosts': None},
 'train': {'callbacks': {'enable_backup_and_restore': False,
                         'enable_checkpoint_and_export': True,
                         'enable_tensorboard': True,
                         'enable_time_history': True},
           'epochs': 90,
           'metrics': ['accuracy', 'top_5'],
           'resume_checkpoint': True,
           'set_epoch_loop': False,
           'steps': None,
           'tensorboard': {'track_lr': True, 'write_model_weights': False},
           'time_history': {'log_steps': 100}},
 'train_dataset': {'augmenter': {'name': None, 'params': None},
                   'batch_size': 128,
                   'builder': 'records',
                   'cache': False,
                   'data_dir': None,
                   'download': False,
                   'dtype': 'float32',
                   'file_shuffle_buffer_size': 1024,
                   'filenames': None,
                   'image_size': 224,
                   'mean_subtract': True,
                   'name': 'imagenet2012',
                   'num_channels': 3,
                   'num_classes': 1000,
                   'num_devices': 1,
                   'num_examples': 1281167,
                   'one_hot': False,
                   'shuffle_buffer_size': 10000,
                   'skip_decoding': True,
                   'split': 'train',
                   'standardize': True,
                   'tf_data_service': None,
                   'use_per_replica_batch_size': True},
 'validation_dataset': {'augmenter': {'name': None, 'params': None},
                        'batch_size': 128,
                        'builder': 'records',
                        'cache': False,
                        'data_dir': None,
                        'download': False,
                        'dtype': 'float32',
                        'file_shuffle_buffer_size': 1024,
                        'filenames': None,
                        'image_size': 224,
                        'mean_subtract': True,
                        'name': 'imagenet2012',
                        'num_channels': 3,
                        'num_classes': 1000,
                        'num_devices': 1,
                        'num_examples': 1281167,
                        'one_hot': False,
                        'shuffle_buffer_size': 10000,
                        'skip_decoding': True,
                        'split': 'validation',
                        'standardize': True,
                        'tf_data_service': None,
                        'use_per_replica_batch_size': True}}
I0914 19:16:45.887840 139841922975552 classifier_trainer.py:184] Overriding params: configs/examples/resnet/imagenet/gpu.yaml
I0914 19:16:45.897558 139841922975552 classifier_trainer.py:184] Overriding params: runtime.num_gpus=1
I0914 19:16:45.898580 139841922975552 classifier_trainer.py:184] Overriding params: {'model_dir': '', 'mode': 'train_and_eval', 'model': {'name': 'resnet'}, 'runtime': {'run_eagerly': None, 'tpu': None}, 'train_dataset': {'data_dir': ''}, 'validation_dataset': {'data_dir': ''}, 'train': {'time_history': {'log_steps': 100}}}
I0914 19:16:45.901514 139841922975552 classifier_trainer.py:190] Final model parameters: {'evaluation': {'epochs_between_evals': 1, 'skip_eval': False, 'steps': None},
 'export': {'checkpoint': None, 'destination': None},
 'mode': 'train_and_eval',
 'model': {'learning_rate': {'boundaries': [30, 60, 80],
                             'decay_epochs': None,
                             'decay_rate': None,
                             'examples_per_epoch': 1281167,
                             'initial_lr': 0.1,
                             'multipliers': [0.000390625,
                                             3.90625e-05,
                                             3.90625e-06,
                                             3.90625e-07],
                             'name': 'stepwise',
                             'scale_by_batch_size': 0.00390625,
                             'staircase': None,
                             'warmup_epochs': 5},
           'loss': {'label_smoothing': 0.1,
                    'name': 'sparse_categorical_crossentropy'},
           'model_params': {'batch_size': None,
                            'num_classes': 1000,
                            'rescale_inputs': False,
                            'use_l2_regularizer': True},
           'name': 'resnet',
           'num_classes': 1000,
           'optimizer': {'beta_1': None,
                         'beta_2': None,
                         'decay': 0.9,
                         'epsilon': 0.001,
                         'lookahead': None,
                         'momentum': 0.9,
                         'moving_average_decay': None,
                         'name': 'momentum',
                         'nesterov': None}},
 'model_dir': '',
 'model_name': None,
 'runtime': {'all_reduce_alg': None,
             'batchnorm_spatial_persistent': True,
             'dataset_num_private_threads': None,
             'default_shard_dim': -1,
             'distribution_strategy': 'mirrored',
             'enable_xla': False,
             'gpu_thread_mode': None,
             'loss_scale': None,
             'mixed_precision_dtype': None,
             'num_cores_per_replica': 1,
             'num_gpus': 1,
             'num_packs': 1,
             'per_gpu_thread_count': 0,
             'run_eagerly': None,
             'task_index': -1,
             'tpu': None,
             'tpu_enable_xla_dynamic_padder': None,
             'worker_hosts': None},
 'train': {'callbacks': {'enable_backup_and_restore': False,
                         'enable_checkpoint_and_export': True,
                         'enable_tensorboard': True,
                         'enable_time_history': True},
           'epochs': 90,
           'metrics': ['accuracy', 'top_5'],
           'resume_checkpoint': True,
           'set_epoch_loop': False,
           'steps': None,
           'tensorboard': {'track_lr': True, 'write_model_weights': False},
           'time_history': {'log_steps': 100}},
 'train_dataset': {'augmenter': {'name': None, 'params': None},
                   'batch_size': 256,
                   'builder': 'tfds',
                   'cache': False,
                   'data_dir': '',
                   'download': False,
                   'dtype': 'float16',
                   'file_shuffle_buffer_size': 1024,
                   'filenames': None,
                   'image_size': 224,
                   'mean_subtract': True,
                   'name': 'imagenet2012',
                   'num_channels': 3,
                   'num_classes': 1000,
                   'num_devices': 1,
                   'num_examples': 1281167,
                   'one_hot': False,
                   'shuffle_buffer_size': 10000,
                   'skip_decoding': True,
                   'split': 'train',
                   'standardize': True,
                   'tf_data_service': None,
                   'use_per_replica_batch_size': True},
 'validation_dataset': {'augmenter': {'name': None, 'params': None},
                        'batch_size': 256,
                        'builder': 'tfds',
                        'cache': False,
                        'data_dir': '',
                        'download': False,
                        'dtype': 'float16',
                        'file_shuffle_buffer_size': 1024,
                        'filenames': None,
                        'image_size': 224,
                        'mean_subtract': True,
                        'name': 'imagenet2012',
                        'num_channels': 3,
                        'num_classes': 1000,
                        'num_devices': 1,
                        'num_examples': 50000,
                        'one_hot': False,
                        'shuffle_buffer_size': 10000,
                        'skip_decoding': True,
                        'split': 'validation',
                        'standardize': True,
                        'tf_data_service': None,
                        'use_per_replica_batch_size': True}}
I0914 19:16:45.901775 139841922975552 classifier_trainer.py:290] Running train and eval.
2021-09-14 19:16:45.903073: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-14 19:16:45.903737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:45.905226: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:45.906560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:47.033764: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:47.034708: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:47.035731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-14 19:16:47.036855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30995 MB memory:  -> device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:00:07.0, compute capability: 7.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0914 19:16:47.995822 139841922975552 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0914 19:16:47.997327 139841922975552 classifier_trainer.py:305] Detected 1 devices.
W0914 19:16:47.997431 139841922975552 classifier_trainer.py:105] label_smoothing > 0, so datasets will be one hot encoded.
I0914 19:16:47.997728 139841922975552 dataset_factory.py:176] Using augmentation: None
I0914 19:16:47.998001 139841922975552 dataset_factory.py:176] Using augmentation: None
I0914 19:16:47.998326 139841922975552 dataset_factory.py:341] Using TFDS to load data.
2021-09-14 19:16:48.004060: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
I0914 19:16:48.669551 139841922975552 dataset_info.py:443] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: imagenet2012/5.1.0
I0914 19:16:49.586393 139841922975552 dataset_info.py:358] Load dataset info from /tmp/tmp1h_9dllttfds
I0914 19:16:49.595921 139841922975552 dataset_info.py:413] Field info.description from disk and from code do not match. Keeping the one from code.
I0914 19:16:49.596311 139841922975552 dataset_info.py:413] Field info.module_name from disk and from code do not match. Keeping the one from code.
I0914 19:16:49.596735 139841922975552 logging_logger.py:36] Constructing tf.data.Dataset imagenet2012 for split train, from /root/tensorflow_datasets/imagenet2012/5.1.0
Traceback (most recent call last):
  File "classifier_trainer.py", line 456, in <module>
    app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "classifier_trainer.py", line 443, in main
    stats = run(flags.FLAGS)
  File "classifier_trainer.py", line 435, in run
    return train_and_eval(params, strategy_override)
  File "classifier_trainer.py", line 312, in train_and_eval
    builder.build(strategy) if builder else None for builder in builders
  File "classifier_trainer.py", line 312, in <listcomp>
    builder.build(strategy) if builder else None for builder in builders
  File "/usr/local/lib/python3.6/dist-packages/official/vision/image_classification/dataset_factory.py", line 302, in build
    dataset = strategy.distribute_datasets_from_function(self._build)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1161, in distribute_datasets_from_function
    dataset_fn, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/mirrored_strategy.py", line 589, in _distribute_datasets_from_function
    options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 169, in get_distributed_datasets_from_function
    input_contexts, dataset_fn, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 1579, in __init__
    input_contexts, self._input_workers, dataset_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/input_lib.py", line 2327, in _create_datasets_from_function_with_input_context
    dataset = dataset_fn(ctx)
  File "/usr/local/lib/python3.6/dist-packages/official/vision/image_classification/dataset_factory.py", line 333, in _build
    dataset = builder()
  File "/usr/local/lib/python3.6/dist-packages/official/vision/image_classification/dataset_factory.py", line 363, in load_tfds
    read_config=read_config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/logging/__init__.py", line 81, in decorator
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 546, in as_dataset
    (self.name, self._data_dir_root))
AssertionError: Dataset imagenet2012: could not find data in /root/tensorflow_datasets. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.

I don't know how to fix this; I think the documentation is not very clear. Please help. I wouldn't mind using the new code base in beta, but there is even less documentation.

jackd

@esparig I found using tfds much easier. Download the train/validation data to ~/tensorflow_datasets/downloads/manual (or $TFDS_DATA_DIR/downloads/manual), create the tfds-override.yaml file below (global_batch_size included to demonstrate reduced memory usage), and run with:

python $OFFICIAL/vision/beta/train.py \
    --experiment=resnet_imagenet \
    --config_file=$CONFIGS/experiments/image_classification/imagenet_resnet50_gpu.yaml \
    --mode=train_and_eval \
    --model_dir=/tmp/foo \
    --params_override='tfds-override.yaml'

tfds-override.yaml

task:
  train_data:
    input_path: ''
    tfds_name: 'imagenet2012'
    tfds_split: 'train'
    global_batch_size: 2
  validation_data:
    input_path: ''
    tfds_name: 'imagenet2012'
    tfds_split: 'validation'
    global_batch_size: 2

From memory I had to update tensorflow-datasets to the latest stable release.
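
For completeness, the one-off preparation step that the AssertionError above asks for can be run roughly like this (a sketch; it assumes the ImageNet tar files have already been placed in the manual download directory):

import os
import tensorflow_datasets as tfds

builder = tfds.builder('imagenet2012')
builder.download_and_prepare(
    download_config=tfds.download.DownloadConfig(
        manual_dir=os.path.expanduser('~/tensorflow_datasets/downloads/manual')))
train_ds = builder.as_dataset(split='train')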

Sep 21 · 2 months ago · issue

jackd issue comment tensorflow/tensorflow

Problem with tf.data.Dataset managing shapes of sparse tensors

Please go to Stack Overflow for help and support:

https://stackoverflow.com/questions/tagged/tensorflow

If you open a GitHub issue, here is our policy:

  1. It must be a bug or a feature request.
  2. The form below must be filled out.
  3. It shouldn't be a TensorBoard issue. Those go here.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04.3 LTS (Xenial Xerus)
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.5.0-dev20171224
  • Python version: 3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the problem

tf 1.5-dev supports tf.SparseTensor when using tf.data.Dataset.from_tensor_slices, but it cannot infer the shape of the new tensor after operations such as tf.data.Dataset.map.

The shape of the tensor becomes unknown, which is troublesome for downstream operations. For example, we have to call set_shape() if we want to feed the new tensor into tf.layers.dense.

Source code / logs

import tensorflow as tf
x = tf.SparseTensor([[0,0],[1,1],[2,2]], [1,1,1], dense_shape=[3,3])
ds = tf.data.Dataset.from_tensor_slices(x)
ds.output_shapes   # TensorShape([Dimension(3)])
ds = ds.map(lambda x: tf.sparse_tensor_to_dense(x))
ds = ds.batch(1)
ds.output_shapes   # TensorShape([Dimension(None), Dimension(None)])

iterator = ds.make_one_shot_iterator()
next_elem = iterator.get_next()   # TensorShape([Dimension(None), Dimension(None)])

y = tf.layers.dense(next_elem, 100)
# ValueError: The last dimension of the inputs to `Dense` should be defined. Found `None`.
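
For what it's worth, the set_shape() workaround mentioned above looks roughly like this in the same TF 1.x API (a sketch; it assumes the last dimension really is 3, and the cast is only needed because the toy values are integers):

next_elem.set_shape([None, 3])                            # restore the statically known last dimension
y = tf.layers.dense(tf.cast(next_elem, tf.float32), 100)  # Dense now sees a defined input size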

@mrry

jackd

I know this is an old issue, but is there any progress towards official support for SparseTensor.set_shape?

Sep 8 · 2 months ago · issue

jackd issue comment jackd/numba-neighbors

What are the benchmarking results like?

Any output or images you could include in the README or in the benchmark folder?

jackd

Pushed a quick update, but with 36 hours to the AAAI deadline I might have to get back to this later. I'd be happy to overhaul the benchmarks (big fan of google-benchmark now), I just don't have the spare cycles atm.

Sep 7 · 2 months ago · push

jackd push jackd/numba-neighbors

Updated sklearn version, added benchmarks to README

commit sha: 613fcc9be3a4050f23eb1fa319ea16b6848dc754

pushed 2 months ago
issue

jackd issue comment jackd/numba-neighbors

Suggestions for implementing custom distance metric

Any suggestions for how to go about this? (I.e. how feasible you think it is and where in the code you think it would go).

jackd

Should be easy enough - I've done an optimised version for 3D trees here, since jit seemed to do a better job with that than with a generic loop. The class structure / methods-as-properties are a bit convoluted, but it was the best way I could find of getting around jit restrictions. It's been a while since I've looked at this, but let me know if you have trouble and I can look at it again.
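
As a hypothetical illustration of the kind of jit-compiled metric being discussed (not the project's actual hook point, which may differ):

import numpy as np
from numba import njit

@njit(fastmath=True)
def rdist3(x, y):
    # squared Euclidean distance specialised for 3D points, unrolled by hand
    d0 = x[0] - y[0]
    d1 = x[1] - y[1]
    d2 = x[2] - y[2]
    return d0 * d0 + d1 * d1 + d2 * d2

print(rdist3(np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 2.0])))  # 9.0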