How do I create a debug build of a recent TensorFlow version with CUDA support?
I have tried again and again to create a debug build of a recent TensorFlow version using the official Docker images (latest-devel-gpu-py3, i.e. r1.12.0), but nothing seems to work. Has anyone recently created a successful debug build of TensorFlow (r1.11.0 or later) and can share their approach?
This is what I have tried so far.
I basically followed the instructions at https://www.tensorflow.org/install/source, modifying them to generate a debug build. None of my attempts produced a successful build.
The host system is a Linux x86-64 machine with plenty of RAM (512 GB; it is a DGX-1). The CUDA version inside the Docker image is CUDA 9.0, and the "latest" TensorFlow version inside the image is r1.12.0.
To get any CUDA build working, I had to use nvidia-docker; otherwise I get a linker error about libcuda.so.1.
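libcuda.so.1 belongs to the NVIDIA driver rather than to the CUDA toolkit, and nvidia-docker (the NVIDIA container runtime) mounts it from the host into the container, which is why a plain docker run cannot find it. A quick sketch to check this from inside the container, assuming Python 3 is available there (the helper name is hypothetical):

```python
import ctypes


def libcuda_loadable(name="libcuda.so.1"):
    """True if the dynamic loader can find and load the given CUDA driver library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Not on the loader's search path (e.g. container started without nvidia-docker).
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", libcuda_loadable())
```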
I started like this:
nvidia-docker pull tensorflow/tensorflow:latest-devel-gpu-py3
nvidia-docker run --runtime=nvidia -it -w /tensorflow -v $PWD:/mnt -e HOST_PERMS="$(id -u):$(id -g)" \
    tensorflow/tensorflow:latest-devel-gpu-py3 bash
Then I tried to configure the project with:
cd /tensorflow
./configure
I tried various configurations: keeping all values at their defaults, enabling only the parts I need, pointing it to my own CUDA 9.0 and TensorRT installation, and not running ./configure at all. Not running ./configure at all (inside the Docker image) seems to give the best results (i.e. optimized builds succeed with the least effort).
If I build with the exact official build instructions, i.e. an optimized (non-debug) build, everything works as expected. The following succeeds:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
The same holds for the following command, which includes debug info but does not turn off optimization (so I cannot really use it for debugging):
bazel build --config cuda --strip=never -c opt --copt="-ggdb" //tensorflow/tools/pip_package:build_pip_package
However, everything that disables optimizations fails. If I run the following (with or without the --strip=never flag):
bazel build --config cuda --strip=never -c dbg //tensorflow/tools/pip_package:build_pip_package
I get the following error:
INFO: From Compiling tensorflow/contrib/framework/kernels/zero_initializer_op_gpu.cu.cc:
external/com_google_absl/absl/strings/string_view.h(496): error: constexpr function return is non-constant
This can be resolved by defining -DNDEBUG (see "nvcc error: string_view.h: constexpr function return is non-constant").
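According to the Bazel user manual, -c opt compiles with -O2 -DNDEBUG (assert() disabled), while -c dbg only adds -g, so assert-style checks stay compiled in. That fits the observation that the absl error appears only with -c dbg and disappears with -DNDEBUG, although the exact construct at string_view.h(496) is not shown here. It also explains why -c opt --copt="-ggdb" still produces optimized code: -ggdb adds debug info but no -O flag. The sketch below models that flag arithmetic. The per-mode table is a simplification (assumption), TF's CUDA crosstool adds further flags, the claim that --copt flags come after the mode flags is an assumption about Bazel's ordering, and the helper names are hypothetical.

```python
# Simplified per-mode compiler flags, following the Bazel user manual's
# description of --compilation_mode (-c). Real toolchains add more flags.
MODE_COPTS = {
    "fastbuild": [],            # default mode: minimal debug info, no optimization
    "dbg": ["-g"],
    "opt": ["-O2", "-DNDEBUG"],
}


def effective_copts(mode, extra=()):
    """Mode flags followed by user --copt flags (assumed to come later)."""
    return MODE_COPTS[mode] + list(extra)


def opt_level(copts):
    """Last -O flag on the command line; GCC's default is -O0."""
    levels = [c for c in copts if c.startswith("-O")]
    return levels[-1] if levels else "-O0"


def asserts_enabled(copts):
    """assert() stays active unless NDEBUG ends up defined (-D/-U apply in order)."""
    enabled = True
    for c in copts:
        if c == "-DNDEBUG" or c.startswith("-DNDEBUG="):
            enabled = False
        elif c == "-UNDEBUG":
            enabled = True
    return enabled


if __name__ == "__main__":
    variants = {
        "opt + -ggdb": effective_copts("opt", ["-ggdb"]),
        "dbg": effective_copts("dbg"),
        "dbg + -DNDEBUG": effective_copts("dbg", ["-DNDEBUG"]),
    }
    for name, copts in variants.items():
        state = "on" if asserts_enabled(copts) else "off"
        print(f"{name:16} {opt_level(copts):4} asserts={state}")
```

In this model only the plain dbg variant keeps assert() active, and only the opt variant runs at -O2; the answer's command (no -c, so fastbuild, plus --copt="-Og") would land at -Og.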
But if I run the following:
bazel build --config cuda --strip=never -c dbg --copt="-DNDEBUG" //tensorflow/tools/pip_package:build_pip_package
I get these linker errors at the final step of the build:
ERROR: /tensorflow/python/BUILD:3865:1: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed (Exit 1)
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so: PC-relative offset overflow in GOT PLT entry for `_ZNK5Eigen10TensorBaseINS_9TensorMapINS_6TensorIKjLi1ELi1EiEELi16ENS_11MakePointerEEELi0EE9unaryExprINS_8internal11scalar_leftIjjN10tensorflow7functor14right_shift_opIjEEEEEEKNS_18TensorCwiseUnaryOpIT_KS6_EERKSH_'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
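The "relocation truncated to fit: R_X86_64_PC32" messages mean that a 32-bit PC-relative reference could not reach its target. Under the default small code model on x86-64, code and data are expected to lie within 2 GiB of each other, so an unoptimized _pywrap_tensorflow_internal.so that grows past that limit typically fails to link this way; the -Og and -mcmodel=medium flags used in the answer plausibly help for that reason. The sketch below (a hypothetical helper, assuming a little-endian ELF64 file) measures the address span of the allocated sections of a shared library that did link, for example from the optimized build, to see how close it is to the limit. Non-allocated sections such as .debug_* do not count.

```python
import struct
import sys

SHF_ALLOC = 0x2       # section flag: occupies memory at run time
PC32_REACH = 2**31    # reach of a signed 32-bit PC-relative displacement


def alloc_span(path):
    """Address span in bytes covered by the SHF_ALLOC sections of a
    little-endian ELF64 file. Non-allocated sections (.debug_*) are ignored."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError(f"{path}: not a little-endian ELF64 file")
        # Remaining 48 bytes of the ELF64 header: e_type ... e_shstrndx.
        header = struct.unpack("<HHIQQQIHHHHHH", f.read(48))
        shoff, shentsize, shnum = header[5], header[10], header[11]

        def section(i):
            # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, align, entsize
            f.seek(shoff + i * shentsize)
            return struct.unpack("<IIQQQQIIQQ", f.read(64))

        if shnum == 0 and shoff:
            shnum = section(0)[5]  # extended numbering: count is in sh_size of entry 0

        lo, hi = None, 0
        for i in range(shnum):
            _, _, flags, addr, _, size, *_ = section(i)
            if flags & SHF_ALLOC and size:
                lo = addr if lo is None else min(lo, addr)
                hi = max(hi, addr + size)
    return 0 if lo is None else hi - lo


if __name__ == "__main__":
    for p in sys.argv[1:]:
        span = alloc_span(p)
        status = "over" if span >= PC32_REACH else "under"
        print(f"{p}: {span / 2**30:.2f} GiB allocated span ({status} the 2 GiB PC32 reach)")
```

Usage: python3 elf_span.py path/to/_pywrap_tensorflow_internal.so (the script name is arbitrary).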
I hoped to get around these linker errors with a monolithic build, but that gave essentially the same error:
bazel build --config cuda -c dbg --config=monolithic --copt="-DNDEBUG" //tensorflow/tools/pip_package:build_pip_package
I also tried the approaches from "TensorFlow doesnt build with debug mode" and several other variants I found by extensive googling. I'm running out of options.
I'd take any TensorFlow version from 1.11 onwards, including (working) nightly builds. It just needs to work with CUDA 9 on x86 Linux, include debug symbols, and have optimizations disabled.
Thank you very much in advance.
c++ tensorflow build bazel debug-symbols
asked Nov 10 at 10:26
Kai Londenberg
312
1 Answer
Just in case someone else stumbles over this problem: I finally got it to compile using the following command:
bazel build --config cuda --strip=never --copt="-DNDEBUG" --copt="-march=native" --copt="-Og" --copt="-g3" --copt="-mcmodel=medium" --copt="-fPIC" //tensorflow/tools/pip_package:build_pip_package
After that, installation is a bit of a hassle, since the wheel can no longer be built. The TensorFlow build can still be installed, though.
When building the wheel via
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
the process fails with an error that seems to come from Python's built-in zip compression library (it cannot compress the resulting archive because it is too large).
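To see which staged files are likely to trip that limit, the staging directory can be scanned for files larger than the threshold at which Python's zipfile needs ZIP64 extensions for a member (zipfile.ZIP64_LIMIT, 2 GiB - 1 bytes). This is only a diagnostic sketch: which limit is actually hit depends on the Python and wheel versions in the image, and the helper name is hypothetical.

```python
import os
import sys
import zipfile

# Size above which Python's zipfile needs ZIP64 extensions for a member.
LIMIT = zipfile.ZIP64_LIMIT


def oversized_files(root, limit=LIMIT):
    """Return (path, size) for regular files under root larger than limit,
    biggest first."""
    big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size > limit:
                    big.append((path, size))
    return sorted(big, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in oversized_files(sys.argv[1]):
        print(f"{size / 2**30:6.2f} GiB  {path}")
```

Run it against the temporary directory that build_pip_package reports (for example /tmp/Shjwejweu); with -g3 debug info, _pywrap_tensorflow_internal.so is the likely candidate.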
It's important to run build_pip_package anyway, since it only fails at the final (archiving) step. Right at the start, it prints to the console that it is building the package in a temporary directory (say, /tmp/Shjwejweu). The contents of that temporary directory can be used to install the TF debug version: copy the directory to the target machine, make sure any old TensorFlow package has been removed (e.g. pip uninstall tensorflow), and run the following inside it:
python setup.py install
But be careful to actively uninstall the "tensorflow" package first; otherwise you can end up with two TensorFlow versions installed simultaneously.
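The manual installation steps above can be scripted. A minimal sketch, assuming the staging directory printed by build_pip_package has been copied to the target machine and contains setup.py; the package name "tensorflow" is taken from the answer, the helper names are hypothetical, and by default it only prints the commands:

```python
import subprocess
import sys
from pathlib import Path


def install_steps(staging_dir):
    """(command, cwd, check) steps for installing TF from a staging directory."""
    staging = Path(staging_dir)
    if not (staging / "setup.py").is_file():
        raise FileNotFoundError(f"no setup.py in {staging}")
    py = sys.executable
    return [
        # Remove the old package first so two versions are not installed side by side.
        # Not checked, so a missing package does not abort the install.
        ([py, "-m", "pip", "uninstall", "-y", "tensorflow"], None, False),
        ([py, "setup.py", "install"], staging, True),
    ]


def install(staging_dir, dry_run=True):
    for cmd, cwd, check in install_steps(staging_dir):
        print("would run:" if dry_run else "running:", " ".join(cmd), f"(in {cwd or '.'})")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=check)


if __name__ == "__main__":
    install(sys.argv[1], dry_run="--run" not in sys.argv[2:])
```

Usage: python3 install_staged.py /tmp/Shjwejweu --run (the script name is arbitrary).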
edited 2 days ago
answered Nov 10 at 13:26
Kai Londenberg
312