System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 4.18.10-1rodete2-amd64 (Debian-derived)
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): nightly Jan 15, 2018 (protobuf built from HEAD Jan 15)
- Python version: N/A
- Bazel version (if compiling from source): N/A
- GCC/Compiler version (if compiling from source): gcc 7.3.0
- CUDA/cuDNN version: N/A
- GPU model and memory: N/A
Describe the current behavior
Aborts on SIGSEGV.

Describe the expected behavior
Exits cleanly.

Details
I want to create an application that calls the C API but can also parse protocol buffers on its own behalf. For that I want to link dynamically to tensorflow and statically to protobuf. When I do this, it seems like protobuf may be tricking libtensorflow.so into thinking that it has run some static initializers that it in fact has not run (on the static variables needed by its own internal copy of protobuf).

The segfault happens only on Linux. Linking the same way on Windows works fine. I have varied the libtensorflow and protobuf versions, and it seems to happen with all of them. It also happens whether I choose static or dynamic linking for my binary's copy of protobuf.

I also tried building my own liba.so that itself statically links protobuf, and then a binary that links dynamically to "a" and statically to protobuf. This worked, which points away from this being a purely protobuf issue.

Code to reproduce the issue
c++ -o main \
-L$TF_DIR/lib -I$TF_DIR/include \
-L$PROTO_DIR/lib -I$PROTO_DIR/include \
main.cc -l tensorflow -l protobuf
LD_LIBRARY_PATH=$TF_DIR/lib:$PROTO_DIR/lib ./main
Removing -l protobuf from the link command above gets rid of the segfault. The main.cc being compiled is just an empty main:
int main(int argc, char** argv) {}
Other info / logs

Program received signal SIGSEGV, Segmentation fault.
0x00007fffed8f20b8 in tensorflow::kernel_factory::OpKernelRegistrar::InitInternal(tensorflow::KernelDef const*, absl::string_view, std::unique_ptr<tensorflow::kernel_factory::OpKernelFactory, std::default_delete<tensorflow::kernel_factory::OpKernelFactory> >) () from /usr/local/google/home/mattharvey/no_backup/libtensorflow/lib/libtensorflow_framework.so
(gdb) bt
#0 0x00007fffed8f20b8 in tensorflow::kernel_factory::OpKernelRegistrar::InitInternal(tensorflow::KernelDef const*, absl::string_view, std::unique_ptr<tensorflow::kernel_factory::OpKernelFactory, std::default_delete<tensorflow::kernel_factory::OpKernelFactory> >) () from /usr/local/google/home/mattharvey/no_backup/libtensorflow/lib/libtensorflow_framework.so
#1 0x00007fffed88336a in tensorflow::kernel_factory::OpKernelRegistrar::OpKernelRegistrar(tensorflow::KernelDef const*, absl::string_view, tensorflow::OpKernel* (*)(tensorflow::OpKernelConstruction*)) () from /usr/local/google/home/mattharvey/no_backup/libtensorflow/lib/libtensorflow_framework.so
#2 0x00007fffed85f806 in _GLOBAL__sub_I_dataset.cc () from /usr/local/google/home/mattharvey/no_backup/libtensorflow/lib/libtensorflow_framework.so
#3 0x00007ffff7de88aa in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdc68, env=env@entry=0x7fffffffdc78) at dl-init.c:72
#4 0x00007ffff7de89bb in call_init (env=0x7fffffffdc78, argv=0x7fffffffdc68, argc=1, l=<optimized out>) at dl-init.c:30
#5 _dl_init (main_map=0x7ffff7ffe170, argc=1, argv=0x7fffffffdc68, env=0x7fffffffdc78) at dl-init.c:120
#6 0x00007ffff7dd9c5a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#7 0x0000000000000001 in ?? ()
#8 0x00007fffffffdf2e in ?? ()
#9 0x0000000000000000 in ?? ()

0x00007fffed8f20a0 <+80>: mov 0x50(%r15),%rax
0x00007fffed8f20a4 <+84>: lea -0xa0(%rbp),%rbx
0x00007fffed8f20ab <+91>: mov %rbx,%rdi
0x00007fffed8f20ae <+94>: mov (%rax),%r8
0x00007fffed8f20b1 <+97>: mov 0x48(%r15),%rax
0x00007fffed8f20b5 <+101>: mov (%rax),%rsi
=> 0x00007fffed8f20b8 <+104>: mov -0x18(%r8),%r9

How did -0x18(%r8) get illegal?

(gdb) info register r8
r8 0x0 0

-0x18 is certainly illegal. Where did it come from? 0x50(%r15) if we trace through the above.

(gdb) info register r15
r15 0x555555768d10 93824994413840
(gdb) x/2 0x555555768d60
0x555555768d60: 0xee2c0bc0 0x00007fff
(gdb) x/2 0x00007fffee2c0bc0
0x7fffee2c0bc0 <google::protobuf::internal::fixed_address_empty_string>: 0x00000000 0x00000000

... the 0x0 that ended up in r8. Zoom out to find lots of stuff uninitialized:

(gdb) x/64x 0x7fffee4ddb00
0x7fffee4ddb00 <google::protobuf::_DoubleValue_default_instance_>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb10 <google::protobuf::_DoubleValue_default_instance_+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb20 <_ZStL8__ioinit>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb30 <_ZStL8__ioinit>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb40 <google::protobuf::internal::RepeatedPrimitiveDefaults::default_instance()::instance>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb50 <guard variable for google::protobuf::internal::RepeatedStringTypeTraits::GetDefaultRepeatedField()::instance>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb60 <guard variable for google::protobuf::internal::(anonymous namespace)::Register(google::protobuf::MessageLite const*, int, google::protobuf::internal::ExtensionInfo)::local_static_registry>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb70 <_ZStL8__ioinit>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb80 <google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)::mu>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddb90 <google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)::mu+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddba0 <google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)::mu+32>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddbb0 <guard variable for google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)::runner>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddbc0 <google::protobuf::internal::fixed_address_empty_string>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddbd0 <google::protobuf::internal::implicit_weak_message_default_instance>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddbe0 <google::protobuf::internal::implicit_weak_message_default_instance+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffee4ddbf0 <google::protobuf::ShutdownProtobufLibrary()::is_shutdown>: 0x00000000 0x00000000 0x00000000 0x00000000
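A possible reading of the faulting load, sketched in C++ under assumptions this report does not confirm: if libtensorflow_framework.so was built against libstdc++'s pre-C++11 copy-on-write std::string, then a string object is a single pointer to its characters, with a 24-byte header (length, capacity, refcount) sitting just before them. Reading the length of a never-constructed, all-zero string such as fixed_address_empty_string would then load from 0 - 0x18, which matches mov -0x18(%r8),%r9 with r8 = 0. CowRep, CowString and the helper functions below are hypothetical names for a simplified model, not libstdc++ or protobuf code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the header that a copy-on-write string keeps
// immediately before its character data (assumed layout, for illustration).
struct CowRep {
  std::size_t length;
  std::size_t capacity;
  int refcount;
};

// On 64-bit targets the modeled header is 0x18 bytes, the offset in the fault.
static_assert(sizeof(void*) != 8 || sizeof(CowRep) == 0x18,
              "model header should be 24 bytes on 64-bit targets");

// Hypothetical model of the string object itself: one pointer to the chars.
struct CowString {
  char* data;
};

// Byte distance from the character data back to the length field.
constexpr std::size_t kLengthOffset = sizeof(CowRep);

// True if the object still looks like zero-filled, never-constructed storage.
bool LooksUnconstructed(const CowString& s) { return s.data == nullptr; }

// Address a length read would load from: data - sizeof(CowRep).
std::uintptr_t LengthLoadAddress(const CowString& s) {
  return reinterpret_cast<std::uintptr_t>(s.data) - kLengthOffset;
}

// Reads the length only when a header can actually be present.
bool SafeSize(const CowString& s, std::size_t* out) {
  if (LooksUnconstructed(s)) return false;
  std::memcpy(out, s.data - kLengthOffset, sizeof(*out));
  return true;
}

// Builds a model string inside caller-provided storage: header, then chars.
// The storage must hold sizeof(CowRep) + strlen(text) + 1 bytes.
CowString MakeInBuffer(unsigned char* storage, const char* text) {
  assert(storage != nullptr && text != nullptr);
  CowRep rep{std::strlen(text), std::strlen(text), 0};
  std::memcpy(storage, &rep, sizeof(rep));
  std::memcpy(storage + sizeof(rep), text, rep.length + 1);
  return CowString{reinterpret_cast<char*>(storage + sizeof(rep))};
}
```

For a zero-filled CowString, LooksUnconstructed returns true and SafeSize refuses to read. The code in the backtrace has no such check, so under this model it would go straight to the load that faults.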