I am training a neural network model, and when I run this code:
history = model.fit(
    X_train, y_train_encoded,
    epochs=20,
    batch_size=32,
    validation_data=(X_val, y_val_encoded),
)
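In case it matters, I am passing the full NumPy arrays straight to fit(). A tf.data version that streams one batch at a time to the GPU would look roughly like this (a sketch I have not actually tried; make_dataset is my own hypothetical helper, and the batch size of 32 matches the call above):

```python
import numpy as np

def make_dataset(features, labels, batch_size=32):
    """Sketch: wrap NumPy arrays in a tf.data pipeline so batches are
    copied to the GPU one at a time instead of as one large constant.
    Returns None if TensorFlow is not installed, so the sketch still runs."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    # Shuffle, batch, and prefetch to overlap host->device copies with training.
    return ds.shuffle(len(features)).batch(batch_size).prefetch(tf.data.AUTOTUNE)

# Tiny stand-in arrays; the real X_train / y_train_encoded are much larger.
train_ds = make_dataset(np.zeros((8, 4), dtype="float32"),
                        np.zeros(8, dtype="int64"))
```

With a dataset like this, the call would become model.fit(train_ds, epochs=20, validation_data=val_ds) with no batch_size argument, since the dataset is already batched.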
The first epoch finishes successfully (in around 11 minutes), and then I get this error:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
Unlike most people who get this error, for whom it appears before training even starts, in my case the first epoch completes successfully (it takes around 11 minutes) and the crash happens before the second epoch begins. TensorFlow does not seem to be managing GPU memory well: when I inspect GPU usage, 3.1 GB of the 4 GB is in use even after the code crashes, and rerunning the cell from the Jupyter notebook fails instantly with the same error. I have to restart the GPU to get it back to empty; only then will it run the first epoch again, before failing with the same error.
Has anyone encountered this problem? Is it a bug related to something TensorFlow does between epochs (like clearing memory)?
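One thing I am considering trying (based on the standard tf.config API; a sketch, untested) is enabling memory growth before building the model, so that TensorFlow allocates GPU memory on demand instead of reserving almost all of it up front:

```python
def enable_memory_growth():
    """Sketch: switch TensorFlow to on-demand GPU memory allocation.
    Must run before the first GPU op (i.e. before the model is created).
    Returns the number of GPUs configured, or None if TF is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)

n_gpus = enable_memory_growth()
```

I do not know yet whether this addresses the between-epoch crash, but it would at least rule out the up-front allocation as the cause.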
I am using TensorFlow 2.10.1 on Windows with a GTX 1050 Ti GPU (P.S.: I can't afford better hardware, and running on the CPU takes around five times longer).
Thank you for any help.
Tags: python, Tensorflow GPU crashing after first epoch, Stack Overflow