PHP-FPM: failed to ptrace(PEEKDATA)

What's causing the PEEKDATA error?

I already mentioned that it's the result of a flaw in how PHP-FPM handles slow-request tracing. This is explained in more detail in the fpm-slowlog-sucks article referenced (and reproduced) below. The gist of it is:

...the worker is free to go while the master is determining slow execution. By the time it stops to be traced, it may have completed that execution and be at any stage of serving another request, so the tracer risks failure or, worse, dumping out the stack of an irrelevant execution.

In other words: when PHP-FPM's slowlog tries to capture an actual stack trace, the master process locates the child process (which it fork()'d) that appears to be slow and attaches to it. However, by the time that happens, the child may already have finished that request and started serving another one. Or it may not be handling a request at all, because it has finished and is waiting for a new incoming request.

This leads to either A) a stack trace of the wrong request, or B) the error you see above: a PEEKDATA error, because there was no data left to peek into. So take that into account when reviewing PHP-FPM slowlogs, too.
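To make the failure mode concrete, here is a minimal, self-contained sketch (not FPM's actual code) of what the tracer does: attach, wait for the worker to stop, read a word from its memory, detach. If the address no longer points at live data for the request being investigated, PTRACE_PEEKDATA fails with EIO, which is exactly the "Input/output error (5)" reported in the log.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch only: read one word at "addr" inside worker process "pid". */
static int peek_word(pid_t pid, void *addr, long *out)
{
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    waitpid(pid, NULL, 0);                   /* wait until the worker stops */

    errno = 0;                               /* PEEKDATA returns the data, so errno
                                                is the only reliable error signal */
    *out = ptrace(PTRACE_PEEKDATA, pid, addr, NULL);
    if (*out == -1 && errno != 0) {
        /* addr is no longer a valid mapped address in the worker
           (e.g. a stale pointer from a finished request): typically EIO. */
        fprintf(stderr, "failed to ptrace(PEEKDATA) pid %d: %s (%d)\n",
                (int)pid, strerror(errno), errno);
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }
    ptrace(PTRACE_DETACH, pid, NULL, NULL);  /* let the worker continue */
    return 0;
}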

It appears you have request_slowlog_timeout enabled. This normally takes any request running longer than N seconds, logs that it was taking a long time, then logs a stack trace of the script so you can see what it was doing that took so long (a sample pool configuration is shown below).

In your case, the stack trace (meant to determine what the script is doing) is failing. If you're running out of processes, it is because either:

1. after php-fpm stops the process to trace it, the process fails to resume because of the tracing error, or
2. the process resumes but then runs forever.
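For reference, the slowlog tracer is enabled per pool in php-fpm.conf with directives like the following (the pool name and log path here are only examples):

[www]
; dump a backtrace for any request that runs longer than 5 seconds
request_slowlog_timeout = 5s
slowlog = /var/log/php-fpm/www-slow.log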

My first guess would be to disable request_slowlog_timeout. Since it's not working right, it may be doing more harm than good. If that doesn't fix the problem of running out of processes, then set max_execution_time in php.ini to something that will kill the script for sure.
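A hedged sketch of that advice; the directive names are real, the values are only examples:

; php-fpm.conf, pool section
[www]
; disable the misbehaving slow-request tracer
request_slowlog_timeout = 0
; hard wall-clock kill handled by FPM itself
request_terminate_timeout = 30s

; php.ini
; note: on Unix this counts only PHP execution time, not time spent in
; system calls or stream reads, so request_terminate_timeout above is
; the surer kill
max_execution_time = 30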

References:

http://serverfault.com/questions/406532/i-o-error-with-php5-fpm-ptracepeekdata-failed
http://www.tuicool.com/articles/ZFBz2i
http://blog.csdn.net/hmsiwtv/article/details/11022241
http://kaiwangchen.github.io/2012/10/04/fpm-slowlog-sucks.html

The fpm-slowlog-sucks post (the last link above) is reproduced below.

FPM slowlog sucks
Oct 4, 2012 • PHP, FPM, slowlog

Shipped with the official PHP distribution, FPM (FastCGI Process Manager) is an alternative PHP FastCGI implementation with some additional features (mostly) useful for heavily loaded sites. One of them is slowlog: logging scripts that execute unusually slowly, recording not just their names but their PHP backtraces too, using ptrace and similar facilities to read the remote process's execute_data.

However, does the FPM slowlog always point at the scripts responsible for bad performance? Sadly, that is theoretically impossible because of the way it works. Several months ago I was badly misled by FPM slowlog entries; in fact the slowlog was almost useless, with now-and-then corrupted dumps of the script stack like this:

[18-Aug-2012 17:39:43] [pool www] pid 3202
script_filename = /path/to/script.php
[0x00000000013f1840] +++ dump failed

which corresponds to these error log entries (each entry is wrapped across multiple lines here for better display, so there are two entries in this case):

[18-Aug-2012 17:39:43] WARNING: [pool www] child 3202,
script '/path/to/script.php' (request: "GET /script.php")
executing too slow (4.511165 sec), logging

[18-Aug-2012 17:39:43] ERROR: failed to ptrace(PEEKDATA)
pid 3202: Input/output error (5)

The problem

We got tons of "upstream timed out (110: Connection timed out)" errors in the local Nginx error log with the following setup:

Remote Nginx -- (cross IDC)
                            \
                (localnet)   \
Local Nginx --------------- FPM

Apparently the FPM process pool was exhausted, so Nginx had to wait until an FPM worker became available. FPM also had many entries in its error log and slowlog, which led to a vast investigation into every reported script, and to the inevitable, sad conclusion that this was possibly the wrong direction.

Note that the problem occurred only a few times a week, and we had a cross-IDC deployment, so I guessed it might be some network issue.

Theory of FPM slowlogging

FPM slowlogging basically consists of these parts:

1. Timing of script execution
2. Expiration check
3. Dumping of the PHP stack of the script

You have probably heard that FPM manages pools of worker processes with a master process. When the FPM server is started, there is only the master process, which does initialization such as parsing php-fpm.conf and php.ini, loading PHP extension modules and, most notably, opening the server sockets, one per pool, to be inherited by the child processes, the workers. Then the master process forks enough workers for each pool and enters the master event loop, which, besides other housekeeping tasks, periodically checks whether workers complete in time and dumps the stack trace in case of expiration.

Each pool has a scoreboard structure holding the state of the pool itself and of all workers belonging to it. The scoreboard is allocated as shared memory by mmap(2) with the MAP_SHARED flag and is shared with all workers because they are always fork'ed from the master. Although the structure can be accessed by any of the workers, each of them usually updates only its own slot and does not touch the others', except in certain cases such as serving the /status special request, when it reads all slots to generate an overall status report.
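A minimal sketch (invented names, not FPM's actual scoreboard code) of why a MAP_SHARED mapping created before fork() is visible to both master and workers:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

struct fake_slot { int stage; long accepted_epoch; };   /* illustrative */

int main(void)
{
    /* anonymous shared mapping: same physical pages in parent and child */
    struct fake_slot *sb = mmap(NULL, sizeof *sb, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sb == MAP_FAILED)
        return 1;

    if (fork() == 0) {            /* "worker": updates its own slot */
        sb->stage = 8;            /* e.g. FPM_REQUEST_EXECUTING */
        sb->accepted_epoch = 1345282783;
        _exit(0);
    }
    wait(NULL);                   /* "master": observes the worker's update */
    printf("stage=%d accepted=%ld\n", sb->stage, sb->accepted_epoch);
    return 0;
}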

The children, of course, bypass the master event loop and enter their independent worker cycles, serving FastCGI requests one after another. A worker cycle basically consists of the following steps:

1. fpm_request_accepting        // FPM_REQUEST_ACCEPTING
2. accept
3. fpm_request_reading_headers  // FPM_REQUEST_READING_HEADERS
4. fcgi_read_request
5. init_request_info
6. fpm_request_info             // FPM_REQUEST_INFO
7. php_request_startup
8. fpm_request_executing        // FPM_REQUEST_EXECUTING
9. php_execute_script
10. fpm_request_end             // FPM_REQUEST_FINISHED
11. php_request_shutdown

The functions with the fpm_request_ prefix update the scoreboard to reflect the current processing stage and the current time, and fpm_request_reading_headers additionally records the accepted time. The other functions do the heavy work and, since we focus on timing here, are not discussed.
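Conceptually, each such helper does something like the following (a sketch with invented names, not the actual FPM source):

#include <stdatomic.h>
#include <sys/time.h>

enum stage { ACCEPTING, READING_HEADERS, INFO, EXECUTING, FINISHED };

struct proc_slot {                 /* stand-in for one scoreboard entry */
    atomic_flag lock;              /* initialize with ATOMIC_FLAG_INIT;
                                      FPM uses its own spinlock */
    enum stage  stage;
    struct timeval tv;             /* time of the last stage change */
    struct timeval accepted;       /* when the request was accepted */
};

static void set_stage(struct proc_slot *slot, enum stage s)
{
    while (atomic_flag_test_and_set(&slot->lock))   /* acquire the slot */
        ;
    slot->stage = s;
    gettimeofday(&slot->tv, NULL);
    if (s == READING_HEADERS)
        slot->accepted = slot->tv;                  /* request start time */
    atomic_flag_clear(&slot->lock);                 /* release */
}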

The master process wakes up periodically to check script execution times; the interval, in milliseconds, is calculated as MIN(request_terminate_timeout, request_slowlog_timeout) * 1000 / 3.
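For example, with request_slowlog_timeout = 5s and request_terminate_timeout = 30s, the check would fire roughly every 5 * 1000 / 3 ≈ 1666 ms (values chosen only to illustrate the formula).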

On every check, the master searches for workers in the FPM_REQUEST_EXECUTING stage, takes the difference between the wake-up time and the worker's recorded accepted time, and compares it against the owning pool's request_slowlog_timeout to decide whether the worker is executing a slow script. If it is, the master ptrace(2)s the worker with the ATTACH flag and goes on searching. The worker stops to be traced, which in turn signals the master. The master then calls fpm_php_trace to dump the stack of the script to the slowlog file, and finally detaches from the worker to let it continue.
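The per-worker comparison the master makes is roughly this (illustrative types and field names, not FPM's actual structures):

#include <stdbool.h>
#include <sys/time.h>

struct worker_view { int stage; struct timeval accepted; };
#define STAGE_EXECUTING 8          /* see step 8 in the worker cycle above */

/* true if the worker is executing and the request has been running
   longer than request_slowlog_timeout (in seconds) */
static bool over_slowlog_timeout(const struct worker_view *w,
                                 const struct timeval *now,
                                 long request_slowlog_timeout)
{
    if (w->stage != STAGE_EXECUTING)
        return false;              /* only running requests are candidates */
    return (now->tv_sec - w->accepted.tv_sec) >= request_slowlog_timeout;
}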

Since the PHP interpreter's executor_globals structure is initialized by the master before forking, it sits at the same address in every worker. Knowing that address, the master can peek at executor_globals.current_execute_data in the stopped worker and dump the stack of the script it is currently executing. Slowlog is probably a unique feature that general-purpose process managers inherently cannot offer.

Native defect of FPM slowlogging

The scoreboard entries are protected by locks while the fpm_request_ functions above run, providing exclusive access. The slow-execution checker also copies a worker's scoreboard entry under lock protection; however, it releases the lock immediately instead of holding it until the worker has acknowledged the ptrace ATTACH. Such strict synchronization is simply impossible to implement with the current architecture as of PHP 5.3.10, although the tracer can do some sanity checks before dumping the stack. As a result, the worker is free to go while the master is determining slow execution. By the time it stops to be traced, it may have completed that execution and be at any stage of serving another request, so the tracer risks failure or, worse, dumping out the stack of an irrelevant execution.
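A concrete interleaving that produces the corrupted dump shown earlier (timestamps are purely illustrative):

t+0 ms   master copies pid 3202's scoreboard entry: EXECUTING, accepted 4.5 s ago
t+1 ms   master decides the request is slow and releases the lock
t+2 ms   pid 3202 finishes that request and accepts a new connection
t+3 ms   master ptrace(ATTACH)es pid 3202, which stops
t+4 ms   tracer follows current_execute_data, now stale or mid-update:
         "+++ dump failed", a PEEKDATA I/O error, or a stack from the wrong request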

Another defect is that the FPM slowlog should, by definition, treat the completion of receiving the FastCGI headers, rather than the start of reading them, as the start of script execution, because only after the FastCGI headers have been read can the worker choose a script to execute. It is not the script's fault if it is stuck behind FastCGI parsing for a while.

The problem, revisited

Believe it or not, when the overwhelming timeouts occurred, I found that some connections returned by accept(2) got their first byte read only after an incredible delay of several seconds, while the scripts themselves executed very fast. They were considered slow because that long delay was added to the slowlog timing, and since they actually finished quickly, the tracer had a good chance of dumping the next execution instead.

The problem was spotted by strace'ing all FPM processes and analyzing, with accept_read.pl, the timing differences between each accept(2) and the related first read(2). I guess it was the network stack's fault.
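The author's accept_read.pl script is not shown, but the raw timing data can be captured with something along these lines (an approximation, not the exact command used):

strace -f -tt -T -e trace=accept,accept4,read -o fpm.strace \
    $(pgrep php-fpm | sed 's/^/-p /')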

I thought it was slow connections that exhausted the worker pool, so a quick solution was switching to the following setup:

Remote Nginx
     | (http, cross IDC)
     |           (localnet)
Local Nginx -------------- FPM

That way, the local Nginx absorbs the delay and the FPM worker pool is protected. Keep in mind that in this quick setup, scripts may see false client addresses if they were designed without knowledge of HTTP proxy headers (a typical proxy-header setup is sketched below).
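For completeness, the usual way to preserve the real client address across the extra hop is the standard proxy headers on the remote Nginx (a sketch; "local_nginx" is a hypothetical upstream, and the PHP side must then trust X-Forwarded-For):

# on the remote Nginx, when proxying to the local Nginx
location / {
    proxy_set_header Host            $host;
    proxy_set_header X-Real-IP       $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_pass http://local_nginx;
}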

Anyway, worked for me.
