bianbu-linux-6.6

mirror of https://gitee.com/bianbu-linux/linux-6.6 synced 2025-07-22 01:43:37 -04:00

Author	SHA1	Message	Date
Heiko Carstens	6d5b5acca9	Fix fixpoint divide exception in acct_update_integrals Frans Pop reported the crash below when running an s390 kernel under Hercules: Kernel BUG at 000738b4 verbose debug info unavailable! fixpoint divide exception: 0009 #1! SMP Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod dasd_eckd_mod dasd_mod CPU: 0 Not tainted 2.6.27.19 #13 Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18) Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 Krnl GPRS: 00000000 000007d0 7fffffff fffff830 00000000 ffffffff 00000002 0f9ed9b8 00000000 00008ca0 00000000 0f9ed9b8 0f9edda4 8007386e 0f4f7ec8 0f4f7e98 Krnl Code: 800738aa: a71807d0 lhi %r1,2000 800738ae: 8c200001 srdl %r2,1 800738b2: 1d21 dr %r2,%r1 >800738b4: 5810d10e l %r1,270(%r13) 800738b8: 1823 lr %r2,%r3 800738ba: 4130f060 la %r3,96(%r15) 800738be: 0de1 basr %r14,%r1 800738c0: 5800f060 l %r0,96(%r15) Call Trace: ( <000000000004fdea>! blocking_notifier_call_chain+0x1e/0x2c) <0000000000038502>! do_exit+0x106/0x7c0 <0000000000038c36>! do_group_exit+0x7a/0xb4 <0000000000038c8e>! SyS_exit_group+0x1e/0x30 <0000000000021c28>! sysc_do_restart+0x12/0x16 <0000000077e7e924>! 0x77e7e924 Reason for this is that cpu time accounting usually only happens from interrupt context, but acct_update_integrals gets also called from process context with interrupts enabled. So in acct_update_integrals we may end up with the following scenario: Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt happens which updates accouting values. This causes acct_timexpd to be greater than the former stime + utime. The subsequent calculation of dtime = cputime_sub(time, tsk->acct_timexpd); will be negative and the division performed by cputime_to_jiffies(dtime) will generate an exception since the result won't fit into a 32 bit register. In order to fix this just always disable interrupts while accessing any of the accounting values. Reported by: Frans Pop <elendil@planet.nl> Tested by: Frans Pop <elendil@planet.nl> Cc: stable@kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-03-09 08:13:35 -07:00
KOSAKI Motohiro	c3ffc7a40b	tracing: Don't use tracing_record_cmdline() in workqueue tracer Impact: improve workqueue tracer output Currently, /sys/kernel/debug/tracing/trace_stat/workqueues can display wrong and strange thread names. Why? Currently, ftrace has tracing_record_cmdline()/trace_find_cmdline() convenience function that implements a task->comm string cache. This can avoid unnecessary memcpy overhead and the workqueue tracer uses it. However, in general, any trace statistics feature shouldn't use tracing_record_cmdline() because trace statistics can display very old process. Then comm cache can return wrong string because recent process overrides the cache. Fortunately, workqueue trace guarantees that displayed processes are live. Thus we can search comm string from PID at display time. <before> % cat workqueues # CPU INSERTED EXECUTED NAME # \| \| \| \| 7 431913 431913 kondemand/7 7 0 0 tail 7 21 21 git 7 0 0 ls 7 9 9 cat 7 832632 832632 unix_chkpwd 7 236292 236292 ls Note: tail, git, ls, cat unix_chkpwd are obiously not workqueue thread. <after> % cat workqueues # CPU INSERTED EXECUTED NAME # \| \| \| \| 7 510 510 kondemand/7 7 0 0 kmpathd/7 7 15 15 ata/7 7 0 0 aio/7 7 11 11 kblockd/7 7 1063 1063 work_on_cpu/7 7 167 167 events/7 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-09 10:26:13 +01:00
KOSAKI Motohiro	888b55dc31	ftrace: tracing header should put '#' at the beginning of a line In a recent discussion, Andrew Morton pointed out that tracing header should put '#' at the beginning of a line. Then, we can easily filtered the header by following grep usage: cat trace \| grep -v '^#' Wakeup trace also has the same header problem. Comparison of headers displayed: before this patch: # tracer: wakeup # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip -------------------------------------------------------------------- latency: 19059 us, #21277/21277, CPU#1 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) ----------------- \| task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0) ----------------- # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / irqbalan-1887 1d.s. 0us : 1887:120:R + [001] 1644:115:S kondemand/1 irqbalan-1887 1d.s. 1us : default_wake_function <-autoremove_wake_function irqbalan-1887 1d.s. 2us : check_preempt_wakeup <-try_to_wake_up after this patch: # tracer: wakeup # # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip # -------------------------------------------------------------------- # latency: 529 us, #530/530, CPU#0 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: kondemand/0-1641 (uid:0 nice:-5 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / sshd-2496 0d.s. 0us : 2496:120:R + [000] 1641:115:S kondemand/0 sshd-2496 0d.s. 1us : default_wake_function <-autoremove_wake_function sshd-2496 0d.s. 1us : check_preempt_wakeup <-try_to_wake_up Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20090308124421.23C3.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 16:57:22 +01:00
Ingo Molnar	dba58e39ce	Merge branches 'tracing/doc', 'tracing/ftrace', 'tracing/printk' and 'tracing/textedit' into tracing/core	2009-03-08 16:48:51 +01:00
Ingo Molnar	9de36825b3	tracing: trace_bprintk() cleanups Impact: cleanup Remove a few leftovers and clean up the code a bit. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:12 +01:00
Frederic Weisbecker	769b0441f4	tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:12 +01:00
Lai Jiangshan	1ba28e02a1	tracing: add trace_bprintk() Impact: add a generic printk() for tracing, like trace_printk() trace_bprintk() uses the infrastructure to record events on ring_buffer. [ fweisbec@gmail.com: ported to latest -tip, made it work if !CONFIG_MODULES, never free the format strings from modules because we can't keep track of them and conditionnaly create the ftrace format strings section (reported by Steven Rostedt) ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:11 +01:00
Lai Jiangshan	1427cdf059	tracing: infrastructure for supporting binary record Impact: save on memory for tracing Current tracers are typically using a struct(like struct ftrace_entry, struct ctx_switch_entry, struct special_entr etc...)to record a binary event. These structs can only record a their own kind of events. A new kind of tracer need a new struct and a lot of code too handle it. So we need a generic binary record for events. This infrastructure is for this purpose. [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched tracing as reported by Steven Rostedt] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 17:59:11 +01:00
Mathieu Desnoyers	4460fdad85	tracing, Text Edit Lock - kprobes architecture independent support Use the mutual exclusion provided by the text edit lock in the kprobes code. It allows coherent manipulation of the kernel code by other subsystems. Changelog: Move the kernel_text_lock/unlock out of the for loops. Use text_mutex directly instead of a function. Remove whitespace modifications. (note : kprobes_mutex is always taken outside of text_mutex) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> LKML-Reference: <49B14306.2080202@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:49:00 +01:00
Ingo Molnar	f0ef039851	Merge branch 'x86/core' into tracing/textedit Conflicts: arch/x86/Kconfig block/blktrace.c kernel/irq/handle.c Semantic conflict: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:45:01 +01:00
Lai Jiangshan	5ed0cec0ac	sched: TIF_NEED_RESCHED -> need_reshed() cleanup Impact: cleanup Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched() instead of using TIF_NEED_RESCHED. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 12:48:55 +01:00
KOSAKI Motohiro	10dd3ebe21	tracing: fix deadlock when setting set_ftrace_pid Impact: fix deadlock while using set_ftrace_pid Reproducer: # cd /sys/kernel/debug/tracing # echo $$ > set_ftrace_pid then, console becomes hung. Details: when writing set_ftracepid, kernel callstack is following ftrace_pid_write() mutex_lock(&ftrace_lock); ftrace_update_pid_func() mutex_lock(&ftrace_lock); mutex_unlock(&ftrace_lock); mutex_unlock(&ftrace_lock); then, system always deadlocks when ftrace_pid_write() is called. In past days, ftrace_pid_write() used ftrace_start_lock, but commit `e6ea44e9b4` consolidated ftrace_start_lock to ftrace_lock. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <20090306151155.0778.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 12:07:38 +01:00
KOSAKI Motohiro	422d3c7a57	tracing: current tip/master can't enable ftrace After commit `40ada30f96`, "make menuconfig" doesn't display "Tracer" item. Following modification restores it.	2009-03-06 11:56:42 +01:00
Ingo Molnar	7fc07d8410	Merge branch 'sched/core' into sched/cleanups	2009-03-06 11:47:52 +01:00
Ingo Molnar	bc722f508a	Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-06 11:40:37 +01:00
Ingo Molnar	1609743970	Merge branches 'tracing/ftrace' and 'tracing/function-graph-tracer' into tracing/core	2009-03-06 11:39:18 +01:00
Tejun Heo	edcb463997	percpu, module: implement reserved allocation and use it for module percpu variables Impact: add reserved allocation functionality and use it for module percpu variables This patch implements reserved allocation from the first chunk. When setting up the first chunk, arch can ask to set aside certain number of bytes right after the core static area which is available only through a separate reserved allocator. This will be used primarily for module static percpu variables on architectures with limited relocation range to ensure that the module perpcu symbols are inside the relocatable range. If reserved area is requested, the first chunk becomes reserved and isn't available for regular allocation. If the first chunk also includes piggy-back dynamic allocation area, a separate chunk mapping the same region is created to serve dynamic allocation. The first one is called static first chunk and the second dynamic first chunk. Although they share the page map, their different area map initializations guarantee they serve disjoint areas according to their purposes. If arch doesn't setup reserved area, reserved allocation is handled like any other allocation. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-03-06 14:33:59 +09:00
Steven Rostedt	770cb24345	tracing: add format files for ftrace default entries Impact: allow user apps to read binary format of basic ftrace entries Currently, only defined raw events export their formats so a binary reader can parse them. There's no reason that the default ftrace entries can't export their formats. This patch adds a subsystem called "ftrace" in the events directory that includes the ftrace entries for basic ftrace recorded items. These only have three files in the events directory: type : printf available_types : printf format : format for the event entry For example: # cat /debug/tracing/events/ftrace/wakeup/format name: wakeup ID: 3 format: field:unsigned char type; offset:0; size:1; field:unsigned char flags; offset:1; size:1; field:unsigned char preempt_count; offset:2; size:1; field:int pid; offset:4; size:4; field:int tgid; offset:8; size:4; field:unsigned int prev_pid; offset:12; size:4; field:unsigned char prev_prio; offset:16; size:1; field:unsigned char prev_state; offset:17; size:1; field:unsigned int next_pid; offset:20; size:4; field:unsigned char next_prio; offset:24; size:1; field:unsigned char next_state; offset:25; size:1; field:unsigned int next_cpu; offset:28; size:4; print fmt: "%u:%u:%u ==+ %u:%u:%u [%03u]" Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:44 -05:00
Steven Rostedt	33b0c229e3	tracing: move print of event format to separate file Impact: clean up Move the macro that creates the event format file to a separate header. This will allow the default ftrace events to use this same macro to create the formats to read those events. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:42 -05:00
Steven Rostedt	5e2336a0d4	tracing: make all file_operations const Impact: cleanup All file_operations structures should be constant. No one is going to change them. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-05 21:46:40 -05:00
Ingo Molnar	40ada30f96	tracing: clean up menu Clean up menu structure, introduce TRACING_SUPPORT switch that signals whether an architecture supports various instrumentation mechanisms. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 21:53:25 +01:00
Ingo Molnar	a1413c89ae	Merge branch 'x86/urgent' into x86/core Conflicts: arch/x86/include/asm/fixmap_64.h Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 21:48:50 +01:00
Frederic Weisbecker	8a0be9ef82	sched: don't rebalance if attached on NULL domain Impact: fix function graph trace hang / drop pointless softirq on UP While debugging a function graph trace hang on an old PII, I saw that it consumed most of its time on the timer interrupt. And the domain rebalancing softirq was the most concerned. The timer interrupt calls trigger_load_balance() which will decide if it is worth to schedule a rebalancing softirq. In case of builtin UP kernel, no problem arises because there is no domain question. In case of builtin SMP kernel running on an SMP box, still no problem, the softirq will be raised each time we reach the next_balance time. In case of builtin SMP kernel running on a UP box (most distros provide default SMP kernels, whatever the box you have), then the CPU is attached to the NULL sched domain. So a kind of unexpected behaviour happen: trigger_load_balance() -> raises the rebalancing softirq later on softirq: run_rebalance_domains() -> rebalance_domains() where the for_each_domain(cpu, sd) is not taken because of the NULL domain we are attached at. Which means rq->next_balance is never updated. So on the next timer tick, we will enter trigger_load_balance() which will always reschedule() the rebalacing softirq: if (time_after_eq(jiffies, rq->next_balance)) raise_softirq(SCHED_SOFTIRQ); So for each tick, we process this pointless softirq. This patch fixes it by checking if we are attached to the null domain before raising the softirq, another possible fix would be to set the maximal possible JIFFIES value to rq->next_balance if we are attached to the NULL domain. v2: build fix on UP Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:04:44 +01:00
Frederic Weisbecker	0012693ad4	tracing/function-graph-tracer: use the more lightweight local clock Impact: decrease hangs risks with the graph tracer on slow systems Since the function graph tracer can spend too much time on timer interrupts, it's better now to use the more lightweight local clock. Anyway, the function graph traces are more reliable on a per cpu trace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 12:14:41 +01:00
Ingo Molnar	49d2d266ad	Merge commit 'v2.6.29-rc7' into sched/core	2009-03-05 11:59:10 +01:00
David Rientjes	03d78913f0	lockdep: remove duplicate CONFIG_DEBUG_LOCKDEP definitions Impact: cleanup The atomic debug modifiers are already defined in kernel/lockdep_internals.h. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <alpine.DEB.2.00.0903050222160.30401@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 11:46:00 +01:00
Ingo Molnar	a140feab42	Merge commit 'v2.6.29-rc7' into core/locking	2009-03-05 11:45:22 +01:00
David S. Miller	508827ff0a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c	2009-03-05 02:06:47 -08:00
Ingo Molnar	5e1607a00b	tracing: rename ftrace_printk() => trace_printk() Impact: cleanup Use a more generic name - this also allows the prototype to move to kernel.h and be generally available to kernel developers who want to do some quick tracing. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 10:24:48 +01:00
Steven Rostedt	e9d25fe6ea	tracing: have latency tracers set the latency format The latency tracers (irqsoff, preemptoff, preemptirqsoff, and wakeup) are pretty useless with the default output format. This patch makes them automatically enable the latency format when they are selected. They also record the state of the latency option, and if it was not enabled when selected, they disable it on reset. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 22:15:30 -05:00
Steven Rostedt	27d48be844	tracing: consolidate print_lat_fmt and print_trace_fmt Impact: clean up Both print_lat_fmt and print_trace_fmt do pretty much the same thing except for one different function call. This patch consolidates the two functions and adds an if statement to perform the difference. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 21:57:29 -05:00
Steven Rostedt	5fd73f8624	tracing: remove extra latency_trace method from trace structure Impact: clean up The trace and latency_trace function pointers are identical for every tracer but the function tracer. The differences in the function tracer are trivial (latency output puts paranthesis around parent). This patch removes the latency_trace pointer and all prints will now just use the trace output function pointer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 21:42:04 -05:00
Steven Rostedt	c032ef64d6	tracing: add latency output format option With the removal of the latency_trace file, we lost the ability to see some of the finer details in a trace. Like the state of interrupts enabled, the preempt count, need resched, and if we are in an interrupt handler, softirq handler or not. This patch simply creates an option to bring back the old format. This also removes the warning about an unused variable that held the latency_trace file operations. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 20:34:24 -05:00
Steven Rostedt	e74da5235c	tracing: fix seq read from trace files The buffer used by trace_seq was updated incorrectly. Instead of consuming what was actually read, it consumed the rest of the buffer on reads. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 20:31:11 -05:00
Steven Rostedt	2dc5d12b1f	tracing: do not return EFAULT if read copied anything Impact: fix trace read to conform to standards Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention that a userspace read should not return -EFAULT if it succeeded in copying anything. It should only return -EFAULT if it failed to copy at all. This patch modifies the check of copy_from_user and updates the return code appropriately. I also used H. Peter Anvin's short cut rule to just test ret == count. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 19:10:05 -05:00
Steven Rostedt	4f3640f8a3	ring-buffer: fix timestamp in partial ring_buffer_page_read If a partial ring_buffer_page_read happens, then some of the incremental timestamps may be lost. This patch writes the recent timestamp into the page that is passed back to the caller. A partial ring_buffer_page_read is where the full page would not be written back to the user, and instead, just part of the page is copied to the user. A full page would be a page swap with the ring buffer and the timestamps would be correct. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 19:01:45 -05:00
Steven Rostedt	e543ad7691	tracing: add cpu_file intialization for ftrace_dump Impact: fix to ftrace_dump output corruption The commit: `b04cc6b1f6` tracing/core: introduce per cpu tracing files added a new field to the iterator called cpu_file. This was a handle to differentiate between the per cpu trace output files and the all cpu "trace" file. The all cpu "trace" file required setting this to TRACE_PIPE_ALL_CPU. The problem is that the ftrace_dump sets up its own iterator but was not updated to handle this change. The result was only CPU 0 printing out on crash and a lot of "<0>"'s also being printed. Reported-by: Thomas Gleixner <tglx@linuxtronix.de> Tested-by: Darren Hart <dvhtc@us.ibm.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-04 18:32:28 -05:00
Eric Dumazet	64ca5ab913	rcu: increment quiescent state counter in ksoftirqd() If a machine is flooded by network frames, a cpu can loop 100% of its time inside ksoftirqd() without calling schedule(). This can delay RCU grace period to insane values. Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem. Paul: "This regression was a result of the recent change from "schedule()" to "cond_resched()", which got rid of that quiescent state in the common case where a reschedule is not needed". Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 22:08:45 +01:00
Peter Zijlstra	efed792d67	tracing: add lockdep tracepoints for lock acquire/release Augment the traces with lock names when lockdep is available: 1) \| down_read_trylock() { 1) \| _spin_lock_irqsave() { 1) \| /* lock_acquire: &sem->wait_lock / 1) 4.201 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock / 1) 3.523 us \| } 1) \| / lock_acquire: try read &mm->mmap_sem / 1) + 13.386 us \| } 1) 1.635 us \| find_vma(); 1) \| handle_mm_fault() { 1) \| __do_fault() { 1) \| filemap_fault() { 1) \| find_lock_page() { 1) \| find_get_page() { 1) \| / lock_acquire: read rcu_read_lock / 1) \| / lock_release: rcu_read_lock / 1) 5.697 us \| } 1) 8.158 us \| } 1) + 11.079 us \| } 1) \| _spin_lock() { 1) \| / lock_acquire: __pte_lockptr(page) / 1) 3.949 us \| } 1) 1.460 us \| page_add_file_rmap(); 1) \| _spin_unlock() { 1) \| / lock_release: __pte_lockptr(page) / 1) 3.115 us \| } 1) \| unlock_page() { 1) 1.421 us \| page_waitqueue(); 1) 1.220 us \| __wake_up_bit(); 1) 6.519 us \| } 1) + 34.328 us \| } 1) + 37.452 us \| } 1) \| up_read() { 1) \| / lock_release: &mm->mmap_sem / 1) \| _spin_lock_irqsave() { 1) \| / lock_acquire: &sem->wait_lock / 1) 3.865 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock */ 1) 8.562 us \| } 1) + 17.370 us \| } Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1236166375.5330.7209.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:49:58 +01:00
Ingo Molnar	28b1bd1cbc	Merge branch 'core/locking' into tracing/ftrace	2009-03-04 18:49:19 +01:00
Peter Zijlstra	26575e28df	lockdep: remove extra "irq" string Impact: clarify lockdep printk text print_irq_inversion_bug() gets handed state strings of the form "HARDIRQ", "SOFTIRQ", "RECLAIM_FS" and appends "-irq-{un,}safe" to them, which is either redudant for *IRQ or confusing in the RECLAIM_FS case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236175192.5330.7585.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:38:34 +01:00
Peter Zijlstra	1c21f14ec4	lockdep: fix incorrect state name In the recent mark_lock_irq() rework a bug snuck in that would report the state of write locks causing irq inversion under a read lock as a read lock. Fix this by masking the read bit of the state when validating write dependencies. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1236172646.5330.7450.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 18:38:33 +01:00
Ingo Molnar	2b578459c3	Merge branch 'rfc' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/perfcounters into perfcounters/core	2009-03-04 11:43:53 +01:00
Ingo Molnar	8163d88c79	Merge commit 'v2.6.29-rc7' into perfcounters/core Conflicts: arch/x86/mm/iomap_32.c	2009-03-04 11:42:31 +01:00
Ingo Molnar	2602c3ba45	Merge branch 'rfc/splice/tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace	2009-03-04 11:15:42 +01:00
Ingo Molnar	a1be621dfa	Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core	2009-03-04 11:14:47 +01:00
Paul Mackerras	2743a5b0fa	perfcounters: provide expansion room in the ABI Impact: ABI change This expands several fields in the perf_counter_hw_event struct and adds a "flags" argument to the perf_counter_open system call, in order that features can be added in future without ABI changes. In particular the record_type field is expanded to 64 bits, and the space for flag bits has been expanded from 32 to 64 bits. This also adds some new fields: * read_format (64 bits) is intended to provide a way to specify what userspace wants to get back when it does a read() on a simple (non-interrupting) counter; * exclude_idle (1 bit) provides a way for userspace to ask that events that occur when the cpu is idle be excluded; * extra_config_len will provide a way for userspace to supply an arbitrary amount of extra machine-specific PMU configuration data immediately following the perf_counter_hw_event struct, to allow sophisticated users to program things such as instruction matching CAMs and address range registers; * __reserved_3 and __reserved_4 provide space for future expansion. Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-03-04 20:36:51 +11:00
Steven Rostedt	2cadf9135e	tracing: add binary buffer files for use with splice Impact: new feature This patch creates a directory of files that correspond to the per CPU ring buffers. These are binary files and are made to be used with splice. This is the fastest way to extract data from the ftrace ring buffers. Thanks to Jiaying Zhang for pushing me to get this code fixed, and to Eduard - Gabriel Munteanu for his splice code that helped me debug my code. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 21:01:55 -05:00
Steven Rostedt	474d32b68d	ring-buffer: make ring_buffer_read_page read from start on partial page Impact: dont leave holes in read buffer page The ring_buffer_read_page swaps a given page with the reader page of the ring buffer, if certain conditions are set: 1) requested length is big enough to hold entire page data 2) a writer is not currently on the page 3) the page is not partially consumed. Instead of swapping with the supplied page. It copies the data to the supplied page instead. But currently the data is copied in the same offset as the source page. This causes a hole at the start of the reader page. This complicates the use of this function. Instead, it should copy the data at the beginning of the function and update the index fields accordingly. Other small clean ups are also done in this patch. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:52:27 -05:00
Steven Rostedt	e3d6bf0a07	ring-buffer: replace sizeof of event header with offsetof Impact: fix to possible alignment problems on some archs. Some arch compilers include an NULL char array in the sizeof field. Since the ring_buffer_event type includes one of these, it is better to use the "offsetof" instead, to avoid strange bugs on these archs. Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-03-03 20:52:01 -05:00

... 38 39 40 41 42 ...

8411 commits