This note is a follow-up to a recent comment on a blog note about Row Migration:
So I wonder what is the difference between the two, parallel dml and serial dml with parallel scan, which makes them behave differently while working with migrated rows. Why might the strategy of serial dml with parallel scan case not work in parallel dml case? I am going to make a service request to get some clarifications but maybe I miss something obvious?
The comment also referenced a couple of MoS notes:
- Bug 17264297 “Serial DML with Parallel scan performs single block reads during full table scan when table has chained rows in 11.2”
- Doc ID 1514011.1 “Performance decrease for parallel DML on compressed tables or regular tables after 11.2 Upgrade”
The latter document included a comment to the effect that 11.2 uses a “Head Piece Scan” while 11.1 uses a “First Piece scan”, which is a rather helpful comment. Conveniently the blog note itself referenced an earlier note on the potential for differentiating between migrated and chained rows through a “flag” byte associated with each row piece. The flag byte has an H bit for the row head piece, an F bit for the row first piece, and an L bit for the row last piece; a row piece in the middle of a chained row has no bits set.
Side note: the basic patterns are as follows.
- A “typical” simple row is a single row piece with the H, F and L bits all set.
- A simple migrated row starts with an “empty” row piece in one block with just the H bit set and a pointer (nrid – next rowid) to a row piece in another block that has the F and L bits set and a pointer (hrid – head rowid) back to the head piece.
- A chained row could start with a row piece holding a few columns and the H and F bits set, with a pointer to the next row piece; this might lead to a long chain of row pieces with no bits set, each pointing to the next, until you reach a row piece with the L bit set.
- A row that has both migrated and chained could start with an empty row piece with just the H bit set and a pointer to the next row piece; then a row piece with the F bit set, a back pointer to the head piece, and a pointer to the next row piece; this could lead to a long chain of row pieces with no bits set until you reach a row piece with the L bit set.
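As an aside, if you just want a quick count of how many rows in a table consist of more than one row piece the old-fashioned listing below still works – though it lumps migrated and chained rows together, which is exactly why the flag byte is the place to look if you need to tell them apart. This is only a sketch: it assumes you are happy to run the supplied utlchain.sql script to create the chained_rows table, and it uses an illustrative table name of t1.

rem
rem     Sketch only: count rows with more than one row piece.
rem     utlchain.sql creates the chained_rows table; "?" is the
rem     SQL*Plus shorthand for $ORACLE_HOME.
rem

@?/rdbms/admin/utlchain.sql

analyze table t1 list chained rows into chained_rows;

select  count(*)        continued_rows
from    chained_rows
where   table_name = 'T1'
;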
Combining the comments about “head piece” and “first piece” scans with the general principles of DML and locking, it’s now possible to start making some guesses about why the Oracle developers might want updates through tablescans to behave differently for serial and parallel tablescans. There are two performance targets to consider:
- How to minimise random (single block) I/O requests
- How to minimise the risk of deadlock between PX server processes.
Assume you’re doing a serial tablescan to find rows to update – and assume, for simplicity, that there are no chained rows in the table. When you hit a migrated row (H bit only) you could follow the next rowid pointer (nrid) to find and examine the row, but if it turns out to be a row that doesn’t need to be updated you’ve just done a completely redundant single block read; so it makes sense to ignore “H”-only row pieces and drive the tablescan off the “F” pieces (which, thanks to our assumption of no chained rows, will all be “FL” whole-row pieces). If you find an F piece that needs to be updated you can do a single block read using the head rowid pointer (hrid) to lock the head row piece, then lock the current row piece and update it; you only do the extra single block read for rows that need updating, not for every migrated row. So this is (I guess) the “First Piece Scan” referenced in Doc ID 1514011.1. (And, conversely, if you scan the table looking only for row pieces with the H bit set this is probably the “Head Piece Scan”.)
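If you want an indication of how much extra single block work a serial update does you could snapshot a couple of session statistics around the update. This is a sketch rather than part of the original test – the table and column names are illustrative, and whether ‘table fetch continued row’ moves as well as ‘physical reads’ is something to check rather than assume:

rem
rem     Sketch: snapshot session statistics around a serial update
rem     of a table (t1) known to contain migrated rows, then
rem     compare the before/after values.
rem

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in ('table fetch continued row', 'physical reads')
;

update t1 set s1 = upper(s1) where n1 > 0;

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   sn.statistic# = ms.statistic#
and     sn.name in ('table fetch continued row', 'physical reads')
;

With a freshly flushed buffer cache the change in ‘physical reads’ gives you a feel for how many extra single block reads the update needed on top of the tablescan itself.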
But there’s a potential problem with this strategy if the update is a parallel update. Imagine parallel server process p000 is scanning the first megabyte of a table and process p001 is scanning the second megabyte using the “first piece” algorithm. What happens if p001 finds a migrated row (flags = FL) that needs to be updated and follows its head pointer back into a block in the megabyte being scanned by p000? What if p000 has been busy updating rows in that block and there are no free ITLs for p001 to acquire to lock the head row piece? You have the potential for an indefinite deadlock.
On the other hand, if the scan is using the “head piece” algorithm p000 would have found the migrated row’s head piece and followed the next rowid pointer into a block in the megabyte being scanned by p001. If the row needs to be updated p000 can lock the head piece and the migrated piece.
At this point you might think that the two situations are symmetrical – aren’t you just as likely to get a deadlock because p000 now wants an ITL entry in a block that p001 might have been updating? Statistically the answer is “probably not”. When you do lots of updates it is possible for many rows to migrate OUT of a block; it is much less likely that you will see many rows migrate INTO a specific block. This means that in a parallel environment you’re more likely to see several PX servers all trying to acquire ITL entries in the same originating block than you are to see several PX servers trying to acquire ITL entries in the same destination block. There’s also the feature that when a row (piece) migrates into a block Oracle adds an entry to the ITL list if the number of inwards migrated pieces is more than the current number of ITL entries.
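Since the whole of the preceding argument depends on the DML itself (not just the scan) running in parallel, it’s worth remembering that the parallel DML path doesn’t happen by accident. A generic sketch (table name and degree are illustrative, and recent versions also accept the enable_parallel_dml hint):

alter session enable parallel dml;

update  /*+ parallel(t1 4) */ t1        -- hypothetical table, degree chosen arbitrarily
set     s1 = upper(s1)
where   n1 > 0
;

commit;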
Conclusion
It’s all guesswork of course, but I’d say that for a serial update by tablescan Oracle uses the “first piece scan” to minimise random I/O requests while for a parallel update by tablescan Oracle uses the “head piece scan” to minimise the risk of deadlocks – even though this is likely to increase the number of random (single block) reads.
Finally (to avoid ambiguity) if you’ve done an update which does a parallel tablescan but a serial update (by passing rowids to the query co-ordinator) then I’d hope that Oracle would use the “first piece scan” for the parallel tablescan, because there’s no risk of deadlock when the query co-ordinator is the only process doing the locking and updating, which makes it safe to use the minimum I/O strategy. (And a parallel query with serial update happens quite frequently because people forget to enable parallel dml.)
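If you’re not sure which of the two you’ve got, the execution plan tells you. A sketch (again the table name and degree are illustrative): with parallel DML the UPDATE operation appears below the PX COORDINATOR, i.e. it is done by the PX servers; with a serial update it sits above the PX COORDINATOR.

set serveroutput off

update  /*+ parallel(t1 4) */ t1
set     s1 = upper(s1)
where   n1 > 0
;

select * from table(dbms_xplan.display_cursor);

--      UPDATE above the PX COORDINATOR:  parallel scan, serial update at the QC
--      UPDATE below the PX COORDINATOR:  parallel DML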
Footnote
While messing around to see what happened with updates and rows that were both migrated and chained I ran the following script to create one nasty row so that I could dump a few table blocks to check for ITLs, pointers, and locks. The aim was to get a row with a head-only piece (“H” bit), an F-only piece, a piece with no bits set, then an L-only piece. With an 8KB block size and a 4,000 byte maximum for varchar2() this is what I did:
rem
rem     Script:         migrated_lock.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2019
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem

create table t1 (
        n1 number,
        l1 varchar2(4000),
        s1 varchar2(200),
        l2 varchar2(4000),
        s2 varchar2(200),
        l3 varchar2(4000),
        s3 varchar2(200)
);

insert into t1 (n1,l1,s1) values(0,rpad('X',4000,'X'),rpad('X',200,'X'));
commit;

insert into t1 (n1,l1) values(1,null);
commit;

update t1 set
        l1 = rpad('A',4000),
        s1 = rpad('A',200),
        l2 = rpad('B',4000),
        s2 = rpad('B',200),
        l3 = rpad('C',4000),
        s3 = rpad('C',200)
where
        n1 = 1
;

commit;

execute dbms_stats.gather_table_stats(user,'t1');

update t1 set
        s1 = lower(s1),
        s2 = lower(s2),
        s3 = lower(s3)
where
        n1 = 1
;

alter system flush buffer_cache;

select
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from
        t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
;
The query with all the calls to dbms_rowid gave me the file and block number of the row I was interested in, so I dumped the block, then read the trace file to find the next block in the chain, and so on. The first block held just the head piece, the second block held the n1 and l1 columns (which didn’t get modified by the update), the third block held the s1 and l2 columns, and the last block held the s2, l3 and s3 columns. I had been expecting to see the split as (head piece), (n1, l1, s1), (l2, s2), (l3, s3) – but as it turned out the unexpected split was a bonus.
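For anyone who wants to reproduce the dumps, this is roughly what “dumped the block” means. It’s only a sketch: substitute the file and block numbers reported by the rowid query (for a simple test the relative file number is usually the same as the absolute file number, but check dba_data_files if in doubt):

rem
rem     Sketch: find the session trace file, then dump one block.
rem     &file_no and &block_no come from the dbms_rowid query above.
rem

select  value
from    v$diag_info
where   name = 'Default Trace File'
;

alter system dump datafile &file_no block &block_no;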
Here are extracts from each of the blocks (in the order they appeared in the chain), showing the ITL information and the “row overhead” information. If you scan through the list you’ll see that three of the four blocks have an ITL entry for transaction id (xid) 8.1e.df3, using three consecutive undo records in undo block 0x0100043d. My update has locked three of the four row pieces – the head piece and the two that changed. It didn’t need to “lock” the piece that didn’t change. (This little detail was the bonus of the unexpected split.)
Block 184
---------
Itl             Xid                     Uba                     Flag    Lck     Scn/Fsc
0x01    0x000a.00b.00000ee1     0x01000bc0.036a.36      C---    0       scn  0x00000000005beb39
0x02    0x0008.01e.00000df3     0x0100043d.0356.2e      ----    1       fsc 0x0000.00000000

...

tab 0, row 1, @0xf18
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x00800089.0


Block 137       (columns n1, l1 - DID NOT CHANGE so no ITL entry acquired)
---------       (the lock byte relates to the previous, not cleaned, update)
Itl             Xid                     Uba                     Flag    Lck     Scn/Fsc
0x01    0x000a.00b.00000ee1     0x01000bc0.036a.35      --U-    1       fsc 0x0000.005beb39
0x02    0x0000.000.00000000     0x00000000.0000.00      ----    0       fsc 0x0000.00000000
0x03    0x0000.000.00000000     0x00000000.0000.00      C---    0       scn  0x0000000000000000

...

tab 0, row 0, @0xfcb
tl: 4021 fb: ----F--- lb: 0x1  cc: 2
hrid: 0x008000b8.1
nrid:  0x00800085.0


Block 133       (columns s1, l2)
--------------------------------
Itl             Xid                     Uba                     Flag    Lck     Scn/Fsc
0x01    0x000a.00b.00000ee1     0x01000bc0.036a.34      C---    0       scn  0x00000000005beb39
0x02    0x0008.01e.00000df3     0x0100043d.0356.2f      ----    1       fsc 0x0000.00000000
0x03    0x0000.000.00000000     0x00000000.0000.00      C---    0       scn  0x0000000000000000

...

tab 0, row 0, @0xf0b
tl: 4213 fb: -------- lb: 0x2  cc: 2
nrid:  0x008000bc.0


Block 188       (columns s2, l3, s3)
------------------------------------
Itl             Xid                     Uba                     Flag    Lck     Scn/Fsc
0x01    0x000a.00b.00000ee1     0x01000bc0.036a.33      C---    0       scn  0x00000000005beb39
0x02    0x0008.01e.00000df3     0x0100043d.0356.30      ----    1       fsc 0x0000.00000000
0x03    0x0000.000.00000000     0x00000000.0000.00      C---    0       scn  0x0000000000000000

...

tab 0, row 0, @0xe48
tl: 4408 fb: -----L-- lb: 0x2  cc: 3
Note, by the way, how there are nrid (next rowid) entries pointing forward in every row piece (except the last), but it’s only the “F” (first) row piece that has an hrid (head rowid) pointer pointing backwards.