1. Insert ignoring duplication
ignore duplication row(s). skip duplicate data and insert only non-duplicate data.
INSERT IGNORE INTO <SOURCE_TABLE> VALUES (VALUE1, VALUE2...);
INSERT IGNORE INTO <SOURCE_TABLE> SELECT * FROM <TARGET_TABLE>;
2. Inserting new rows and updating existing rows
2.1. Using Standard SQL(OuterJoin)
It needs two-step process (insert and update). The benefit to this approach is standards compliance. But it is poor efficiency.
INSERT INTO <SOURCE_TABLE> (KEY_COL1, COL2, COL3)
SELECT T.COL1, T.COL2, T.COL3
FROM <TARGET_TABLE> T
WHERE LEFT OUTER JOIN <SOURCE_TABLE> AS S
ON T.KEY_COL1 = S.KEY_COL1
WHERE S.KEY_COL1 IS NULL
;
UPDATE <TARGET_TABLE> AS T
INNER JOIN <SOURCE_TABLE> AS S ON T.KEY_COL1 = S.KEY_COL1
SET T.COL2 = S.COL2
, T.COL3 = S.COL3
;
2.2. Using Standard SQL(OuterJoin)
This is far more efficient, but it still might be very bad. It could lock the tables for a long time with large datasets, and like all two-step processes, it is not transactional.
INSERT INTO <TARGET_TABLE> (KEY_COL1, COL2, COL3)
SELECT S.COL1, S.COL2, S.COL3
FROM <SOURCE_TABLE> S
WHERE LEFT OUTER JOIN <SOURCE_TABLE> AS S
ON S.KEY_COL1 = T.KEY_COL1
WHERE T.KEY_COL1 IS NULL;
UPDATE <TARGET_TABLE> AS T
INNER JOIN <SOURCE_TABLE> AS S
ON T.KEY_COL1 = S.KEY_COL1
SET T.COL2 = S.COL2
, T.COL3 = S.COL3
;
UPDATE <TARGET_TABLE> AS T
INNER JOIN <SOURCE_TABLE> AS S
ON T.KEY_COL1 = S.KEY_COL1
SET T.COL2 = S.COL2
, T.COL3 = S.COL3
;
INSERT INTO <TARGETTABLE> (KEY_COL1, COL2, COL3)
SELECT S.KEY_COL1, S.COL2, S.COL3
FROM <SOURCE_TABLE> AS S
LEFT OUTER JOIN <TARGET_TABLE> AS T
ON S.KEY_COL1 = T.KEY_COL1
WHERE T.KEY_COL1 IS NULL
;
2.3. Using Non-standard ON DUPLICATE KEY UPDATE Statement
Recommend this way.
INSERT INTO <TARGET_TABLE> ( KEY_COL1, COL2, COL3)
SELECT KEY_COL1, COL2, COL3 FROM <SOURCE_TABLE>
ON DUPLICATE KEY UPDATE COL2 = <VALUE2>
, COL3 = <VALUE3>
;
No comments:
Post a Comment