删除关联表中重复数据的MySQL存储过程
2015-07-21
存储过程sql如下:
DROP PROCEDURE IF EXISTS getDupIds;
CREATE PROCEDURE getDupIds(IN tableName VARCHAR(100), IN pk VARCHAR(100), IN groupFields VARCHAR(100))
BEGIN
DECLARE groupIds TEXT;
DECLARE _done INT DEFAULT 0;
DECLARE _cur CURSOR FOR
SELECT * FROM check_dupid_view;
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET _done = 1;
CREATE TEMPORARY TABLE IF NOT EXISTS tmpTable(`ids` TEXT NOT NULL) ENGINE=MYISAM CHARSET=utf8;
SET @v_sql = CONCAT("CREATE VIEW check_dupid_view AS SELECT GROUP_CONCAT(CAST(", pk, " AS CHAR))
AS `group_id` FROM ", tableName, " GROUP BY ", groupFields, " HAVING COUNT(*) > 1");
PREPARE stmt FROM @v_sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
OPEN _cur;
FETCH _cur INTO groupIds;
WHILE(_done <> 1) DO
SET groupIds = CONCAT("DELETE FROM `", tableName, "` WHERE `", pk, "` IN (", SUBSTRING(groupIds, INSTR(groupIds,',')+1), ");");
INSERT INTO tmpTable(`ids`) VALUES (groupIds);
FETCH _cur INTO groupIds;
END WHILE;
CLOSE _cur;
SELECT * FROM tmpTable;
DROP TEMPORARY TABLE tmpTable;
DROP VIEW check_dupid_view;
END;
调用方式:假设有一张分类关联表叫做article_topic
,记录了每篇文章与分类的关系,表结构如下:
id int(10) primary key auto_increment not null,
article_id int(10) not null,
topic_id int(10) not null,
那么调用存储过程删除重复记录:call getDupIds('article_topic', 'id', 'article_id, topic_id'); 第一个参数是表名,第二个参数是表的主键名,第三个参数是重复的过滤条件,就是放在group by条件语句后的字段。
调用后将生成类似DELETE FROM `article_topic` WHERE `id` IN (1,2,3);
的sql语句,直接执行sql语句即可删除重复记录。
一般在处理删除重复纪录的时候,会用到DELETE FROM `table_name` GROUP BY xx,xx,xx HAVING COUNT(*) > 1;
的sql,但是我遇到过同一条数据重复了n次,那么这条sql语句也要执行n-1次才能把所有重复项删除干净,效率不高。所以干脆先根据提供的三个参数把所有重复项找出来,接着用一个游标遍历结果集,用mysql自带的SUBSTRING函数,把所有重复项的id只保留第一个,其余的id都筛选出来,放到一个临时表里输出。
还有一点需要注意的是,mysql存储过程中如果有拼接sql语句的,需要对这个sql语句进行预处理,如line 12~16所示。