背景
客戶N在使用H部門提供的MySQL遇到部分性能問題後,未得到H部門的及時支撐。機緣巧合,我們的服務化MySQL剛剛發佈第一版,客戶N有意切換我們的MySQL。由於部門策略調整,我們準備由原來的社區MySQL切換爲部門R的商業版MySQL,其間對接問題不提,客戶提出的首要問題是前期嘗試通過mysqldump備份數據,發現有報錯並且很慢,我們的策略是 爲拓展業務先把鍋接下來吧 答應先提供數據遷移方案供客戶評估。
機器、數據、應用情況
源機器cpu核心數16,內存32G;
兩臺機器,一個是master,一個是slave;未配置互爲主備;
開啓了基於GTID的主從複製;
從鏡像庫來看,數據量3800W左右,實際生產環境每天還會增加約不到100w;
0-1w
1w-10w
10w-50w
50w-100w
100w-1000w
>1000w
表數量約
2105
83
28
5
6
1
MySQL爲社區版5.7.23,所有表均爲INNODB引擎;
據客戶N的業務人員反饋,他們嘗試使用mysqldump可能會報錯。
一些準備工作
爲了能夠順滑的開展後期工作,我習慣先整理一些常用的命令,以備隨時複製粘貼…
-- 查詢所有業務數據庫的表名,數據庫,存儲引擎信息
select table_name,table_schema,engine from information_schema.tables where engine='innodb' and table_schema not in('mysql','information_schema','performance_schema','sys');
-- 查詢所有業務數據庫的表的數量
select count(*) from information_schema.tables where engine='innodb' and table_schema not in('mysql','information_schema','performance_schema','sys');
-- 查詢所有表的數據量
SELECT CONCAT(TABLE_SCHEMA,'.',TABLE_NAME) AS table_name, IFNULL(TABLE_ROWS,0) as table_rows FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys') ORDER BY 2;
-- 查詢所有業務數據庫的視圖數量
select table_name,table_schema from information_schema.views where table_schema not in ('mysql','information_schema','performance_schema','sys');
select count(*) from information_schema.views where table_schema not in ('mysql','information_schema','performance_schema','sys');
-- 查詢所有routines(存儲過程和函數)的數量
select * from mysql.proc where db not in ('mysql','information_schema','performance_schema','sys')\G
-- 查詢所有觸發器的數量
SELECT * FROM information_schema.triggers where TRIGGER_SCHEMA not in ('mysql','information_schema','performance_schema','sys')\G
-- 查詢所有事件的數量
SELECT * FROM information_schema.EVENTS where EVENT_SCHEMA not in ('mysql','information_schema','performance_schema','sys')\G
-- 查詢所有用戶數量
select user,host from mysql.user;
-- 查看磁盤IO信息
iostat -x -p /dev/mapper/vg02-lv02 1 10 -m
iostat -x -p 1 10 -m
首先嚐試使用原生mysqldump
業務誠不欺我,果然有坑,報錯如下(安全需要,隱藏關鍵信息)。
mysqldump: Couldn’t execute SHOW FIELDS FROM XX
: View XX.XX
references invalid table(s) or column(s) or function(s) or
definer/invoker of view lack rights to use them (1356)
報錯信息很明顯了,本次實踐中,主要是視圖引用創建語句中子查詢的列不存在,select 都會報錯,這個我們只能讓業務自己去審視,決策是否刪除或者修復。
由於通過mysqldump來發現那些視圖有問題非常不效率,所有寫了一個簡單的腳本:
蒐集所有有問題的視圖.
#!/usr/bin/env bash
function usage {
echo "Usage: $0 [-u USER_NAME] [-p PASSWORD] [-d WORKDIR] [-D:DROP ERROR VIEWS]"
echo "Do not support -uroot, using -u root please."
# too 2
exit 2
}
function set_variable {
local varname=$1
shift
if [[ -z "${!varname}" ]]; then
eval "$varname=\"$@\""
else
echo "Error: $varname already set"
usage
fi
}
function execMysqlCommand {
mysql -u${USER_NAME} -p${PASSWORD} --skip-column-names -e "$1"
}
function checkView {
viewName="$1"
# too slow
# mysql -u${USER_NAME} -p${PASSWORD} -e "select 1 from ${viewName} limit 1" >/dev/null 2>>/home/mysql/temp/view_error
# not good either
# mysql -u${USER_NAME} -p${PASSWORD} -e "update ${viewName} set thisIsANotExistCol=123;" >/dev/null 2>>/home/mysql/temp/view_error
#
execMysqlCommand "show fields from ${viewName};" >/dev/null 2>>/home/mysql/temp/view_error
return $?
}
function checkAllViewsAndGetErrorViews {
echo "">/home/mysql/temp/view_error
i=1
for view in ${views[@]};do
echo -n "checking $view ...$i/${#views[@]}" "..."
checkView ${view}
result=$?
[[ ${result} -ne 0 ]] && echo "bad"
[[ ${result} -ne 0 ]] && echo "pass"
((i++))
done;
cat /home/mysql/temp/view_error|grep "1356"|awk -F"'" '{print $2}'>/home/mysql/temp/error_list
rm /home/mysql/temp/view_error -rf
error_views=(`cat /home/mysql/temp/error_list`)
}
function printIgnoreMsg {
[[ ${#error_views[@]} -gt 0 ]] && echo "You can add these statements to mysqldump to ignore those error views:"
for view in ${error_views[@]};do
echo -n " --ignore-table=${view}"
done
echo ""
}
function backupErrorViewsSql {
echo "Backing up create statement of error views to ${WORKDIR}..."
echo "" > /home/mysql/temp/backup_create_view_sql -rf
for view in ${error_views[@]};do
execMysqlCommand "show create view $view;" >>/home/mysql/temp/backup_create_view_sql 2>/dev/null
done
cat /home/mysql/temp/backup_create_view_sql|awk -F'\t' '{print $2";"}'|grep -v 'Create View;'>>/home/mysql/temp/backup_create_view
rm -rf /home/mysql/temp/backup_create_view_sql
echo "Done backing up create statement of error views."
}
function deleteErrorViews {
echo "Dropping error views..."
for view in ${error_views[@]};do
while [[ "X" == "X${confirm}" ]];do
read -p "please confirm to delete ${view}:(y/n)" confirm
done
if [[ "Xy" == "X${confirm}" ]];then
execMysqlCommand "drop view $view;" 2>/dev/null
fi
done
echo "Done dropping error views."
}
init() {
unset DELETE_VIEWS USER_NAME PASSWORD WORKDIR
while getopts 'u:p:d:D?h' option
do
case ${option} in
d) set_variable WORKDIR $OPTARG ;;
D) set_variable DELETE_VIEWS true ;;
u) set_variable USER_NAME $OPTARG ;;
p) set_variable PASSWORD $OPTARG ;;
h|?) usage ;; esac
done
[[ -z "${USER_NAME}" ]] && usage
[[ -z "${PASSWORD}" ]] && usage
[[ -z "${WORKDIR}" ]] && set_variable WORKDIR "/home/mysql/temp" && mkdir -p ${WORKDIR}
echo "Using directory ${WORKDIR} as temp dir."
}
getAllViews() {
echo "Getting all views from schema..."
views=(`execMysqlCommand "select concat(table_schema,'.',table_name) from information_schema.views where table_schema not in ('mysql','information_schema','performance_schema','sys');" 2>/dev/null`)
}
init $@
getAllViews
checkAllViewsAndGetErrorViews
printIgnoreMsg
[[ X"true" == X"${DELETE_VIEWS}" && ${#error_views[@]} -gt 0 ]] && backupErrorViewsSql && deleteErrorViews
命令優化x
具體方案之前,先加上一些基本的備份對象
--hex-blob --single-transaction --quick --routines --triggers
方案一 160分鐘
單線程直接執行mysqldump,大概160分鐘
> /data01/chroot/usr/local/mysql5.7.23/bin/mysqldump -udbXXXX -pXXXX --all-databases --hex-blob --ignore-table=netcxx.xxxxx --ignore-table=netxxx.rxxx(此處很多忽略的視圖) | gzip > /temp/back0129.sql.gz
方案二 90分鐘
考慮一個表一個文件,10個線程,大概90分鐘;TODO 測試增加線程
multidump.sh[lines=25..55].
multidump() {
rm -rf ${WORKDIR}/backup
mkdir -p ${WORKDIR}/backup
COMMIT_COUNT=0
COMMIT_LIMIT=10
error_views_file="${WORKDIR}/error_list"
DBTBS=(`cat ${WORKDIR}/listOfTables`)
i=1
for DBTB in ${DBTBS[@]};do
echo "processing $i/${#DBTBS[@]}"
((i++))
DB=`echo ${DBTB} | sed 's/\./ /g' | awk '{print $1}'`
TB=`echo ${DBTB} | sed 's/\./ /g' | awk '{print $2}'`
if [[ "X"`grep -w ${DBTB} ${error_views_file}` != X"" ]];then
echo skip "${DBTB}"
continue
fi
dumpIt ${DB} ${TB}
(( COMMIT_COUNT++ ))
if [[ ${COMMIT_COUNT} -eq ${COMMIT_LIMIT} ]]
then
COMMIT_COUNT=0
wait
fi
done
if [[ ${COMMIT_COUNT} -gt 0 ]]
then
wait
fi
}
方案三 15-22分鐘
mysqlpump 是 mysql 提供的工具,文檔和網上教程一大堆,這裏只談使用。 可以很直觀的看到執行到哪個表,剩餘多少行;注意:mysqlpump遇到錯誤會停止繼續,比如命令不正確、數據結構有問題。而且這個數據庫開啓GTID,所以如果你的數據庫沒有此選項,要把命令中的—set-gtid-purged=ON去掉。
兩種壓縮格式的時間差距還是很明顯:
mysqlpump -uusername -ppassword –compress-output=ZLIB –default-parallelism=100 –set-gtid-purged=ON –hex-blob –add-drop-database –add-drop-table –add-drop-user –users |gzip > /temp/test.sql.gz
Dump progress: 0/xx tables, xx/xxxxxxxxx rows Dump completed in xxxxxx milliseconds
mysqlpump -uusername -ppassword –compress-output=LZ4 –default-parallelism=100 –set-gtid-purged=ON –hex-blob –add-drop-database –add-drop-table –add-drop-user –users > /temp/testlz4.lz4
方案四
mysqlpump 可以針對database進行多線程導出,但是有時候數據分佈不均勻,90%的數據可能都在一個表內,這種情況下mysqlpump顯得無能爲力。有沒有可以對單個大表繼續進行分拆的工具呢? mydumper 可以做這件事。
首先統計表的分佈
totalSql="SELECT IFNULL(SUM(TABLE_ROWS),0) as t_rows_sum FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys');"
eachTableSql="SELECT CONCAT(TABLE_SCHEMA,'.',TABLE_NAME) AS table_name, IFNULL(TABLE_ROWS,0) as table_rows FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys') ORDER BY 2;"
驗證mydumper導出database的效率
- 只導出netcare,17分鐘-20分鐘左右
mydumper -u username -p password -v 3 -B databaseName –triggers –events –routines –rows=500000 –compress-protocol -c -t threadNum.ie.100 –trx-consistency-only –outputdir /temp/mydumper
驗證mydumper導出某一個大表的效率
https://dev.mysql.com/doc/mysql-enterprise-backup/8.0/en/
方案五 優化前:21分鐘
unset tempdir
tempdir=/data03/backup_`date '+%y%m%d%H%M%S'`
mkdir ${tempdir}
~/mysqlbackup -udbAdmin -pabcd1234 --backup-dir=${tempdir} --compress backup
echo "Successfully backing up data to ${tempdir}"
# 有待優化 https://dev.mysql.com/doc/mysql-enterprise-backup/4.1/en/backup-capacity-options.html
--limit-memory=MB (default 100)
--read-threads=num_threads (default 1)
--process-threads=num_threads (default 6)
--write-threads=num_threads (default 1)
~/mysqlbackup -udbAdmin -pabcd1234 --backup-dir=${tempdir} --compress backup
#優化後: 15分鐘
# 整庫備份到單個文件
~/mysqlbackup -udbAdmin -pabcd1234 --compress --compress-level=5 --limit-memory=1024 --read-threads=10 --process-threads=15 --write-threads=10 --backup-dir=${tempdir} --backup-image=/data03/`basename ${tempdir}`.bin backup-to-image
#直接備份到目標機器:
~/mysqlbackup -udbAdmin -pabcd1234 --compress --compress-level=5 --limit-memory=1024 --read-threads=10 --process-threads=15 --write-threads=10 --backup-dir=${tempdir} --backup-image=- backup-to-image | ssh root@10.15.32.73 'cat > /opt/temp_for_restore/my_backup.bin'
具體備份恢復使用見另一篇文章 mysql gtid 主從複製數據遷移(物理備份)