「GoCN酷Go推薦」數(shù)據(jù)備份 reed-solomn 庫 的使用
1. reed-solomn 是什么?
reed-solomn 假如磁盤損壞了一部分,或者光盤一部分出現(xiàn)了污漬,那是不是我們的信息就丟了呢?
當(dāng)然不是,有一種糾錯算法,可以在數(shù)據(jù)明顯缺失的前提下,依然可以恢復(fù)數(shù)據(jù),它就是 Reed–Solomon 算法。當(dāng)然,它事先需要冗余備份,當(dāng)丟失數(shù)據(jù)在一定范圍內(nèi),就可以恢復(fù)源文件。
2. Reed Solomon 編碼原理
把輸入數(shù)據(jù)視為向量 D=(D1,D2,..., Dn), 編碼后數(shù)據(jù)視為向量(D1, D2,..., Dn, C1, C2,.., Cm),RS 編碼可視為如下圖所示矩陣運(yùn)算。
alt 條形圖
需要注意:編碼矩陣 B 必須具有任意子矩陣可逆的特性。
3. Reed Solomon 解碼原理
RS 最多能容忍 m 個數(shù)據(jù)塊被刪除,m 包括實(shí)際數(shù)據(jù)和冗余數(shù)據(jù)。數(shù)據(jù)恢復(fù)的過程如下:
(1)假設(shè) D1、D4、C2 丟失,從編碼矩陣中刪掉丟失的數(shù)據(jù)塊/編碼塊對應(yīng)的行。
alt 條形圖
(2)由于 B' 是可逆的,記 B'的逆矩陣為 (B'^-1),則 B' * (B'^-1) = I 單位矩陣。兩邊左乘 B' 逆矩陣。
alt 條形圖
(3)得到如下原始數(shù)據(jù) D 的計(jì)算公式
alt 條形圖
4. 代碼實(shí)現(xiàn)及測試
github 地址:"github.com/klauspost/reedsolomon" 下載即可使用。
1.reed-solomon 編碼測試 此時,輸入 in.txt,go run main.go
輸出 out 目錄下,30 個 shards 文件。
package main
import (
"fmt"
"io"
"os"
"path/filepath"
"github.com/klauspost/reedsolomon"
)
func main() {
var DataShards = 10
var ParShards = 20
var OutDir = "./out"
var dataShards = &DataShards
var parShards = &ParShards
var outDir = &OutDir
fname := "in.txt"
// 1.Create encoding matrix.
enc, err := reedsolomon.NewStream(*dataShards, *parShards)
checkErr2(err)
fmt.Println("Opening", fname)
f, err := os.Open(fname)
checkErr2(err)
instat, err := f.Stat()
checkErr2(err)
shards := *dataShards + *parShards
out := make([]*os.File, shards)
// 2.創(chuàng)建輸入文件 30個shards
dir, file := filepath.Split(fname)
if *outDir != "" {
dir = *outDir
}
for i := range out {
outfn := fmt.Sprintf("%s.%d", file, i)
fmt.Println("Creating", outfn)
out[i], err = os.Create(filepath.Join(dir, outfn))
checkErr2(err)
}
// Split into files.
data := make([]io.Writer, *dataShards)
for i := range data {
data[i] = out[i]
}
// 3.原始文件拆分
err = enc.Split(f, data, instat.Size())
checkErr2(err)
// Close and re-open the files.
input := make([]io.Reader, *dataShards)
for i := range data {
out[i].Close()
f, err := os.Open(out[i].Name())
fmt.Println("Error ", err)
input[i] = f
defer f.Close()
}
// 4.封裝 parity
parity := make([]io.Writer, *parShards)
for i := range parity {
parity[i] = out[*dataShards+i]
defer out[*dataShards+i].Close()
}
// 5.Encode 編碼rs格式
err = enc.Encode(input, parity)
checkErr2(err)
fmt.Printf("File split into %d data + %d parity shards.\n", *dataShards, *parShards)
}
func checkErr2(err error) {
if err != nil {
os.Exit(1)
}
}
直接運(yùn)行,最后輸出如下:
$ go run main.go
Opening in.txt
Creating in.txt.0
Creating in.txt.1
Creating in.txt.2
Creating in.txt.3
Creating in.txt.4
Creating in.txt.5
Creating in.txt.6
Creating in.txt.7
Creating in.txt.8
Creating in.txt.9
Creating in.txt.10
Creating in.txt.11
Creating in.txt.12
Creating in.txt.13
Creating in.txt.14
Creating in.txt.15
Creating in.txt.16
Creating in.txt.17
Creating in.txt.18
Creating in.txt.19
Creating in.txt.20
Creating in.txt.21
Creating in.txt.22
Creating in.txt.23
Creating in.txt.24
Creating in.txt.25
Creating in.txt.26
Creating in.txt.27
Creating in.txt.28
Creating in.txt.29
File split into 10 data + 20 parity shards.
經(jīng)查看,30個文件已經(jīng)生成,其中前10個拼接就是原始文件。
2.reed-solomon 恢復(fù)測試
在上面基礎(chǔ)上,刪掉out目錄下面20個文件(編號6-24),剩下10個,執(zhí)行 go run recover_main.go
開始恢復(fù)源文件,并恢復(fù)刪掉的20個文件。
package main
import (
"fmt"
"io"
"os"
"github.com/klauspost/reedsolomon"
)
var OutFile = "out2.txt"
var outFile = &OutFile
var DataShards = 10
var ParShards = 20
var OutDir = "./out"
var dataShards = &DataShards
var parShards = &ParShards
var outDir = &OutDir
func main() {
fname := "out/in.txt"
// 1.Create matrix
enc, err := reedsolomon.NewStream(*dataShards, *parShards)
checkErr(err)
// 2.Open the inputs
shards, size, err := openInput(*dataShards, *parShards, fname)
checkErr(err)
// 3.Verify the shards
ok, err := enc.Verify(shards)
if ok {
fmt.Println("No reconstruction needed")
} else {
fmt.Println("Verification failed. Reconstructing data")
shards, size, err = openInput(*dataShards, *parShards, fname)
checkErr(err)
// 3.1 重新創(chuàng)建刪除的文件
out := make([]io.Writer, len(shards))
for i := range out {
if shards[i] == nil {
//dir, _ := filepath.Split(fname)
outfn := fmt.Sprintf("%s.%d", fname, i)
fmt.Println("Creating", outfn)
out[i], err = os.Create(outfn)
checkErr(err)
}
}
fmt.Println("reconstruct")
// 3.2 重建30個shards
err = enc.Reconstruct(shards, out)
if err != nil {
fmt.Println("Reconstruct failed -", err)
os.Exit(1)
}
// Close output.
for i := range out {
if out[i] != nil {
err := out[i].(*os.File).Close()
checkErr(err)
}
}
shards, size, err = openInput(*dataShards, *parShards, fname)
ok, err = enc.Verify(shards)
if !ok {
fmt.Println("Verification failed after reconstruction, data likely corrupted:", err)
os.Exit(1)
}
checkErr(err)
}
// 4.Join the shards and write them
outfn := *outFile
if outfn == "" {
outfn = fname
}
fmt.Println("Writing data to", outfn)
f, err := os.Create(outfn)
checkErr(err)
shards, size, err = openInput(*dataShards, *parShards, fname)
checkErr(err)
// join恢復(fù)原文件 but We don't know the exact filesize.
err = enc.Join(f, shards, int64(*dataShards)*size)
checkErr(err)
}
func openInput(dataShards, parShards int, fname string) (r []io.Reader, size int64, err error) {
// Create shards and load the data.
shards := make([]io.Reader, dataShards+parShards)
for i := range shards {
infn := fmt.Sprintf("%s.%d", fname, i)
fmt.Println("Opening", infn)
f, err := os.Open(infn)
if err != nil {
fmt.Println("Error reading file", err)
shards[i] = nil
continue
} else {
shards[i] = f
}
stat, err := f.Stat()
checkErr(err)
if stat.Size() > 0 {
size = stat.Size()
} else {
shards[i] = nil
}
}
return shards, size, nil
}
func checkErr(err error) {
if err != nil {
os.Exit(1)
}
}
先刪除0-5,25-29 這些文件,
然后運(yùn)行recover邏輯,
$ go run recover_main.go
Verification failed. Reconstructing data
Opening out/in.txt.0
Opening out/in.txt.1
Opening out/in.txt.2
Opening out/in.txt.3
Opening out/in.txt.4
Opening out/in.txt.5
Opening out/in.txt.25
Opening out/in.txt.26
Opening out/in.txt.27
Opening out/in.txt.28
Opening out/in.txt.29
Creating out/in.txt.5
Creating out/in.txt.6
Creating out/in.txt.7
Creating out/in.txt.8
Creating out/in.txt.9
Creating out/in.txt.10
Creating out/in.txt.11
Creating out/in.txt.12
Creating out/in.txt.13
Creating out/in.txt.14
Creating out/in.txt.15
Creating out/in.txt.16
Creating out/in.txt.17
Creating out/in.txt.18
Creating out/in.txt.19
Creating out/in.txt.20
Creating out/in.txt.21
Creating out/in.txt.22
Creating out/in.txt.23
Creating out/in.txt.24
reconstruct ...
Writing data to out2.txt
最后可以看到0-5,25-29 這些文件已經(jīng)恢復(fù)出來,并且源文件也恢復(fù)出來了 out2.txt.
總結(jié)
reed solomon 糾錯碼是一種特殊類型的糾錯碼。在事先冗余備份下,當(dāng)丟失數(shù)據(jù)在一定范圍內(nèi),調(diào)用恢復(fù)過程,就可以恢復(fù)源文件。一些大型的分布式文件存儲,都用它來保證文件的高可用性。當(dāng)然,使用起來非常方便,大家可以多動手試試,希望你能喜歡哦!
以上所有內(nèi)容均采用最新官方案例做示例
參考資料
reed solomn 編碼 ECC 之 Reed-Solomon 算法 圖片上傳及查看,使用 swarm 網(wǎng)關(guān)實(shí)現(xiàn)
《酷Go推薦》招募:
各位Gopher同學(xué),最近我們社區(qū)打算推出一個類似GoCN每日新聞的新欄目《酷Go推薦》,主要是每周推薦一個庫或者好的項(xiàng)目,然后寫一點(diǎn)這個庫使用方法或者優(yōu)點(diǎn)之類的,這樣可以真正的幫助到大家能夠?qū)W習(xí)到
新的庫,并且知道怎么用。
大概規(guī)則和每日新聞類似,如果報(bào)名人多的話每個人一個月輪到一次,歡迎大家報(bào)名!戳「閱讀原文」,即可報(bào)名
掃碼也可以加入 GoCN 的大家族喲~
