Understanding io.TeeReader
In the vast library of Go’s built-in packages, the io package stands out due to its sheer utility. Today, we’ll be focusing on a particular function in this package: io.TeeReader.
What is TeeReader?
The io.TeeReader function in Go returns a new reader that reads data from a source reader and writes everything it reads to a writer. You can think of it as a “T” junction in a pipeline, where data flows in from one direction and is split into two.
The function signature is as follows:
func TeeReader(r Reader, w Writer) Reader
This function takes a Reader and a Writer as arguments and returns a Reader. The returned Reader yields the same data as the original Reader, with the added behavior of writing each chunk it reads to the provided Writer.
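To make the signature concrete, here is a minimal sketch that uses an in-memory source and a bytes.Buffer as the side Writer. The helper name readWithCopy is hypothetical and exists only for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// readWithCopy (a hypothetical helper) drains src through io.TeeReader and
// returns both what the consumer read and what the tee wrote to the side buffer.
func readWithCopy(src string) (read, copied string, err error) {
	var side bytes.Buffer
	tee := io.TeeReader(strings.NewReader(src), &side)

	// Every successful Read on tee also writes the same bytes to side.
	data, err := io.ReadAll(tee)
	if err != nil {
		return "", "", err
	}
	return string(data), side.String(), nil
}

func main() {
	read, copied, err := readWithCopy("hello, tee")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("read:   %q\n", read)
	fmt.Printf("copied: %q\n", copied)
}
```

Both values should come out identical: the consumer sees the original data, and the buffer ends up holding a copy of it.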
Why Use TeeReader?
The power of io.TeeReader comes from its ability to “spy” on data as it’s being read, without modifying the flow of data or the interface of the Reader. This makes it ideal for situations where you need to process the same data twice. Examples include logging raw data being read, calculating checksums on the fly, or displaying progress during data transfers, as you’ll see in the upcoming example.
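As a rough illustration of the checksum use case, the sketch below hashes data while it is being copied. It relies on the fact that a hash.Hash is also an io.Writer. The helpers copyWithChecksum, checksumString, and directSum are hypothetical names chosen for this example, and SHA-256 is just one possible hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyWithChecksum copies src to dst and returns the hex-encoded SHA-256
// of the bytes that passed through the tee.
func copyWithChecksum(dst io.Writer, src io.Reader) (string, error) {
	h := sha256.New() // hash.Hash implements io.Writer
	if _, err := io.Copy(dst, io.TeeReader(src, h)); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// checksumString is a convenience wrapper that works on in-memory strings.
func checksumString(s string) (copied, sum string, err error) {
	var out strings.Builder
	sum, err = copyWithChecksum(&out, strings.NewReader(s))
	return out.String(), sum, err
}

// directSum hashes s in one shot, for comparison with the tee-based result.
func directSum(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

func main() {
	copied, sum, err := checksumString("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("copied:", copied)
	fmt.Println("checksums match:", sum == directSum("hello"))
}
```

The data is read only once, yet it reaches both the destination and the hash, which is the "process the same data twice" idea in practice.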
How Does TeeReader Work?
The io.TeeReader function in Go’s io package returns a new Reader that is backed by an unexported struct called teeReader. This struct has two fields:
- r: the underlying Reader from which data is read.
- w: the Writer to which data read from r is written.
Here’s how it is structured:
type teeReader struct {
	r Reader // underlying reader
	w Writer // writer to write data to
}
The teeReader struct implements a Read method that receives a byte slice as a parameter. The Read method reads data into this byte slice from the underlying reader (r) and then, within the same call, writes the same data to the Writer (w). No goroutines are involved: the write happens synchronously, right after the read.
In practical terms, the Read method works in the following steps:
- The Read method calls the Read method of the underlying Reader (r) to read data into the byte slice. This call returns the number of bytes read and any error encountered.
- If the number of bytes read (n) is greater than zero, the Read method writes those bytes to the Writer (w). If an error occurs during writing, the method immediately returns the number of bytes written and the encountered error.
- Otherwise, the Read method returns the number of bytes read and any error encountered during reading.
Here’s how the Read method of teeReader is defined:
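func (t *teeReader) Read(p []byte) (n int, err error) {
	n, err = t.r.Read(p)
	if n > 0 {
		// The inner n and err shadow the named results: on a write
		// failure, Read reports the bytes written and the write error.
		if n, err := t.w.Write(p[:n]); err != nil {
			return n, err
		}
	}
	// Otherwise, return the results of the underlying Read.
	return
}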
In essence, the TeeReader is a conduit that lets data be read from a Reader and, as part of each read, written to a Writer. The Read method of teeReader manages this process, allowing you to “spy” on data as it passes through without modifying the flow or the interface of the Reader.
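Two consequences of this Read method are easy to overlook: an error from the Writer surfaces as an error from Read, and the Writer only receives bytes that the consumer actually reads. The sketch below demonstrates both. The names errSinkFull, failingWriter, readThroughFailingTee, and copiedAfterPartialRead are hypothetical and made up for this illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// errSinkFull and failingWriter are hypothetical, made up for this sketch.
var errSinkFull = errors.New("sink full")

type failingWriter struct{}

// Write rejects every write without consuming any bytes.
func (failingWriter) Write(p []byte) (int, error) { return 0, errSinkFull }

// readThroughFailingTee does one Read on a TeeReader whose Writer always fails.
func readThroughFailingTee(src string) (int, error) {
	tee := io.TeeReader(strings.NewReader(src), failingWriter{})
	return tee.Read(make([]byte, 16))
}

// copiedAfterPartialRead reads only the first n bytes of src through a
// TeeReader and returns what reached the side buffer.
func copiedAfterPartialRead(src string, n int) (string, error) {
	var side bytes.Buffer
	tee := io.TeeReader(strings.NewReader(src), &side)
	if _, err := io.ReadFull(tee, make([]byte, n)); err != nil {
		return "", err
	}
	return side.String(), nil
}

func main() {
	n, err := readThroughFailingTee("payload")
	fmt.Println(n, err)

	side, err := copiedAfterPartialRead("payload", 4)
	fmt.Printf("%q %v\n", side, err)
}
```

In the first call, Read reports the Writer's byte count and error rather than those of the underlying Reader. In the second, the side buffer holds only the four bytes the consumer asked for, so a Writer attached to a TeeReader sees the full stream only if the stream is read to the end.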
Using TeeReader
Now that we’ve seen how io.TeeReader works, let’s dive into a practical example.
Imagine you are downloading a large file and you want to calculate its checksum on the fly to verify the integrity of the file after download. You also want to display the progress of the download. This is a perfect scenario for io.TeeReader.
The example below covers the progress part: we’ll download a gzip file, decompress it, save the result locally, and use io.TeeReader to display progress along the way:
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

// progress implements io.Writer and keeps a running total of the bytes written to it.
type progress struct {
	total uint64
}

func (p *progress) Write(b []byte) (int, error) {
	p.total += uint64(len(b))
	fmt.Printf("Downloading %f MB...\n", float64(p.total)/(1024*1024))
	return len(b), nil
}

func main() {
	res, err := http.Get("http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-0.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()

	// O_TRUNC discards stale bytes if file.txt already exists.
	localFile, err := os.OpenFile("file.txt", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0600)
	if err != nil {
		log.Fatal(err)
	}
	defer localFile.Close()

	gzipReader, err := gzip.NewReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// The tee wraps the gzip reader, so progress counts decompressed bytes.
	teeReader := io.TeeReader(gzipReader, &progress{})
	if _, err := io.Copy(localFile, teeReader); err != nil {
		log.Fatal(err)
	}
}
In this code, we’ve created a progress struct that implements the io.Writer interface. It keeps a running total of the bytes that pass through the TeeReader. Because the TeeReader wraps the gzip reader, these are decompressed bytes rather than bytes received over the network. The Write method is called every time a Read on the TeeReader returns data, thereby tracking progress.
Running this program will result in the output:
...
...
Downloading 328.406250 MB...
Downloading 328.437500 MB...
Downloading 328.468750 MB...
Downloading 328.500000 MB...
Downloading 328.531250 MB...
Downloading 328.562500 MB...
Downloading 328.593750 MB...
Downloading 328.625000 MB...
Downloading 328.656250 MB...
Downloading 328.687500 MB...
Downloading 328.718750 MB...
Downloading 328.750000 MB...
Downloading 328.781250 MB...
Downloading 328.812500 MB...
Downloading 328.843750 MB...
Downloading 328.869534 MB...
By using io.TeeReader, we’re able to easily add progress reporting functionality without changing how the original reader behaves.
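The scenario above also called for a checksum. One possible way to get both a byte count and a checksum from a single pass is to hand io.TeeReader an io.MultiWriter. The sketch below uses in-memory data instead of a real download, and the names byteCounter, copyCounted, and countString are hypothetical helpers for this illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// byteCounter is a hypothetical io.Writer that only counts the bytes it sees.
type byteCounter struct{ total uint64 }

func (c *byteCounter) Write(b []byte) (int, error) {
	c.total += uint64(len(b))
	return len(b), nil
}

// copyCounted copies src to dst and returns the number of bytes that passed
// through the tee along with their hex-encoded SHA-256.
func copyCounted(dst io.Writer, src io.Reader) (uint64, string, error) {
	counter := &byteCounter{}
	h := sha256.New()

	// MultiWriter fans each write out to both the counter and the hash.
	tee := io.TeeReader(src, io.MultiWriter(counter, h))
	if _, err := io.Copy(dst, tee); err != nil {
		return 0, "", err
	}
	return counter.total, hex.EncodeToString(h.Sum(nil)), nil
}

// countString runs copyCounted over an in-memory string, discarding the copy.
func countString(s string) (uint64, string, error) {
	return copyCounted(io.Discard, strings.NewReader(s))
}

func main() {
	n, sum, err := countString("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes, sha256 %s\n", n, sum)
}
```

In a download like the one above, src would be the response body (or the gzip reader) and dst the local file; the rest of the function would stay the same.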
Conclusion
Go’s io.TeeReader is a powerful function that can provide real-time insights into data as it’s being read. By understanding its internals and how to use it, you can enhance your data pipelines with additional processing steps that run as the data is read.
Happy coding!