I needed to make an aspx page that would allow me to pull a PDF from disk, fill out fields in it, then stream the completed PDF to a browser. After searching around, I found that the consensus seems to be that iTextSharp is the way to go in C#. I grabbed version 4.1.2.
While the iText API docs are useful, they're hard to start out with. I found that this example proved the most informative and useful of those that popped up on Google. However, even with that, I found that only the source code really answered all of my questions.
Extra things I found noteworthy:
- All of the PdfReader constructors ultimately use the RandomAccessFileOrArray object to access the source PDF data.
- The PdfReader(RandomAccessFileOrArray...) constructor results in a partial read (ReadPdfPartial()), whereas all the others read & parse the entire PDF (ReadPdf()) during construction. It appears, from this post, that the former uses less RAM, but the other constructors result in faster performance. (I suspect most of it is hard drive related)
- You only need to call PdfReader.Close() when a partial read was requested (i.e. you used PdfReader(RandomAccessFileOrArray...)). And, in that case, it will call Close on the RandomAccessFileOrArray that was passed in. In all other cases, there's nothing unmanaged that needs to be dealt with since everything was taken care of during construction.
- PdfStamper doesn't implement IDisposable, so you have to put in an explicit try...finally instead of a using block.
- You must set PdfStamper.Writer.CloseStream to false if you want to use the output Stream (the one you passed into the constructor) after the call to PdfStamper.Close(), otherwise it gets closed.
- This feels like a "duh" bit of info, but the output Stream's position will be at the end when PdfStamper is done with it. Stream.Seek or Stream.Position should be able to fix that for you.
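To illustrate the partial-read path from the bullets above, here's a minimal sketch. I'm assuming the PdfReader(RandomAccessFileOrArray, byte[]) constructor signature from iTextSharp 4.1.2; the file path is just a placeholder:

```csharp
// Only the RandomAccessFileOrArray-based constructor takes the
// partial-read (ReadPdfPartial()) path; all the others parse the
// whole PDF up front during construction.
PdfReader reader = new PdfReader(
    new RandomAccessFileOrArray(@"c:\path\to\file.pdf"),
    null); // null = no owner password
try
{
    int pages = reader.NumberOfPages; // content is pulled in lazily in this mode
}
finally
{ reader.Close(); } // required ONLY for the partial-read constructor;
                    // this also closes the RandomAccessFileOrArray we passed in
```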
string fileName = @"c:\path\to\file.pdf";
try
{
    PdfStamper stamper = new PdfStamper(new PdfReader(fileName), Response.OutputStream);
    try
    {
        AcroFields af = stamper.AcroFields;
        af.SetField("field-name", "value");
        stamper.FormFlattening = true;
    }
    finally
    { stamper.Close(); }
}
catch (Exception ex)
{ throw new ApplicationException("Unable to fill out PDF: " + fileName, ex); }
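For completeness: when writing straight to Response.OutputStream like this, the headers have to be in place before the first byte goes out, because that first write commits them. A sketch of the typical setup (the filename value is just illustrative):

```csharp
// All of this must happen before anything touches Response.OutputStream.
Response.Clear();
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "attachment; filename=file.pdf");
// Note that with this direct-to-OutputStream approach there is no way to set
// an accurate Content-Length, since the PDF's final size isn't known yet.
```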
So, since I want to be nice to the end-user, it's back to a MemoryStream instead of Response.OutputStream. The problem there is that Response.BinaryWrite() and Response.OutputStream.Write() both take a byte[]. And MemoryStream.ToArray() returns a deep copy. So that would give me 2 copies of the download and 1 copy of the original in RAM, all at once; lame. So, the only solution (short of learning more about Response.Filter) is to use a temporary byte[] and "chunk it out". Now I'm doing this (basically):
MemoryStream pdfOut = new MemoryStream(256000); // 256KB seems like a good starting point
try
{
    PdfStamper stamper = new PdfStamper(new PdfReader(fileName), pdfOut);
    try
    {
        stamper.Writer.CloseStream = false;
        AcroFields af = stamper.AcroFields;
        af.SetField("field-name", "value");
        stamper.FormFlattening = true;
    }
    finally
    { stamper.Close(); }
}
catch (Exception ex)
{ throw new ApplicationException("Unable to fill out PDF: " + fileName, ex); }

// ...among other headers, tell the browser how much to expect.
Response.AppendHeader("Content-Length", pdfOut.Length.ToString());
pdfOut.Seek(0, SeekOrigin.Begin); // make sure we start at the beginning of the PDF

// "chunk" out the PDF
byte[] buffer = new byte[102400]; // 100KB seems like a good size
int bytesRead = 0;
for (long totalBytesRead = 0; Response.IsClientConnected && totalBytesRead < pdfOut.Length; totalBytesRead += bytesRead)
{
    bytesRead = pdfOut.Read(buffer, 0, buffer.Length);
    if (bytesRead < buffer.Length)
    {
        // We must do this because BinaryWrite will always write out the entire array
        // (and we didn't fill our buffer on that last Read()).
        byte[] endBuffer = new byte[bytesRead];
        Array.Copy(buffer, 0, endBuffer, 0, bytesRead);
        Response.BinaryWrite(endBuffer);
    }
    else
        Response.BinaryWrite(buffer);
}
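As an aside, I believe the copy for the final chunk can be avoided: while Response.BinaryWrite() always writes the whole array, Response.OutputStream is a Stream, and Stream.Write(byte[], int, int) takes an offset and a count. So the loop body could (if I'm reading the Stream API right) shrink to something like:

```csharp
bytesRead = pdfOut.Read(buffer, 0, buffer.Length);
// Stream.Write only emits the first bytesRead bytes of the buffer,
// so no end-of-stream copy into a smaller array is needed.
Response.OutputStream.Write(buffer, 0, bytesRead);
```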
With the basic equivalent of the above, I now have what I need. While I don't have a solution that uses the least possible RAM, I do have one that assures that any problems with the PDF are encountered before the headers are written and that the Content-Length can be accurately set.