There are two important points to
consider when validating a PDF:
- Textual content verification
- Graphics content verification
Verifying textual content is the simpler of the two, because there are
multiple ways to extract the text of a PDF and verify it using Sahi.
Validating the graphics content additionally requires image
comparison.
In this post, our objective is to
compare a PDF generated from a website with the baseline version
that we store on the machine running the Sahi script. We will test both
the text and the graphics content of the PDF. There are multiple ways of
doing this. I will briefly describe the 3 approaches I
have tried out.
Approach 1: Image comparison
In this approach we capture screenshots of the PDF pages and match
them against baseline images of that PDF. The comparison can be done
with the ImageMagick tool, using its compare facility:
http://www.imagemagick.org/script/compare.php
You can download the PDF, take a screenshot, save it as an image and
then compare the images using the ImageMagick utility. Another way is
to use Sikuli.
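For the ImageMagick route, here is a rough sketch of the glue code, not a tested integration. It assumes the `-metric AE` option of `compare`, which prints the number of differing pixels on stderr, and the exit codes described on the compare page (0 for similar images, 1 for dissimilar, 2 for an error). The helper names `buildCompareCommand` and `interpretCompare` and the file names are made up for illustration. Actually launching the command from Sahi is left out.

```javascript
// Build the argument list for ImageMagick's `compare` tool.
// -metric AE reports how many pixels differ; the last argument is the
// file where compare writes a visual diff image.
function buildCompareCommand($baseline, $actual, $diff) {
  return ["compare", "-metric", "AE", $baseline, $actual, $diff];
}

// Interpret the exit code and stderr text of a finished compare run.
// Assumption: exit code 0 = similar, 1 = dissimilar, anything else = error.
function interpretCompare($exitCode, $stderrText) {
  var $text = String($stderrText).replace(/^\s+|\s+$/g, "");
  if ($exitCode !== 0 && $exitCode !== 1) {
    return { ok: false, differingPixels: null, error: $text };
  }
  // parseFloat reads the leading number and ignores anything after it.
  var $pixels = parseFloat($text);
  return { ok: $pixels === 0, differingPixels: $pixels, error: null };
}
```

A Sahi assertion could then be written against the `ok` flag, with `differingPixels` included in the failure message.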
Here is an example of how Sikuli calls
are made from a Sahi script:
(1) Sikuli is an image comparison tool
that compares images and gives us a match score. If the images match
100%, the score is 1.
(2) Open the PDF file. Take a
screenshot and save the image as PNG. This can be done for one or all
pages at the fit-to-window zoom level.
(3) Configure the Sikuli package in
Sahi's classpath.
(4) In the Sahi script we can create
the Sikuli objects and interact with the PDF file.
// Launch Adobe Reader with the downloaded PDF.
var $cmdline = "C:/Program Files (x86)/Adobe/Reader 9.0/Reader/AcroRd32.exe " + $downloadedPdf;
var $screenObj = new org.sikuli.script.Screen();
var $app = new org.sikuli.script.App($cmdline);
$app.open();
// Wait for the fit-to-page icon, then click it.
// Sikuli timeouts are given in seconds, not milliseconds.
$screenObj.wait($fitToPageIcon, 5);
var $clickIcon = $screenObj.find($fitToPageIcon);
$screenObj.click($clickIcon);
// Find the baseline image of page $i on screen and read the match score.
var $baseImage = $dirPathSikuli + "sikuli/image/" + $inputObject.cardId + "_base" + $i + ".png";
var $matchObj = $screenObj.find($baseImage);
var $screenScore = parseFloat($matchObj.getScore());
_assertEqual(true, ($screenScore > 0.90), "Image Match with SIKULI");
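The snippet above checks a single page (`$i`). A sketch of extending it to every page follows. To keep it independent of Sikuli, the score lookup is passed in as a function; in a real script that function would wrap `$screenObj.find(...).getScore()`, and since Sikuli's find fails with an exception when nothing matches, it may need a try/catch. The name `checkPages`, its parameters and the assumption that pages are numbered from 1 are illustrative only.

```javascript
// Compare every page against its baseline image and collect the failures.
// $scoreFor is a caller-supplied function(imagePath) that returns a
// similarity score between 0 and 1 (hypothetical wrapper around Sikuli).
// Assumption: baseline images are numbered from 1 to $pageCount.
function checkPages($baseDir, $cardId, $pageCount, $threshold, $scoreFor) {
  var $failures = [];
  for (var $i = 1; $i <= $pageCount; $i++) {
    var $baseImage = $baseDir + "sikuli/image/" + $cardId + "_base" + $i + ".png";
    var $score = parseFloat($scoreFor($baseImage));
    // A missing or non-numeric score also counts as a failure.
    if (!($score > $threshold)) {
      $failures.push({ page: $i, image: $baseImage, score: $score });
    }
  }
  return $failures;
}
```

An empty result means every page scored above the threshold; otherwise each entry names the page and baseline image that failed, which is handy for the Sahi report.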
Limitations of this approach:
(1) The accuracy of the comparison
depends on Sikuli's image comparison algorithm.
(2) The captured images are machine
dependent and have a high maintenance cost.
(3) A Sahi script that uses Sikuli cannot be
replayed in multi-threaded playback.
(4) Sikuli is currently supported on
32-bit platforms only.
Approach 2: Text Comparison
The Sahi website gives a good example
explaining how this can be done using the PDF text extractor Apache
PDFBox: http://sahi.co.in/w/reading-pdf-files
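Once the text of both PDFs has been extracted, the comparison itself is plain string work. Below is a minimal sketch, assuming the baseline text and the downloaded text are already available as strings. The helper names `normalizeText` and `firstTextDifference` are hypothetical. Collapsing whitespace is a design choice that avoids false failures from extraction noise, but it can also hide layout-only differences.

```javascript
// Split extracted text into lines, collapse whitespace runs and drop
// empty lines, so that extraction noise does not cause false failures.
function normalizeText($text) {
  var $lines = String($text).split(/\r?\n/);
  var $out = [];
  for (var $i = 0; $i < $lines.length; $i++) {
    var $line = $lines[$i].replace(/\s+/g, " ").replace(/^ | $/g, "");
    if ($line !== "") {
      $out.push($line);
    }
  }
  return $out;
}

// Return null when both texts match after normalization, otherwise the
// first differing line (numbered among the non-empty lines, from 1).
function firstTextDifference($baselineText, $actualText) {
  var $a = normalizeText($baselineText);
  var $b = normalizeText($actualText);
  var $max = Math.max($a.length, $b.length);
  for (var $i = 0; $i < $max; $i++) {
    if ($a[$i] !== $b[$i]) {
      return { line: $i + 1, expected: $a[$i], actual: $b[$i] };
    }
  }
  return null;
}
```

Reporting the first differing line, rather than a bare pass or fail, makes the Sahi report easier to act on when a PDF changes.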
Approach 3: Using the PDF Comparator
In this approach we make use of a third-party PDF comparison tool that
compares both the text and the graphics content. The challenge is to
save the PDF downloaded from the website under test, hand it and the
baseline PDF to the comparison tool, collect the comparison result and
pass it on to the Sahi script for reporting, all driven from Sahi. We
use the PDF comparator from http://www.qtrac.eu/diffpdf.html.
The comparator writes its result to a text file, which can be picked up
for Sahi reporting.
We can use _execute or Java's Runtime.exec to run the PDF comparator,
then capture the result file and read it with Sahi to find any
failures.
var $dirPathBat; // path to the PDF comparator
var $dirPathPdf; // directory of the PDF files: the baseline and the one
                 // generated through the website
var $str = new Array();
$str[0] = $dirPathBat + "comparePdfTool/comparedBat.bat";
$str[1] = $dirPathPdf + $file1;  // baselined
$str[2] = $dirPathPdf + $file2;  // downloaded
$str[3] = $dirPathPdf + $result; // result file path
var $obj = java.lang.Runtime.getRuntime();
// Run the comparator from its own directory and wait for it to finish,
// so that the result file is complete before it is read.
var $process = $obj.exec($str, null, new java.io.File($dirPathBat + "comparePdfTool"));
$process.waitFor();
var $fileContents = _readFile($dirPathPdf + "Pdf/" + $inputObject.result);
_assertTrue($fileContents == "No differences detected.\r\n", "File does not match");
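The exact-match assertion above only passes when the result file ends in `\r\n`. A slightly more tolerant check, sketched below, ignores trailing whitespace and line-ending differences. The helper name `isPdfMatch` is hypothetical, and the expected message is simply the one used in the assertion above.

```javascript
// Message the comparison tool writes when the two PDFs are identical
// (taken from the assertion in this post).
var $NO_DIFF_MESSAGE = "No differences detected.";

// True when the result file reports no differences, ignoring trailing
// whitespace and Windows/Unix line-ending differences.
function isPdfMatch($fileContents) {
  return String($fileContents).replace(/\s+$/, "") === $NO_DIFF_MESSAGE;
}
```

In the script this would be used as `_assertTrue(isPdfMatch($fileContents), "File does not match");`.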