Thursday, February 21, 2013

Validating PDF using SAHI



There are two important aspects to consider when validating a PDF:
  1. Textual content verification
  2. Graphics content verification

Verifying textual content is the simpler part, because there are several ways to extract the text of a PDF and verify it using SAHI. Validating the graphics content, on the other hand, requires image comparison.

In this post, the objective is to compare a PDF generated by a website with a baseline version stored on the machine running the SAHI script. We will test both the text and the graphics content of the PDF. There are multiple ways of doing this, and I will give a brief description of the three approaches I have tried.

Approach 1: Image comparison

In this approach we capture screenshots of the PDF pages and match them against baseline images of the same PDF. One way to do the image comparison is ImageMagick's compare utility (http://www.imagemagick.org/script/compare.php): download the PDF, take a screenshot of each page, save it as an image and compare it with the baseline image using compare. Another way is to use Sikuli.
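If you go the ImageMagick route, compare can report a numeric metric rather than only producing a difference image. The sketch below assumes ImageMagick's compare is on the PATH and uses the AE (absolute error) metric, which counts differing pixels, together with -fuzz so that small colour differences such as anti-aliasing are ignored. The helper names and the pixel budget are my own illustration, not part of the original setup. compare writes the metric to standard error, and since the exact output format varies between ImageMagick versions, the parser only reads the leading number.

```javascript
// Hypothetical helper: argument list for ImageMagick's "compare" tool.
// -metric AE reports the number of differing pixels;
// -fuzz N% treats colours within N percent of each other as equal.
function buildCompareArgs($baselinePng, $currentPng, $diffPng, $fuzzPercent) {
  return ["compare", "-metric", "AE", "-fuzz", $fuzzPercent + "%",
          $baselinePng, $currentPng, $diffPng];
}

// compare prints the metric on stderr. Some versions append a normalised
// value in brackets, so only the leading number is parsed.
function parseAbsoluteError($stderrText) {
  var $value = parseFloat($stderrText);
  if (isNaN($value)) {
    throw new Error("Unexpected compare output: " + $stderrText);
  }
  return $value;
}

// Pass when no more than $maxPixels pixels differ from the baseline.
function imagesMatch($stderrText, $maxPixels) {
  return parseAbsoluteError($stderrText) <= $maxPixels;
}
```

From SAHI, the argument array could be passed to java.lang.Runtime.getRuntime().exec(...) in the same way as the batch file in Approach 3, with the process's error stream read to obtain the text for imagesMatch.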

Here is an example of how Sikuli can be called from a SAHI script:

(1) Sikuli is an image recognition tool that compares images and returns a match score. If the images match 100%, the score is 1.
(2) Open the baseline PDF (file1), take a screenshot and save it as a PNG. This can be done for one page or for all pages, at the fit-to-window zoom level.
(3) Add the Sikuli package to SAHI's classpath.
(4) In the SAHI script, create the Sikuli objects and use them to interact with the PDF file.

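// Assumes $downloadedPdf, $fitToPageIcon (a screenshot of Reader's "fit page"
// toolbar icon), $dirPathSikuli, $inputObject and the page index $i are
// defined earlier in the script.
var $cmdline = "C:/Program Files (x86)/Adobe/Reader 9.0/Reader/AcroRd32.exe " + $downloadedPdf;
var $screenObj = new org.sikuli.script.Screen();
var $app = new org.sikuli.script.App($cmdline);
$app.open(); // launch Adobe Reader with the downloaded PDF

// Wait for the toolbar icon (Sikuli timeouts are in seconds), then click it
$screenObj.wait($fitToPageIcon, 5);
var $clickIcon = $screenObj.find($fitToPageIcon);
$clickIcon.click($fitToPageIcon);

// Find the baseline screenshot of this page on screen and check the match score (0..1)
var $baseImage = $dirPathSikuli + "sikuli/image/" + $inputObject.cardId + "_base" + $i + ".png";
var $matchObj = $screenObj.find($baseImage);
var $screenScore = parseFloat($matchObj.getScore());
_assertEqual(true, ($screenScore > 0.90), "Image Match with SIKULI");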
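The script above checks one page. To cover every page, one option is to collect the score of each page in an array and evaluate them together. The helpers below are a hypothetical sketch (baselineImagePath and failingPages are my names, and pages are assumed to be numbered from 1); they only build file names and evaluate scores, so the Sikuli calls stay as they are. Also keep in mind that Sikuli's find throws a FindFailed exception when nothing on screen matches above its minimum similarity (0.7 by default), so a very poor match shows up as an exception rather than as a low score.

```javascript
// Hypothetical helper: path of the baseline screenshot for one page,
// following the "<cardId>_base<page>.png" naming used in the script above.
function baselineImagePath($dir, $cardId, $page) {
  return $dir + "sikuli/image/" + $cardId + "_base" + $page + ".png";
}

// Given the Sikuli match score of every page ($scores[0] is page 1),
// return the page numbers whose score is not above the threshold.
// The negated comparison also flags NaN scores as failures.
function failingPages($scores, $threshold) {
  var $failed = [];
  for (var $i = 0; $i < $scores.length; $i++) {
    if (!($scores[$i] > $threshold)) {
      $failed.push($i + 1);
    }
  }
  return $failed;
}
```

In the SAHI script, the page loop would push parseFloat($matchObj.getScore()) into $scores and finish with _assertTrue(failingPages($scores, 0.90).length == 0, "Image Match with SIKULI"), which reports all bad pages in one run instead of stopping at the first.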

Limitations of this approach:

(1) The accuracy of the comparison depends on Sikuli's image matching algorithm.
(2) Captured images are machine dependent (screen resolution, fonts, viewer settings), which makes them expensive to maintain.
(3) A SAHI script that uses Sikuli cannot be replayed in multi-threaded playback, because Sikuli drives the single physical screen, mouse and keyboard.
(4) Sikuli is currently supported on 32-bit platforms only.

Approach 2: Text Comparison

The SAHI website has a good example explaining how this can be done using the Apache PDFBox PDF extractor: http://sahi.co.in/w/reading-pdf-files
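The linked article covers extracting the text with PDFBox. Once the text of the baseline PDF and of the downloaded PDF is available as two strings (for example from PDFBox's PDFTextStripper.getText), comparing them raw is brittle, since line endings and spacing in extracted text can vary. Below is a minimal sketch of a whitespace-tolerant comparison; the function names are hypothetical, and the normalisation rules (collapse runs of whitespace, trim lines, drop blank lines) are an assumption you may want to tune.

```javascript
// Hypothetical helper: split extracted PDF text into lines, collapse runs of
// whitespace to a single space, trim each line and drop blank lines.
function normaliseLines($text) {
  var $lines = String($text).replace(/\r\n?/g, "\n").split("\n");
  var $out = [];
  for (var $i = 0; $i < $lines.length; $i++) {
    var $line = $lines[$i].replace(/\s+/g, " ").replace(/^ | $/g, "");
    if ($line.length > 0) {
      $out.push($line);
    }
  }
  return $out;
}

// Compare the baseline text with the text of the downloaded PDF.
// Returns null when they match, otherwise a message describing the first
// differing line (numbered after blank lines have been dropped).
function firstTextDifference($baselineText, $actualText) {
  var $expected = normaliseLines($baselineText);
  var $actual = normaliseLines($actualText);
  var $n = Math.max($expected.length, $actual.length);
  for (var $i = 0; $i < $n; $i++) {
    var $e = $i < $expected.length ? $expected[$i] : "<missing>";
    var $a = $i < $actual.length ? $actual[$i] : "<missing>";
    if ($e !== $a) {
      return "Line " + ($i + 1) + ": expected [" + $e + "] but found [" + $a + "]";
    }
  }
  return null;
}
```

In a SAHI script this could be used as var $diff = firstTextDifference($baselineText, $pdfText); _assertTrue($diff == null, "PDF text differs: " + $diff); so that a failure points at the first mismatching line rather than just saying the texts differ.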

Approach 3: Using the PDF Comparator

In this approach we use a third-party PDF comparison tool that compares both the text and the graphics content. The challenge is to save the PDF downloaded from the website under test, pass it together with the baselined PDF to the comparison tool, collect the comparison result and hand it back to the SAHI script for reporting, all driven from SAHI. We use the PDF comparator from http://www.qtrac.eu/diffpdf.html. The comparison result is written to a text file, which the SAHI script can then read for reporting.

We can use SAHI's _execute or Java's Runtime.exec to run the PDF comparator, then read the result file with SAHI to find any failures.

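var $dirPathBat; // directory containing the comparePdfTool folder
var $dirPathPdf; // directory holding the baseline PDF and the PDF generated through the website
var $str = new Array();
$str[0] = $dirPathBat + "comparePdfTool/comparedBat.bat"; // batch file that runs the comparator
$str[1] = $dirPathPdf + $file1;  // baselined PDF
$str[2] = $dirPathPdf + $file2;  // downloaded PDF
$str[3] = $dirPathPdf + $result; // result file path

// Run the batch file from the comparePdfTool directory and wait for it to finish;
// reading the result file before the process exits can return stale or empty content.
var $obj = java.lang.Runtime.getRuntime();
var $process = $obj.exec($str, null, new java.io.File($dirPathBat + "comparePdfTool"));
$process.waitFor();
var $fileContents = _readFile($str[3]); // read back the same result file passed to the batch file
_assertTrue($fileContents == "No differences detected.\r\n", "File does not match");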
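Comparing against the exact string "No differences detected.\r\n" fails if the result file ends with a Unix line ending or has no final newline. A slightly more tolerant check is sketched below; pdfsMatch is a hypothetical helper, and it assumes, as the script above does, that the result file contains only the comparator's one-line summary. String() is used so it works whether _readFile hands back a JavaScript or a Java string.

```javascript
// Hypothetical helper: true when the comparator's result text reports no
// differences. Leading/trailing whitespace ("\r\n" vs "\n") is ignored.
function pdfsMatch($resultText) {
  var $expected = "No differences detected.";
  if ($resultText === null || $resultText === undefined) {
    return false; // result file missing or unreadable
  }
  return String($resultText).replace(/^\s+|\s+$/g, "") === $expected;
}
```

The assertion then becomes _assertTrue(pdfsMatch($fileContents), "File does not match: " + $fileContents), which also puts the comparator's own summary into the SAHI report when the check fails.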
