Automated PDF generator for web solutions

Have you ever faced project requirements like these: expressive data reports, delivered in two forms? First, on demand, through a web interface, with the ability to export to a PDF file. Second, generated (as PDFs as well) and sent on a schedule set by the user. The reports must contain charts (pie, line, bar, etc.), flows, tables (not so simple ones) and other non-plain-text representations that make the data easier to read and the reports more polished.

You’ve got it, partially. The data is already visualized on the web page using the best-known web technologies (HTML, CSS, JS). The client wants to export what they see in a similar fashion and style. They like graphs, diagrams, colored arrows, lines, merged table cells, table captions and headers.

PDF is probably the most popular format for business documents. Thinking as a developer faced with a requirement to generate PDF files on a schedule, you would probably choose a server-side approach. That means reproducing the same data representation on the server, mirroring a web-based solution that is already done and shining. You probably need yet another server-side PDF report design tool and, more importantly, additional time and developer resources (I bet you’re lucky and have a programmer waiting for tasks) to do the same thing on the back end. Many hours (or weeks) later, the server-side solution is ready and you are good to go. But in an agile process requirements change from time to time, so now you have to change the data visualization twice: on the server side and in the user interface. You’ve just started experiencing the disadvantages. Does it have to be so time-consuming and labor-intensive?

Let’s go back to the time when you were still looking for a good solution to this requirement: yes, several hours (or weeks) back. You’ve heard that there are tools that generate PDF files purely on the client side, in the browser. Some googling and … here it is: jsPDF, “Client-side JavaScript PDF generation for everyone”. For you too. Open source, 119 contributors, started in 2010, MIT licensed, nice. And the demo looks very promising.

You are tempted to try it, but … what about the client’s second requirement: file generation on a schedule? At first glance, browser-based solutions are useless here. They require someone sitting at a desk to click an export button in the web interface before the file physically exists. Here a headless browser comes to the rescue. According to Wikipedia: “A headless browser is a web browser without a graphical user interface.” And, jumping to the use cases section: “Headless browsers are used for […] automating interaction of web pages.” Based on these statements you already know that you can program a tool that starts a browser in headless mode, visits the web page and performs the actions needed to visualize the data. Without user interaction. Perfect.

Let’s revisit the idea. Does it satisfy the client’s needs? The first requirement is met in a straightforward way: the data is displayed in the web view and then passed to the client-side export tool. The second one can be met too: the automation tool visits the web page according to the schedule, follows the appropriate steps to visualize the data and finally triggers the export and saves the exported file (to be clear, PDF does not have to be your only choice).
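The “according to the schedule” part can be very small. Below is a minimal Python sketch, not the implementation from our project: `next_run_time` and `run_daily` are hypothetical helper names, and `job` stands for whatever function drives the browser and saves the file. In practice cron or the scheduler built into your application would do the same job.

```python
import time
from datetime import datetime, timedelta


def next_run_time(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next moment at hour:minute that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is right now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


def run_daily(job, hour: int, minute: int = 0) -> None:
    """Call job() once a day at hour:minute. Blocks the caller forever."""
    while True:
        target = next_run_time(datetime.now(), hour, minute)
        wait_seconds = (target - datetime.now()).total_seconds()
        time.sleep(max(wait_seconds, 0))
        job()  # e.g. open the report page headlessly and export the PDF
```

Calling `run_daily(export_job, 6, 30)` would run a hypothetical `export_job` every day at 6:30 local time; the sketch uses naive local datetimes and ignores daylight saving changes.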

It’s time to find a tool that works with browsers. You are probably familiar with the search possibilities around the corner, so I’m leaving this task as a challenge for the demanding ones. I have some experience with Selenium (and I bet many of you reading this have too, or have at least heard of it). As its homepage states: “Selenium automates browsers. That’s it! What you do with that power is entirely up to you.” You may be tempted to ask yourself: “Hey, isn’t it for test automation?” Believe me, it is not limited to that. Selenium may be used for any task that requires automating interaction with the browser. To start using it for our purpose you need one of the language-specific client drivers. Just take a look at the “Selenium Client & WebDriver Language Bindings” section of the downloads page. The most popular programming languages are supported: Ruby, Java, Python, C#, JavaScript and others. You will find a lot of tutorials and Q&As about this tool.

OK, now we’ve got:

A web-based application with the data visualized
An automation tool to interact with browsers

What else do we need? The browser, of course. I’m not 100% sure which headless browser was the first, but I believe it was PhantomJS. According to Wikipedia: “PhantomJS was released January 23, 2011 by Ariya Hidayat after several years in development.” Unfortunately it is no longer maintained, so we should look for something that is not in a suspended state. Lists of headless browsers can be found on Wikipedia and in other resources. However, if you use Selenium (Selenium WebDriver, I should say), you are free to choose any of the supported browsers. I believe you already know at least one from this list 😉

There is one last piece of the puzzle to complete this integration. You will need to download an additional driver for each browser you want to work with (for Chrome, Firefox and Edge these are standalone executables). You don’t have to try every possible combination of this software. Just pick one pair, like the Chrome browser and ChromeDriver 🙂 And try it.
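To make the Chrome plus ChromeDriver pair concrete, here is a rough sketch using the Python bindings for Selenium (the article does not prescribe a language, so treat that choice as an assumption). It opens the report page in headless Chrome, clicks the export button and waits for the file to land in a download folder. The function names, the `#export-pdf` selector and the timeouts are all hypothetical; the sketch also assumes a recent Chrome, a ChromeDriver that Selenium can find, and a download folder that is empty before the export starts.

```python
import time
from pathlib import Path


def wait_for_download(directory, suffix=".pdf", timeout=30.0, poll=0.5):
    """Poll `directory` until a file ending in `suffix` appears and return it.

    Chrome normally writes a download to a temporary *.crdownload file
    first, so a file carrying the final suffix is treated as complete.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = sorted(p for p in Path(directory).iterdir()
                          if p.is_file() and p.name.endswith(suffix))
        if finished:
            return finished[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no {suffix} file appeared in {directory}")
        time.sleep(poll)


def export_report(report_url, download_dir, export_button_css="#export-pdf"):
    """Open the report in headless Chrome, click export, return the saved file."""
    # Imported here so wait_for_download() stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")           # no visible window
    options.add_argument("--window-size=1920,1080")  # layout like a desktop screen
    options.add_experimental_option("prefs", {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    })

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(report_url)
        # Hypothetical selector: wait until the page's export button is usable.
        button = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, export_button_css)))
        button.click()
        return wait_for_download(download_dir)
    finally:
        driver.quit()  # always release the browser process
```

If the charts are drawn asynchronously, you would probably add another explicit wait for an element that only exists once rendering has finished, before clicking the export button.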

Let’s recap what we have achieved. Both requirements, exporting the visualized data on demand and on a schedule, are satisfied. More importantly, resources are saved and the work does not have to be duplicated on the server and the client side.

P.S.

What about the authentication?
You can instruct the automation tool to authenticate first and then perform the required actions. Think of the tool as just another user with its own account.
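As a hedged illustration, again with the Python bindings for Selenium: the sketch below reads the credentials of such a dedicated account from environment variables and submits a login form. The variable names `REPORT_USER` and `REPORT_PASSWORD`, the form field names and the submit button selector are assumptions made for the example, and `driver` is any Selenium WebDriver instance created elsewhere.

```python
import os


def load_credentials(env=None):
    """Read the reporting account's login and password from the environment."""
    env = os.environ if env is None else env
    try:
        # Hypothetical variable names; keep secrets out of the source code.
        return env["REPORT_USER"], env["REPORT_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None


def log_in(driver, login_url, user, password):
    """Submit the login form so later visits in this session are authenticated."""
    # Imported here so load_credentials() stays usable without Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(login_url)
    login_page = driver.current_url  # the address after any redirects
    # Field names and the submit selector are assumptions about the form.
    driver.find_element(By.NAME, "username").send_keys(user)
    driver.find_element(By.NAME, "password").send_keys(password)
    driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
    # Continue only once the browser has left the login page.
    WebDriverWait(driver, 10).until(EC.url_changes(login_page))
```

After `log_in(driver, login_url, *load_credentials())` the same `driver` carries the session cookies, so it can open the report page as that user.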

Does it work?
A solution based on this article has been running on a couple of production instances for about a year. It definitely works.

Przemysław Fusik, Java Team Leader
