Convert HTML Files Into Word, Excel & PDF Docs With PHP

The web as a publishing platform has come on in leaps and bounds over the last decade or two. HTML has gone from being a simple way to mark up basic documentation to a fully-fledged interactive publishing medium. But it’s still useful to be able to convert HTML into other formats, especially for systems that involve a lot of data and require exportable reports.

For example, it’s often useful to be able to download or email PDF summaries of reports or invoices to clients, or to let customers on an ecommerce site download their order details as a Word document.

To get started, I’m going to assume you’re already familiar with PHP and setting up a basic web app.

Also, I’m assuming you’ve got some basic familiarity with the command line, as I’m going to be making use of the fantastic PHP package manager, Composer.

Composer is, in a nutshell, a way of easily installing PHP code into your application without the headache of manually including external libraries with all their dependencies.

For a quick intro and installation guide for Composer, click here. That’ll get you up and running.

Convert HTML To PDF

To install the library I’m using to convert HTML into PDF, DomPDF, run the following composer command from your project root:

composer require dompdf/dompdf

That’ll install all the required libraries I need to run DomPDF from within my simple little PHP app.

All I’ll be doing here is reading in the contents of my external HTML file, sample.html, placed within the same directory as my project files.

Then, using DomPDF’s own internal functionality, I’ll stream the generated file to the user’s browser, ready for downloading.

Here’s the code:


// Pull in Composer's autoloader
require 'vendor/autoload.php';

// reference the Dompdf namespace
use Dompdf\Dompdf;

$dompdf = new Dompdf();
// Enable the HTML5 parser to tolerate poorly formed HTML
$dompdf->set_option('isHtml5ParserEnabled', true);

// Load into DomPDF from the external HTML file
$content = file_get_contents('sample.html');
$dompdf->loadHtml($content);

// Render and download
$dompdf->render();
$dompdf->stream();

And the output, a downloadable PDF.

The result of running DomPDF on a chunk of HTML

Try it yourself:

Generate & Download PDF

You can also generate PDF documents from whole web pages - as long as you can grab a source of HTML, for example by using the PHP function file_get_contents, you can convert any accessible web page into PDF.
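
As a minimal sketch of that idea, the helper below wraps file_get_contents so that a failed read is caught before anything is handed to the converter. The function name fetchHtml and the temporary demo file are my own illustrations, not part of DomPDF. The string it returns is what you would pass on to $dompdf->loadHtml().

```php
<?php
// Hypothetical helper (not part of DomPDF): read HTML from a local path or
// a URL, and fail loudly instead of passing `false` on to the converter.
function fetchHtml(string $source): string
{
    // file_get_contents() returns false on failure; @ hides the PHP warning
    // so the failure can be reported through the exception instead.
    $html = @file_get_contents($source);

    if ($html === false || trim($html) === '') {
        throw new RuntimeException("Could not read any HTML from {$source}");
    }

    return $html;
}

// Demo with a temporary local file. A URL is read the same way, provided
// allow_url_fopen is enabled in your PHP configuration.
$demoFile = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($demoFile, '<html><body><h1>Report</h1></body></html>');

$html = fetchHtml($demoFile);
echo strlen($html) . " bytes of HTML ready for conversion\n";

unlink($demoFile);
```

One thing to watch for with remote pages: relative links to stylesheets and images won't resolve by themselves, so the PDF may look plainer than the page in a browser.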

Convert HTML To Word

Although it’s a more archaic and less widely supported format, Microsoft Word documents remain a popular choice for saving/reading/printing documentation.

For this I’ll be using a different composer package, PHPWord. The approach is somewhat different than for generating PDFs.

First, install the package with composer.

composer require phpoffice/phpword

To get started, the following chunk of code grabs the HTML directly from the file sample.html and loads it into a DOMDocument object.

DOMDocument is a PHP class which allows for manipulation and extraction of data from HTML. Using this, it’s possible to search within HTML documents for specific pieces of data by attributes like id or class, or even by tag name - in much the same way that CSS selectors or JavaScript DOM operations work.
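
To make those three kinds of lookup concrete, here is a small standalone sketch. The inline markup, ids and class names are invented for the example. Note that DOMDocument itself has no class-based lookup, so the sketch uses DOMXPath for that part.

```php
<?php
// Inline sample markup; the ids and class names are illustrative only.
$html = '<html><body>'
      . '<h1 id="title">Quarterly Report</h1>'
      . '<div id="content"><p class="intro">Sales rose.</p><p>Costs fell.</p></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Lookup by id, much like document.getElementById() in JavaScript
$title = $dom->getElementById('title')->nodeValue;

// Lookup by tag name returns a list of every matching element
$paragraphs = $dom->getElementsByTagName('p');

// Lookup by class goes through XPath; the concat/normalize-space idiom
// matches "intro" as a whole class name, not as part of a longer one
$xpath = new DOMXPath($dom);
$intro = $xpath->query(
    '//p[contains(concat(" ", normalize-space(@class), " "), " intro ")]'
);

echo $title . "\n";
echo $paragraphs->length . " paragraphs found\n";
echo $intro->item(0)->nodeValue . "\n";
```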

Here, I’m getting a hold of the main page title, and the body content using the id attributes set within the HTML. You’ll see why shortly.


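// Pull in Composer's autoloader
require 'vendor/autoload.php';

$data = file_get_contents('sample.html');
$dom = new DOMDocument();
// Parse the HTML so that its elements can be queried
$dom->loadHTML($data);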

// Now, extract the content I want to insert into my docx template

// 1 - The page title
$documentTitle = $dom->getElementById('title')->nodeValue;

// 2 - The article body content
$documentContent = $dom->getElementById('content')->nodeValue;

In the next step, I’m going to make use of an existing Word document to structure and template my generated document.

Now, unlike with DomPDF, I can’t just take an HTML file and dump it straight into a Word document, fully styled, using PHPWord. It just doesn’t seem to work like that.

The approach I’m going to take is to use a template Word document, sample.docx, and replace the title and content areas within that document with appropriate content from my HTML file (which I grabbed above using the getElementById method).

First, take a look at the file sample.docx in Word. You’ll see that it’s very sparse, with only a few bits of text: ${title}, ${author} and ${content}. These single words, wrapped in curly braces and prefixed with a dollar symbol $, are the placeholders I’m going to swap out and replace with my HTML content.

PHPWord will be using this template document to construct my final document using the data I pass it.
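
If the placeholder idea is new to you, this plain-PHP sketch shows the same kind of substitution on an ordinary string. It is only an illustration of the concept, not how PHPWord works internally, and the template text and values are made up for the example.

```php
<?php
// A plain-text stand-in for the Word template; illustrative only.
$template = '${title}' . "\n" . 'by ${author}' . "\n\n" . '${content}';

// Hypothetical values, keyed by placeholder name
$values = array(
    'title'   => 'Quarterly Report',
    'author'  => 'Robin Metcalfe',
    'content' => 'Sales rose and costs fell.',
);

// Build a map of "${name}" => value, then swap every placeholder in one pass
$replacements = array();
foreach ($values as $name => $value) {
    $replacements['${' . $name . '}'] = $value;
}

$output = strtr($template, $replacements);
echo $output . "\n";
```

The single quotes around each placeholder matter here: inside double quotes, PHP would try to interpolate ${title} as a variable.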

Word template

The following lines are responsible for inserting that content into the Word template document.

// Load the template processor
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('sample.docx');

// Swap out my variables for the HTML content
$templateProcessor->setValue('author', "Robin Metcalfe");
$templateProcessor->setValue('title', $documentTitle);
$templateProcessor->setValue('content', $documentContent);

Using this approach, you can create a Word template styled as you require, complete with formatting, font styles, spacing etc. Then you can drop content straight into that template from an HTML file.

This is only a simple example, but you can use more complex methods to copy segments of the template, remove segments and more. Take a look at the PHPWord class definition file to see what methods are available.

Finally, I prepare my headers to download the generated file, and stream the data to the user’s browser.

header("Content-Description: File Transfer");
header('Content-Disposition: attachment; filename="generated.docx"');
header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
header('Content-Transfer-Encoding: binary');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Expires: 0');

// Write the generated document straight to the output stream
$templateProcessor->saveAs('php://output');

The result of running PHPWord on a chunk of HTML.

Note, though, the lack of any paragraph spacing or additional styling. You’d need to apply additional styling rules to the PHPWord object itself in order to more fully control the output of the script.

PHPWord by itself won’t parse any CSS included within the HTML file.
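
Part of the reason for the missing spacing is that nodeValue returns only the text of an element, with all of the markup (including paragraph boundaries) stripped out. The sketch below shows the difference using made-up markup, and collects each paragraph separately so the boundaries are still there to work with. How you then feed those paragraphs into the template is a separate question; this stops at extraction.

```php
<?php
// Illustrative markup; the "content" id matches the one used in this article
$html = '<html><body><div id="content">'
      . '<p>First paragraph.</p><p>Second paragraph.</p>'
      . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$content = $dom->getElementById('content');

// nodeValue concatenates all descendant text, so paragraph breaks vanish
$flattened = $content->nodeValue;

// Collecting each <p> separately keeps the boundaries available
$paragraphs = array();
foreach ($content->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->nodeValue);
}

echo $flattened . "\n";
echo implode("\n\n", $paragraphs) . "\n";
```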

Try it yourself:

Generate & Download HTML » Word

If you’re looking for more examples on how to use PHPWord, I wouldn’t recommend the official documentation: it’s fairly sparse and still needs a lot of work.

Instead, take a look inside the /vendor directory after installing PHPWord using composer, specifically in phpoffice/phpword/samples where you’ll find a load of example files covering a range of use cases.

Convert HTML To Excel

One of the most useful conversions I’ve used is producing Excel sheets with PHP, sometimes directly from HTML, but also straight from values in PHP code.

In one instance, a client wanted to be able to download a spreadsheet of sales and performance metrics directly as an Excel sheet. No such functionality existed within the system, so I wrote some custom code for it using this technique.

Here’s a very quick example of how you can generate a simple spreadsheet using values provided in PHP.

Let’s get started. As before, I’ll install my dependencies using Composer:

composer require phpoffice/phpexcel

Now, for the content of my PHP file. This one is a fairly basic example, and the result is a few cells in a single sheet populated with numbers. Nothing too fancy.


// Pull in Composer's autoloader
require 'vendor/autoload.php';

// Step 1: Setup

$objPHPExcel = new PHPExcel();

$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
                             ->setLastModifiedBy("Robin Metcalfe")
                             ->setTitle("Excel test")
                             ->setSubject("Solarise Design")
                             ->setDescription("A test document for outputting an Excel file with some basic values.")
                             ->setKeywords("office PHPExcel php")
                             ->setCategory("Test result file");

$sheet = $objPHPExcel->setActiveSheetIndex(0);

// Step 2: Setting the values

// row 1
$sheet->setCellValue("A1", 'Column A');
$sheet->setCellValue("B1", 'Column B');

// row 2
$sheet->setCellValue("A2", '1');
$sheet->setCellValue("B2", '2');

// row 3
$sheet->setCellValue("A3", '3');
$sheet->setCellValue("B3", '4');

// Step 3: Output
header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');

// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
// Send the generated file to the browser
$objWriter->save('php://output');

Although this contains quite a few lines of code, there are only three significant things happening here:

  • Creation of a new PHPExcel object, and some configuration work to add title, creator etc.
  • Setting the values of the cells within the Excel sheet
  • Output of the generated file to the user’s browser, along with some useful headers

See the result yourself:

Generate & Download Simple Excel File

A more complex example

But, to expand on the above, let’s explore how I can take data from an HTML file and convert that into Excel format.

Here, I’m going to extract all visible tables within the HTML file, then use that data to create an Excel file containing two separate sheets.

The effect of this will be that the script will locate all table data within a page, and convert it into an Excel file, with one sheet per table.

Neat, huh?

The result of running PHPExcel on a chunk of HTML

I’m also going to be making use of the PHP class DOMDocument to extract the required data from my HTML file, as I did before with HTML to Word.

In the following chunk of code, I do the following:

  • First, grab the required data from my sample HTML file
  • Then I extract the data I want from the <table> element within the HTML file, looping through the rows contained in <tbody>, and grabbing the column headers from the <thead> element.
  • Next, I loop through the data generated in the previous step, and insert this into my PHPExcel object, which will build up the structure of the Excel file
  • Finally, I output the generated Excel file to the user’s browser.

PHPExcel offers a range of additional methods and options to control styling, formatting and much more. Take a look through the class documentation and the samples within the vendor/phpoffice/phpexcel/Examples directory to find out more.

Generate & Download HTML » Excel

Here’s the code in full:


// Pull in the HTML contents, ready for parsing with DOMDocument
$data = file_get_contents('sample.html');

// The array I'll store the table data in (defined up here so that it
// still exists if parsing fails)
$tableData = array();

try {

    $dom = new DOMDocument();
    $dom->loadHTML( $data );

    // Get all tables in the document
    $tables = $dom->getElementsByTagName('table');

    foreach($tables as $tableN => $table) {
        // This requires properly formatted HTML table structure
        $head = $table->getElementsByTagName('thead')->item(0);
        $body = $table->getElementsByTagName('tbody')->item(0);

        // A generic heading for each table, numbered by position
        $tableData[] = array(
            'heading' => 'Table '.($tableN+1),
            'tableData' => array('headings' => array(), 'rows' => array())
        );

        if($head && $body) {

            // Column headings come from the first row of <thead>
            foreach($head->getElementsByTagName('tr')->item(0)->getElementsByTagName('th') as $colN => $headCell) {
                $tableData[$tableN]['tableData']['headings'][] = $headCell->nodeValue;
            }

            // Cell values come from each row of <tbody>
            foreach($body->getElementsByTagName('tr') as $rowN => $tableRow) {
                foreach($tableRow->getElementsByTagName('td') as $colN => $tableCell) {
                    $tableData[$tableN]['tableData']['rows'][$rowN][$colN] = $tableCell->nodeValue;
                }
            }
        }
    }

} catch(\Exception $e) {
    // I failed...
}

// Instantiate the PHPExcel object
$objPHPExcel = new PHPExcel();

$objPHPExcel->getProperties()->setCreator("Robin Metcalfe")
                             ->setLastModifiedBy("Robin Metcalfe")
                             ->setTitle("HTML Tables To Excel Test")
                             ->setSubject("Solarise Design")
                             ->setDescription("A test document for converting HTML tables into Excel.")
                             ->setKeywords("office PHPExcel php")
                             ->setCategory("Test result file");

$alphabet = range('A', 'Z');

foreach($tableData as $tableN => $data) {

    // A new PHPExcel object starts with a single sheet, so add one for
    // each table after the first
    if($tableN > 0) {
        $objPHPExcel->createSheet();
    }

    $sheet = $objPHPExcel->setActiveSheetIndex($tableN);

    foreach($data['tableData']['headings'] as $n => $heading) {
        $sheet->setCellValue("{$alphabet[$n]}1", $heading);
    }

    foreach($data['tableData']['rows'] as $rowN => $rowData) {
        foreach($rowData as $colN => $value) {
            $n = $rowN + 2;
            $sheet->setCellValue("{$alphabet[$colN]}{$n}", $value);
        }
    }
}

// Resize columns to fit data, just to tidy things up
foreach(range('A','Z') as $columnID) {
    $objPHPExcel->getActiveSheet()->getColumnDimension($columnID)->setAutoSize(true);
}

header('Content-Type: application/vnd.ms-excel');
header('Content-Disposition: attachment;filename="html-to-excel.xls"');
header('Cache-Control: max-age=0');
// If you're serving to IE 9, then the following may be needed
header('Cache-Control: max-age=1');

// If you're serving to IE over SSL, then the following may be needed
header ('Expires: Mon, 26 Jul 1997 05:00:00 GMT'); // Date in the past
header ('Last-Modified: '.gmdate('D, d M Y H:i:s').' GMT'); // always modified
header ('Cache-Control: cache, must-revalidate'); // HTTP/1.1
header ('Pragma: public'); // HTTP/1.0

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel5');
// Send the generated file to the browser
$objWriter->save('php://output');
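
One limitation of the $alphabet lookup used above is that it only covers columns A to Z, so a table with more than 26 columns would run out of letters. As a rough sketch of one way around that, the hypothetical helper below (columnLetters is my own name for it) converts a zero-based column index into spreadsheet-style letters in plain PHP. PHPExcel also has its own PHPExcel_Cell::stringFromColumnIndex() for the same job; this version is standalone so it can be run without the library.

```php
<?php
// Hypothetical helper: convert a zero-based column index into spreadsheet
// column letters (0 => A, 25 => Z, 26 => AA, 27 => AB, ...).
function columnLetters(int $index): string
{
    $letters = '';
    $n = $index + 1; // switch to 1-based counting for the arithmetic below

    while ($n > 0) {
        // Peel off the rightmost letter, then move one position left
        $remainder = ($n - 1) % 26;
        $letters = chr(65 + $remainder) . $letters;
        $n = intdiv($n - 1, 26);
    }

    return $letters;
}

foreach (array(0, 25, 26, 27, 701, 702) as $i) {
    echo $i . ' => ' . columnLetters($i) . "\n";
}
```

With this in place, a cell reference could be built as columnLetters($colN) . $n rather than indexing into $alphabet.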

Download Source Code

To get a copy of all files used within this article, download them here.

Once downloaded, you’ll need to run composer install to set up all the dependencies.