HTML to MS Word document conversion is mainly used in web applications to generate a .doc/.docx file from dynamic HTML content. A Microsoft Word document is one of the most common file types for exporting dynamic content for offline use. Exporting HTML to MS Word can be done on the client side with JavaScript. However, if the content is generated dynamically on the server, the conversion needs to happen server-side.
Server-side export to Word is handy for converting dynamic HTML content to a Microsoft Word document and downloading it as a .doc/.docx file. With PHP, an MS Word document with HTML content can be created easily. In this tutorial, we'll show you how to convert HTML to an MS Word document using PHP.
HTML to Word (DOC/DOCX) Library
The HTML_TO_DOC class is a custom PHP library that lets you create MS Word documents containing HTML-formatted content. It provides the following methods:
setDocFileName() – Set the document file name.
setTitle() – Set the document title.
getHeader() – Create the header section of the document.
getFotter() – Create the footer section of the document.
_parseHtml() – Parse and filter HTML from the source.
createDoc() – Create the Word document in .doc/.docx format.
write_file() – Write the content to the Word file.
<?php
/**
* Convert HTML to MS Word document
* @name HTML_TO_DOC
* @version 2.0
* @author CodexWorld
* @link https://www.codexworld.com
*/
class HTML_TO_DOC
{
    var $docFile = '';
    var $title = '';
    var $htmlHead = '';
    var $htmlBody = '';

    /**
     * Constructor
     *
     * @return void
     */
    function __construct(){
        $this->title = '';
        $this->htmlHead = '';
        $this->htmlBody = '';
    }

    /**
     * Set the document file name
     *
     * @param String $docfile
     */
    function setDocFileName($docfile){
        $this->docFile = $docfile;
        if(!preg_match("/\.doc$/i",$this->docFile) && !preg_match("/\.docx$/i",$this->docFile)){
            $this->docFile .= '.doc';
        }
        return;
    }

    /**
     * Set the document title
     *
     * @param String $title
     */
    function setTitle($title){
        $this->title = $title;
    }

    /**
     * Return header of MS Doc
     *
     * @return String
     */
    function getHeader(){
        $return = <<<EOH
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 9">
<meta name=Originator content="Microsoft Word 9">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<title>$this->title</title>
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:View>Print</w:View>
<w:DoNotHyphenateCaps/>
<w:PunctuationKerning/>
<w:DrawingGridHorizontalSpacing>9.35 pt</w:DrawingGridHorizontalSpacing>
<w:DrawingGridVerticalSpacing>9.35 pt</w:DrawingGridVerticalSpacing>
</w:WordDocument>
</xml><![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;
mso-font-charset:0;
mso-generic-font-family:swiss;
mso-font-pitch:variable;
mso-font-signature:536871559 0 0 0 415 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-parent:"";
margin:0in;
margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:7.5pt;
mso-bidi-font-size:8.0pt;
font-family:"Verdana";
mso-fareast-font-family:"Verdana";}
p.small
{mso-style-parent:"";
margin:0in;
margin-bottom:.0001pt;
mso-pagination:widow-orphan;
font-size:1.0pt;
mso-bidi-font-size:1.0pt;
font-family:"Verdana";
mso-fareast-font-family:"Verdana";}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.Section1
{page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1032">
<o:colormenu v:ext="edit" strokecolor="none"/>
</o:shapedefaults></xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1"/>
</o:shapelayout></xml><![endif]-->
$this->htmlHead
</head>
<body>
EOH;
        return $return;
    }

    /**
     * Return Document footer
     *
     * @return String
     */
    function getFotter(){
        return "</body></html>";
    }

    /**
     * Create The MS Word Document from given HTML
     *
     * @param String $html :: HTML Content or HTML File Name like path/to/html/file.html
     * @param String $file :: Document File Name
     * @param Boolean $download :: Whether to download the file or save the file
     * @return boolean
     */
    function createDoc($html, $file, $download = false){
        if(is_file($html)){
            $html = @file_get_contents($html);
        }
        $this->_parseHtml($html);
        $this->setDocFileName($file);
        $doc = $this->getHeader();
        $doc .= $this->htmlBody;
        $doc .= $this->getFotter();
        if($download){
            @header("Cache-Control: ");// leave blank to avoid IE errors
            @header("Pragma: ");// leave blank to avoid IE errors
            @header("Content-type: application/octet-stream");
            @header("Content-Disposition: attachment; filename=\"$this->docFile\"");
            echo $doc;
            return true;
        }else {
            return $this->write_file($this->docFile, $doc);
        }
    }

    /**
     * Parse the html and remove <head></head> part if present into html
     *
     * @param String $html
     * @return void
     * @access Private
     */
    function _parseHtml($html){
        $html = preg_replace("/<!DOCTYPE((.|\n)*?)>/ims", "", $html);
        $html = preg_replace("/<script((.|\n)*?)>((.|\n)*?)<\/script>/ims", "", $html);
        preg_match("/<head>((.|\n)*?)<\/head>/ims", $html, $matches);
        $head = !empty($matches[1])?$matches[1]:'';
        preg_match("/<title>((.|\n)*?)<\/title>/ims", $head, $matches);
        // Keep a title set via setTitle() unless the HTML has its own <title>
        if(!empty($matches[1])){
            $this->title = $matches[1];
        }
        $html = preg_replace("/<head>((.|\n)*?)<\/head>/ims", "", $html);
        $head = preg_replace("/<title>((.|\n)*?)<\/title>/ims", "", $head);
        $head = preg_replace("/<\/?head>/ims", "", $head);
        $html = preg_replace("/<\/?body((.|\n)*?)>/ims", "", $html);
        $this->htmlHead = $head;
        $this->htmlBody = $html;
        return;
    }

    /**
     * Write the content in the file
     *
     * @param String $file :: File name to be saved
     * @param String $content :: Content to be written
     * @param [Optional] String $mode :: Write Mode
     * @return boolean True on success else false
     */
    function write_file($file, $content, $mode = "w"){
        $fp = @fopen($file, $mode);
        if(!is_resource($fp)){
            return false;
        }
        fwrite($fp, $content);
        fclose($fp);
        return true;
    }
}
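To see what the parsing step does to a page, it can help to try it in isolation. The sketch below is not part of the library: splitHtmlForWord() is a hypothetical helper whose patterns are modeled on those in _parseHtml(), returning the title, the remaining head markup, and the body markup as an array.

```php
<?php
// Hypothetical helper (not part of HTML_TO_DOC): splits an HTML page into
// title, remaining head markup, and body markup, using patterns modeled
// on the ones in _parseHtml().
function splitHtmlForWord(string $html): array
{
    // Drop the doctype and any <script> blocks.
    $html = preg_replace('/<!DOCTYPE.*?>/is', '', $html);
    $html = preg_replace('/<script.*?>.*?<\/script>/is', '', $html);

    // Pull out the <head> contents, then the <title> inside it.
    $head = preg_match('/<head>(.*?)<\/head>/is', $html, $m) ? $m[1] : '';
    $title = preg_match('/<title>(.*?)<\/title>/is', $head, $m) ? $m[1] : '';

    // Remove <head>...</head> and the <body> tags from the page,
    // and the <title> element from the head markup.
    $html = preg_replace('/<head>.*?<\/head>/is', '', $html);
    $body = preg_replace('/<\/?body.*?>/is', '', $html);
    $head = preg_replace('/<title>.*?<\/title>/is', '', $head);

    return ['title' => $title, 'head' => trim($head), 'body' => trim($body)];
}

$parts = splitHtmlForWord(
    '<!DOCTYPE html><html><head><title>Report</title><style>p{color:red}</style></head>'
    . '<body class="x"><script>alert(1)</script><p>Hello</p></body></html>'
);

echo $parts['title'], "\n"; // Report
echo $parts['head'], "\n";  // <style>p{color:red}</style>
```

Note that scripts are removed entirely, while styles found in the head are kept and later placed into the generated document's head.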
HTML to Word Document Conversion
Using the HTML_TO_DOC class, the following sample code converts HTML content to an MS Word document and saves it as a .doc file.
Load and initialize the HTML_TO_DOC class.
// Load library
include_once 'HtmlToDoc.class.php';
// Initialize class
$htd = new HTML_TO_DOC();
Specify the HTML content you want to convert.
$htmlContent = '
<h1>Hello World!</h1>
<p>This document is created from HTML.</p>';
Call the createDoc() function to convert the HTML to a Word document.
The first argument, $htmlContent, holds the HTML content.
The second argument is the name of the document file to save (my-document).
$htd->createDoc($htmlContent, "my-document");
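In save mode, createDoc() returns false when the target file cannot be opened, so the return value is worth checking. The following is a simplified stand-in for the save path, not the library code itself: saveHtmlAsWordFile() is a hypothetical function, the wrapper markup is a shortened version of what getHeader() produces, and the temporary directory is used only for the demo.

```php
<?php
// Hypothetical stand-in for the save path of createDoc(): wraps the HTML in a
// minimal Word-flavoured HTML shell and writes it to disk.
function saveHtmlAsWordFile(string $html, string $path): bool
{
    $doc = '<html xmlns:o="urn:schemas-microsoft-com:office:office" '
         . 'xmlns:w="urn:schemas-microsoft-com:office:word" '
         . 'xmlns="http://www.w3.org/TR/REC-html40">'
         . '<head><meta charset="utf-8"></head><body>' . $html . '</body></html>';

    // file_put_contents() returns the number of bytes written, or false on failure.
    return file_put_contents($path, $doc) !== false;
}

// Demo target in the system temp directory (an assumption for this example).
$target = sys_get_temp_dir() . '/my-document.doc';
$saved = saveHtmlAsWordFile('<h1>Hello World!</h1>', $target);

echo $saved ? "Saved to $target\n" : "Could not write $target\n";
```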
Download the Word document
Set the third parameter of the createDoc() function to TRUE to download the Word file instead of saving it on the server.
$htd->createDoc($htmlContent, "my-document", 1);
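Download mode works by sending response headers and then echoing the document, so it has to run before the script produces any other output. As a rough sketch, the two content headers can be built and inspected separately. wordDownloadHeaders() is a hypothetical helper, and stripping quotes and line breaks from the file name is an extra precaution that the class itself does not take.

```php
<?php
// Hypothetical helper: builds the two content headers that createDoc()
// sends in download mode, so they can be inspected before calling header().
function wordDownloadHeaders(string $fileName): array
{
    // Strip quotes and line breaks so the name cannot break the header
    // (an added precaution, not something the class does).
    $safeName = str_replace(['"', "\r", "\n"], '', $fileName);

    return [
        'Content-Type: application/octet-stream',
        'Content-Disposition: attachment; filename="' . $safeName . '"',
    ];
}

$headers = wordDownloadHeaders('my-document.doc');

// Headers can only be sent before any output has started.
if (!headers_sent()) {
    foreach ($headers as $line) {
        header($line);
    }
}
```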
Create a Word document from an HTML file
By passing an HTML file name instead of an HTML string, you can convert the content of that file to a Word document.
$htd->createDoc("source.html", "my-document");
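createDoc() decides between the two kinds of input with is_file(): if the first argument names an existing file, its contents are read, otherwise the argument is used as HTML. The sketch below mirrors that decision with a hypothetical resolveHtmlSource() function; throwing an exception on a failed read is an assumption of this example, since the class suppresses read errors with @.

```php
<?php
// Hypothetical helper mirroring the first step of createDoc(): treat the
// argument as a file path if such a file exists, otherwise as raw HTML.
function resolveHtmlSource(string $htmlOrPath): string
{
    if (is_file($htmlOrPath)) {
        $contents = file_get_contents($htmlOrPath);
        if ($contents === false) {
            // Assumption for this sketch: fail loudly instead of silently.
            throw new RuntimeException("Cannot read $htmlOrPath");
        }
        return $contents;
    }
    return $htmlOrPath;
}

// Demo with a temporary file standing in for source.html.
$tmp = tempnam(sys_get_temp_dir(), 'htd');
file_put_contents($tmp, '<p>From file</p>');

$fromFile = resolveHtmlSource($tmp);               // contents of the file
$fromString = resolveHtmlSource('<p>Inline</p>');  // returned unchanged

unlink($tmp);
```

A relative name such as source.html is resolved against the script's current working directory, so a mistyped path is treated as literal HTML rather than reported as an error.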
Format of a Word Document
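If the file name passed to the createDoc() method has no extension, the document is saved in .doc format by default. To save the document with a .docx extension, include that extension in the file name. Either way, the class writes Word-compatible HTML rather than a true Office Open XML package, so some versions of Word may refuse to open the result under a .docx name; .doc is the safer choice.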
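The extension rule can be checked on its own. normalizeDocFileName() below is a hypothetical helper that reproduces the check made in setDocFileName(): a name ending in .doc or .docx (in any letter case) is kept, and anything else gets .doc appended.

```php
<?php
// Hypothetical helper reproducing the rule in setDocFileName(): keep a
// .doc or .docx extension (case-insensitive), otherwise append .doc.
function normalizeDocFileName(string $name): string
{
    return preg_match('/\.docx?$/i', $name) ? $name : $name . '.doc';
}

echo normalizeDocFileName('my-document'), "\n";      // my-document.doc
echo normalizeDocFileName('my-document.docx'), "\n"; // my-document.docx
echo normalizeDocFileName('Report.DOC'), "\n";       // Report.DOC
```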
Note:
There are a number of third-party libraries available for HTML to Word conversion. However, you can also convert HTML content to a Word document with PHP alone, without an external library. The HTML_TO_DOC class makes it simple to convert dynamic HTML content to a Word document and save or download it as a .doc/.docx file. You can easily customise the HTML_TO_DOC class to meet your own requirements.