Class HTMLDocument.HTMLReader

Enclosing class:
HTMLDocument
public class HTMLDocument.HTMLReader
extends HTMLEditorKit.ParserCallback

An HTML reader that loads an HTML document with an HTML element structure. It is a set of callbacks from the parser, implemented to create a set of elements tagged with attributes. The parse builds up tokens (ElementSpec) that describe the desired element subtree and bursts them into the document under the protection of a write lock, using the insert method on the document outer class.

The reader can be configured by registering actions (of type HTMLDocument.HTMLReader.TagAction) that describe how to handle each tag. The idea behind the actions provided is that the most natural text editing operations can be provided if the element structure boils down to paragraphs with runs of some kind of style in them. Some things are more naturally specified structurally, so arbitrary structure should be allowed above the paragraphs, but it will need to be edited with structural actions. The implication is that some of the HTML elements specified in the stream being parsed will be collapsed into attributes, and in some cases paragraphs will be synthesized. When HTML elements have been converted to attributes, the attribute key will be of type HTML.Tag and the value will be of type AttributeSet, so that no information is lost. This enables many of the existing actions to work, so that the user can type input, hit the return key, backspace, delete, etc., and have a reasonable result. Selections can be created, attributes applied or removed, and so on. With this in mind, the work done by the reader can be categorized into the following kinds of tasks:

Block
Build the structure as it is specified in the stream. This produces elements that contain other elements.
Paragraph
Like block, except that the element is expected to be used with a paragraph view, so a paragraph element won't need to be synthesized.
Character
Contribute the element as an attribute that will start and stop at arbitrary text locations. This will ultimately be mixed into a run of text, with all of the currently flattened HTML character elements.
Special
Produce an embedded graphical element.
Form
Produce an element that is like the embedded graphical element, except that it also has a component model associated with it.
Hidden
Create an element that is hidden from view when the document is being viewed read-only, and visible when the document is being edited. This keeps the model from losing information, and is used to store things like comments and unrecognized tags.
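
The Character task described above can be observed on a loaded document. The following is a minimal sketch, not part of the API: the class name FlattenedTagDemo and its helper methods are hypothetical, and the sample markup is invented for illustration. It loads a fragment through HTMLEditorKit.read (which uses this reader by default) and looks up the text run that came from an EM element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class FlattenedTagDemo {

    /** Parses the markup into a fresh HTMLDocument using the default reader. */
    static HTMLDocument load(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Returns the leaf (text run) element holding the first occurrence of word. */
    static Element runContaining(HTMLDocument doc, String word) {
        try {
            int index = doc.getText(0, doc.getLength()).indexOf(word);
            return doc.getCharacterElement(index);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HTMLDocument doc = load("<html><body><p>plain <em>stressed</em></p></body></html>");
        Element run = runContaining(doc, "stressed");
        // The EM element is not a structural element; it is looked up as an
        // attribute keyed by HTML.Tag.EM on the text run.
        Object value = run.getAttributes().getAttribute(HTML.Tag.EM);
        System.out.println("enclosing element: " + run.getParentElement().getName());
        System.out.println("EM stored as AttributeSet: " + (value instanceof AttributeSet));
    }
}
```

The run's parent is the paragraph element produced by the Paragraph task, while the EM tag appears only as an attribute on the run.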

Currently, <APPLET>, <PARAM>, <MAP>, <AREA>, <LINK>, <SCRIPT> and <STYLE> are unsupported.

The assignment of the actions described is shown in the following table for the tags defined in HTML.Tag.

HTML tags and assigned actions
Tag Action
HTML.Tag.A CharacterAction
HTML.Tag.ADDRESS CharacterAction
HTML.Tag.APPLET HiddenAction
HTML.Tag.AREA AreaAction
HTML.Tag.B CharacterAction
HTML.Tag.BASE BaseAction
HTML.Tag.BASEFONT CharacterAction
HTML.Tag.BIG CharacterAction
HTML.Tag.BLOCKQUOTE BlockAction
HTML.Tag.BODY BlockAction
HTML.Tag.BR SpecialAction
HTML.Tag.CAPTION BlockAction
HTML.Tag.CENTER BlockAction
HTML.Tag.CITE CharacterAction
HTML.Tag.CODE CharacterAction
HTML.Tag.DD BlockAction
HTML.Tag.DFN CharacterAction
HTML.Tag.DIR BlockAction
HTML.Tag.DIV BlockAction
HTML.Tag.DL BlockAction
HTML.Tag.DT ParagraphAction
HTML.Tag.EM CharacterAction
HTML.Tag.FONT CharacterAction
HTML.Tag.FORM As of 1.4 a BlockAction
HTML.Tag.FRAME SpecialAction
HTML.Tag.FRAMESET BlockAction
HTML.Tag.H1 ParagraphAction
HTML.Tag.H2 ParagraphAction
HTML.Tag.H3 ParagraphAction
HTML.Tag.H4 ParagraphAction
HTML.Tag.H5 ParagraphAction
HTML.Tag.H6 ParagraphAction
HTML.Tag.HEAD HeadAction
HTML.Tag.HR SpecialAction
HTML.Tag.HTML BlockAction
HTML.Tag.I CharacterAction
HTML.Tag.IMG SpecialAction
HTML.Tag.INPUT FormAction
HTML.Tag.ISINDEX IsindexAction
HTML.Tag.KBD CharacterAction
HTML.Tag.LI BlockAction
HTML.Tag.LINK LinkAction
HTML.Tag.MAP MapAction
HTML.Tag.MENU BlockAction
HTML.Tag.META MetaAction
HTML.Tag.NOFRAMES BlockAction
HTML.Tag.OBJECT SpecialAction
HTML.Tag.OL BlockAction
HTML.Tag.OPTION FormAction
HTML.Tag.P ParagraphAction
HTML.Tag.PARAM HiddenAction
HTML.Tag.PRE PreAction
HTML.Tag.SAMP CharacterAction
HTML.Tag.SCRIPT HiddenAction
HTML.Tag.SELECT FormAction
HTML.Tag.SMALL CharacterAction
HTML.Tag.STRIKE CharacterAction
HTML.Tag.S CharacterAction
HTML.Tag.STRONG CharacterAction
HTML.Tag.STYLE StyleAction
HTML.Tag.SUB CharacterAction
HTML.Tag.SUP CharacterAction
HTML.Tag.TABLE BlockAction
HTML.Tag.TD BlockAction
HTML.Tag.TEXTAREA FormAction
HTML.Tag.TH BlockAction
HTML.Tag.TITLE TitleAction
HTML.Tag.TR BlockAction
HTML.Tag.TT CharacterAction
HTML.Tag.U CharacterAction
HTML.Tag.UL BlockAction
HTML.Tag.VAR CharacterAction

Once </html> is encountered, the Actions are no longer notified.

Nested Class Summary

Nested Classes
Modifier and Type Class Description
class  HTMLDocument.HTMLReader.BlockAction

Action assigned by default to handle the Block task of the reader.

class  HTMLDocument.HTMLReader.CharacterAction

Action assigned by default to handle the Character task of the reader.

class  HTMLDocument.HTMLReader.FormAction

Action to support forms by building all of the elements used to represent form controls.

class  HTMLDocument.HTMLReader.HiddenAction

Action assigned by default to handle the Hidden task of the reader.

class  HTMLDocument.HTMLReader.IsindexAction

Action assigned by default to handle the Isindex task of the reader.

class  HTMLDocument.HTMLReader.ParagraphAction

Action assigned by default to handle the Paragraph task of the reader.

class  HTMLDocument.HTMLReader.PreAction

Action assigned by default to handle the Pre block task of the reader.

class  HTMLDocument.HTMLReader.SpecialAction

Action assigned by default to handle the Special task of the reader.

class  HTMLDocument.HTMLReader.TagAction

An action to be performed in response to parsing a tag.

Field Summary

Fields
Modifier and Type Field Description
protected MutableAttributeSet charAttr

Current character attribute set.

protected Vector<DefaultStyledDocument.ElementSpec> parseBuffer

Buffer to keep building elements.

Fields declared in class javax.swing.text.html.HTMLEditorKit.ParserCallback

IMPLIED

Constructor Summary

Constructors
Constructor Description
HTMLReader​(int offset)

Constructs an HTMLReader using default pop and push depth and no tag to insert.

HTMLReader​(int offset, int popDepth, int pushDepth, HTML.Tag insertTag)

Constructs an HTMLReader.

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type Method Description
protected void addContent​(char[] data, int offs, int length)

Adds some text with the current character attributes.

protected void addContent​(char[] data, int offs, int length, boolean generateImpliedPIfNecessary)

Adds some text with the current character attributes.

protected void addSpecialElement​(HTML.Tag t, MutableAttributeSet a)

Adds content that is basically specified entirely in the attribute set.

protected void blockClose​(HTML.Tag t)

Adds an instruction to the parse buffer to close out a block element of the given type.

protected void blockOpen​(HTML.Tag t, MutableAttributeSet attr)

Adds an instruction to the parse buffer to create a block element with the given attributes.

void flush()

The last method called on the reader.

void handleEndOfLineString​(String eol)

This is invoked after the stream has been parsed, but before flush.

void handleEndTag​(HTML.Tag t, int pos)

Callback from the parser.

void handleSimpleTag​(HTML.Tag t, MutableAttributeSet a, int pos)

Callback from the parser.

void handleStartTag​(HTML.Tag t, MutableAttributeSet a, int pos)

Callback from the parser.

void handleText​(char[] data, int pos)

Called by the parser to indicate a block of text was encountered.

protected void popCharacterStyle()

Pops a previously pushed character style off the stack to return to a previous style.

protected void preContent​(char[] data)

Adds the given content that was encountered in a PRE element.

protected void pushCharacterStyle()

Pushes the current character style on a stack in preparation for forming a new nested character style.

protected void registerTag​(HTML.Tag t, HTMLDocument.HTMLReader.TagAction a)

Registers a handler for the given tag.

protected void textAreaContent​(char[] data)

Adds the given content to the textarea document.

Methods declared in class javax.swing.text.html.HTMLEditorKit.ParserCallback

handleComment, handleError

Methods declared in class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

parseBuffer

protected Vector<DefaultStyledDocument.ElementSpec> parseBuffer

Buffer to keep building elements.

charAttr

protected MutableAttributeSet charAttr

Current character attribute set.

Constructor Detail

HTMLReader

public HTMLReader​(int offset)

Constructs an HTMLReader using default pop and push depth and no tag to insert.

Parameters:
offset - the starting offset

HTMLReader

public HTMLReader​(int offset,
                  int popDepth,
                  int pushDepth,
                  HTML.Tag insertTag)

Constructs an HTMLReader.

Parameters:
offset - the starting offset
popDepth - how many parents to ascend before inserting a new element
pushDepth - how many parents to descend (relative to popDepth) before inserting
insertTag - a tag to insert (may be null)
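
These parameters are normally supplied indirectly. As a sketch, assuming the usual route through HTMLEditorKit.insertHTML (which obtains its reader from HTMLDocument.getReader(int, int, int, HTML.Tag)), the example below appends a fragment at the end of an existing document with popDepth and pushDepth of 0 and no insert tag. The class InsertHtmlDemo and the sample markup are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class InsertHtmlDemo {

    /** Loads initialHtml, appends fragment at the end, and returns the document text. */
    static String insertAtEnd(String initialHtml, String fragment) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(initialHtml), doc, 0);
            // offset = end of document, popDepth = 0, pushDepth = 0, insertTag = null
            kit.insertHTML(doc, doc.getLength(), fragment, 0, 0, null);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = insertAtEnd("<html><body><p>first</p></body></html>", "<p>second</p>");
        System.out.println(text);
    }
}
```

Non-zero depths and a non-null insertTag are needed when the fragment must land at a specific level of an existing element hierarchy; the right values depend on the structure around the offset.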

Method Detail

flush

public void flush()
           throws BadLocationException

The last method called on the reader. It allows any pending changes to be flushed into the document. Since loading is currently synchronous, the entire set of changes is pushed in at this point.

Overrides:
flush in class HTMLEditorKit.ParserCallback
Throws:
BadLocationException - if the given position does not represent a valid location in the associated document.
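
To illustrate the role of flush, the sketch below drives a reader by hand instead of going through HTMLEditorKit.read. It assumes a plain new HTMLDocument(), whose default token threshold is large enough that a small input stays buffered until flush is called. The class ManualFlushDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; the names here are illustrative only.
final class ManualFlushDemo {

    /** Returns the document length measured before and after flush(). */
    static int[] lengthsAroundFlush(String html) {
        HTMLDocument doc = new HTMLDocument();
        HTMLEditorKit.ParserCallback reader = doc.getReader(0);
        try {
            // Parsing only invokes the callbacks; the element specs are buffered.
            new ParserDelegator().parse(new StringReader(html), reader, true);
            int before = doc.getLength();
            // flush() pushes the buffered specs into the document.
            reader.flush();
            return new int[] { before, doc.getLength() };
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAroundFlush("<html><body><p>Hello</p></body></html>");
        System.out.println("before flush: " + lengths[0] + ", after flush: " + lengths[1]);
    }
}
```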

handleText

public void handleText​(char[] data,
                       int pos)

Called by the parser to indicate a block of text was encountered.

Overrides:
handleText in class HTMLEditorKit.ParserCallback
Parameters:
data - the characters of the text block
pos - the position of the text in the parsed stream

handleStartTag

public void handleStartTag​(HTML.Tag t,
                           MutableAttributeSet a,
                           int pos)

Callback from the parser. Routes to the appropriate handler for the tag.

Overrides:
handleStartTag in class HTMLEditorKit.ParserCallback
Parameters:
t - an HTML tag
a - a set of attributes
pos - a position

handleEndTag

public void handleEndTag​(HTML.Tag t,
                         int pos)

Callback from the parser. Routes to the appropriate handler for the tag.

Overrides:
handleEndTag in class HTMLEditorKit.ParserCallback
Parameters:
t - an HTML tag
pos - a position

handleSimpleTag

public void handleSimpleTag​(HTML.Tag t,
                            MutableAttributeSet a,
                            int pos)

Callback from the parser. Routes to the appropriate handler for the tag.

Overrides:
handleSimpleTag in class HTMLEditorKit.ParserCallback
Parameters:
t - an HTML tag
a - a set of attributes
pos - a position

handleEndOfLineString

public void handleEndOfLineString​(String eol)

This is invoked after the stream has been parsed, but before flush. eol will be one of \n, \r or \r\n, whichever is encountered most often in parsing the stream.

Overrides:
handleEndOfLineString in class HTMLEditorKit.ParserCallback
Parameters:
eol - the end-of-line string reported by the parser
Since:
1.3
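
A subclass can observe the reported value by overriding this callback. The following sketch assumes the default parser supplied by HTMLEditorKit; the classes EolDemo and EolDocument and the field reportedEol are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class EolDemo {

    /** A document whose reader records the end-of-line string it is given. */
    static class EolDocument extends HTMLDocument {
        String reportedEol;

        @Override
        public HTMLEditorKit.ParserCallback getReader(int pos) {
            return new HTMLReader(pos) {
                @Override
                public void handleEndOfLineString(String eol) {
                    reportedEol = eol;                // remember what the parser reported
                    super.handleEndOfLineString(eol); // keep the default behavior
                }
            };
        }
    }

    /** Parses html and returns the end-of-line string passed to the reader. */
    static String dominantEol(String html) {
        EolDocument doc = new EolDocument();
        try {
            new HTMLEditorKit().read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc.reportedEol;
    }

    public static void main(String[] args) {
        String eol = dominantEol("<html>\r\n<body>\r\n<p>text</p>\r\n</body>\r\n</html>\r\n");
        System.out.println("CRLF reported: " + "\r\n".equals(eol));
    }
}
```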

registerTag

protected void registerTag​(HTML.Tag t,
                           HTMLDocument.HTMLReader.TagAction a)

Registers a handler for the given tag. By default all of the well-known tags will have been registered. This can be used to change the handling of a particular tag or to add support for custom tags.

Parameters:
t - an HTML tag
a - tag action handler
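
Because registerTag is protected, it is typically called from a reader subclass that a document subclass returns from getReader. The sketch below shows one way to do this, replacing the handler for HR with a SpecialAction subclass that counts occurrences before delegating to the default behavior. RegisterTagDemo, RuleCountingDocument, RuleCountingReader and ruleCount are hypothetical names, not part of the API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class RegisterTagDemo {

    static class RuleCountingDocument extends HTMLDocument {
        int ruleCount;

        @Override
        public HTMLEditorKit.ParserCallback getReader(int pos) {
            return new RuleCountingReader(pos);
        }

        /** Reader that swaps in a custom handler for the HR tag. */
        class RuleCountingReader extends HTMLReader {
            RuleCountingReader(int offset) {
                super(offset); // registers the default actions first
                registerTag(HTML.Tag.HR, new SpecialAction() {
                    @Override
                    public void start(HTML.Tag t, MutableAttributeSet a) {
                        ruleCount++;       // custom bookkeeping
                        super.start(t, a); // then the default Special handling
                    }
                });
            }
        }
    }

    /** Parses html and returns how many HR tags the custom action saw. */
    static int countRules(String html) {
        RuleCountingDocument doc = new RuleCountingDocument();
        try {
            new HTMLEditorKit().read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc.ruleCount;
    }

    public static void main(String[] args) {
        System.out.println(countRules("<html><body><p>a</p><hr><p>b</p><hr></body></html>"));
    }
}
```

Calling super.start keeps the element in the document; omitting it would drop the tag from the model entirely.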

pushCharacterStyle

protected void pushCharacterStyle()

Pushes the current character style on a stack in preparation for forming a new nested character style.

popCharacterStyle

protected void popCharacterStyle()

Pops a previously pushed character style off the stack to return to a previous style.
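
The push/pop pair is meant to bracket a character-level tag: push in the action's start, modify charAttr, and pop in end. The following is a sketch of a custom TagAction for KBD that follows this pattern and adds an extra marker attribute. The attribute key HIGHLIGHT and the classes CharacterStyleDemo, HighlightingDocument and HighlightingReader are hypothetical, introduced only for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class CharacterStyleDemo {

    /** Hypothetical attribute key used to mark text inside KBD. */
    static final String HIGHLIGHT = "demo-highlight";

    static class HighlightingDocument extends HTMLDocument {
        @Override
        public HTMLEditorKit.ParserCallback getReader(int pos) {
            return new HighlightingReader(pos);
        }

        class HighlightingReader extends HTMLReader {
            HighlightingReader(int offset) {
                super(offset);
                registerTag(HTML.Tag.KBD, new TagAction() {
                    @Override
                    public void start(HTML.Tag t, MutableAttributeSet a) {
                        pushCharacterStyle(); // save the style in effect before the tag
                        charAttr.addAttribute(t, a.copyAttributes());
                        charAttr.addAttribute(HIGHLIGHT, Boolean.TRUE);
                    }

                    @Override
                    public void end(HTML.Tag t) {
                        popCharacterStyle(); // restore the saved style
                    }
                });
            }
        }
    }

    /** Reports whether the run containing word carries the HIGHLIGHT attribute. */
    static boolean isHighlighted(String html, String word) {
        HighlightingDocument doc = new HighlightingDocument();
        try {
            new HTMLEditorKit().read(new StringReader(html), doc, 0);
            int index = doc.getText(0, doc.getLength()).indexOf(word);
            return doc.getCharacterElement(index).getAttributes().getAttribute(HIGHLIGHT) != null;
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><p>press <kbd>Enter</kbd> now</p></body></html>";
        System.out.println("Enter highlighted: " + isHighlighted(html, "Enter"));
        System.out.println("now highlighted: " + isHighlighted(html, "now"));
    }
}
```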

textAreaContent

protected void textAreaContent​(char[] data)

Adds the given content to the textarea document. This method is called while the reader is in a textarea context, so all text that is seen belongs to the text area and is added to the TextAreaDocument associated with it.

Parameters:
data - the given content

preContent

protected void preContent​(char[] data)

Adds the given content that was encountered in a PRE element. This synthesizes lines to hold the runs of text, and makes calls to addContent to actually add the text.

Parameters:
data - the given content
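
The line synthesis can be inspected after loading. The sketch below assumes the default reader and parser from HTMLEditorKit and checks that two lines inside one PRE element end up in separate line-level elements under the same parent. PreLinesDemo and its helpers are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class; the names here are illustrative only.
final class PreLinesDemo {

    /** Parses the markup into a fresh HTMLDocument using the default reader. */
    static HTMLDocument load(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Returns the line-level (paragraph) element holding the first occurrence of word. */
    static Element lineOf(HTMLDocument doc, String word) {
        try {
            int index = doc.getText(0, doc.getLength()).indexOf(word);
            return doc.getParagraphElement(index);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HTMLDocument doc = load("<html><body><pre>alpha\nbeta</pre></body></html>");
        Element first = lineOf(doc, "alpha");
        Element second = lineOf(doc, "beta");
        System.out.println("separate lines: " + (first != second));
        System.out.println("shared parent: " + first.getParentElement().getName());
    }
}
```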

blockOpen

protected void blockOpen​(HTML.Tag t,
                         MutableAttributeSet attr)

Adds an instruction to the parse buffer to create a block element with the given attributes.

Parameters:
t - an HTML tag
attr - the attribute set

blockClose

protected void blockClose​(HTML.Tag t)

Adds an instruction to the parse buffer to close out a block element of the given type.

Parameters:
t - the HTML tag

addContent

protected void addContent​(char[] data,
                          int offs,
                          int length)

Adds some text with the current character attributes.

Parameters:
data - the content to add
offs - the initial offset
length - the length

addContent

protected void addContent​(char[] data,
                          int offs,
                          int length,
                          boolean generateImpliedPIfNecessary)

Adds some text with the current character attributes.

Parameters:
data - the content to add
offs - the initial offset
length - the length
generateImpliedPIfNecessary - whether to generate implied paragraphs

addSpecialElement

protected void addSpecialElement​(HTML.Tag t,
                                 MutableAttributeSet a)

Adds content that is basically specified entirely in the attribute set.

Parameters:
t - an HTML tag
a - the attribute set