- HtmlUnit is an open-source Java library that makes HTTP calls while imitating the behavior of a browser.
- HtmlUnit is mostly used for integration testing on top of unit-test frameworks such as JUnit or TestNG: a test requests web pages and asserts on the results.
Simple Example
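@Test
public void testGoogle() throws Exception {
    // getPage() throws checked exceptions (e.g. IOException), so the test declares them
    WebClient webClient = new WebClient();
    HtmlPage currentPage = webClient.getPage("http://www.google.com/");
    assertEquals("Google", currentPage.getTitleText());
}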
WebClient
- As you can see in the example, the WebClient is the starting point. It is the browser simulator.
- WebClient.getPage() is just like typing an address in the browser. It returns an HtmlPage object.
HtmlPage
- HtmlPage represents a single web page along with all of its client-side data (HTML, JavaScript, CSS …).
- HtmlPage gives you access to much of a web page’s content:
Page source
- You can receive the page source as text or as XML.
HtmlPage currentPage =
webClient.getPage("http://www.google.com/");
String textSource = currentPage.asText();
String xmlSource = currentPage.asXml();
HTML Elements
- HtmlPage lets you access any of the page’s HTML elements, along with all of their attributes and sub-elements. This includes tables, images, input fields, divs and any other HTML element.
- Use the method getHtmlElementById() to get any element of the page by its id.
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/");
HtmlImage imgElement = (HtmlImage)currentPage.getHtmlElementById("logo");
System.out.println(imgElement.getAttribute("src"));
Anchors
- An anchor is the representation of the HTML tag <a href="…">link</a>.
- Use the methods getAnchorByName(), getAnchorByHref() and getAnchorByText() to easily access any of the anchors in the page.
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/");
HtmlAnchor advancedSearchAn =
currentPage.getAnchorByText("Advanced Search");
currentPage = advancedSearchAn.click();
assertEquals("Google Advanced Search",currentPage.getTitleText());
Dom elements by XPath
- You can access any of the page elements by using XPath.
WebClient webClient = new WebClient();
HtmlPage currentPage =
webClient.getPage("http://www.google.com/search?q=avi");
//Using XPath to get the first result in Google query
HtmlElement element = (HtmlElement)currentPage.getByXPath("//h3").get(0);
DomNode result = element.getChildNodes().get(0);
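- To see what an expression such as //h3 selects without depending on a live site, the same XPath can be tried with the JDK’s own javax.xml.xpath API. The following is only a sketch: it does not use HtmlUnit, and the class name, method name and sample markup are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {

    // Hypothetical, hand-written (and well-formed) markup standing in for a results page.
    static final String SAMPLE =
        "<html><body>"
        + "<div><h3><a href='/first'>First result</a></h3></div>"
        + "<div><h3><a href='/second'>Second result</a></h3></div>"
        + "</body></html>";

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(markup)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            // Invalid expression or unparsable markup
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        List<String> headings = select(SAMPLE, "//h3");
        System.out.println(headings.size() + " headings, first: " + headings.get(0));
    }
}
```

- As in the HtmlUnit example above, the match comes back as a list, so the first result is simply element 0 of that list.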
Form control
- A large part of controlling your HTML page is to control the form elements:
- HtmlForm
- HtmlTextInput
- HtmlSubmitInput
- HtmlCheckBoxInput
- HtmlHiddenInput
- HtmlPasswordInput
- HtmlRadioButtonInput
- HtmlFileInput
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/");
//Get the query input text
HtmlInput queryInput = currentPage.getElementByName("q");
queryInput.setValueAttribute("aviyehuda");
//Submit the form by pressing the submit button
HtmlSubmitInput submitBtn = currentPage.getElementByName("btnG");
currentPage = submitBtn.click();
Tables
currentPage = webClient.getPage("http://www.google.com/search?q=htmlunit");
final HtmlTable table = currentPage.getHtmlElementById("nav");
for (final HtmlTableRow row : table.getRows()) {
    System.out.println("Found row");
    for (final HtmlTableCell cell : row.getCells()) {
        System.out.println(" Found cell: " + cell.asText());
    }
}
JavaScript support
- HtmlUnit uses the Mozilla Rhino JavaScript engine.
- This lets you load pages that use JavaScript, and even run JavaScript code on a page on demand.
ScriptResult result = currentPage.executeJavaScript(JavaScriptCode);
- By default, JavaScript exceptions will crash your tests. If you wish to ignore JavaScript exceptions, use this:
webClient.setThrowExceptionOnScriptError(false);
- If you would like to turn off JavaScript altogether, use this:
currentPage.getWebClient().setJavaScriptEnabled(false);
- In newer HtmlUnit releases these setters are reached through webClient.getOptions() instead.
HTTP elements
URL
WebClient webClient = new WebClient();
HtmlPage currentPage =
webClient.getPage("http://www.google.co.uk/search?q=htmlunit");
URL url = currentPage.getWebResponse().getRequestSettings().getUrl();
Response status
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/");
assertEquals(200,currentPage.getWebResponse().getStatusCode());
assertEquals("OK",currentPage.getWebResponse().getStatusMessage());
Cookies
Set<Cookie> cookies = webClient.getCookieManager().getCookies();
for (Cookie cookie : cookies) {
    System.out.println(cookie.getName() + " = " + cookie.getValue());
}
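- Each cookie in that set started life as a Set-Cookie response header. A minimal sketch of that name/value structure, using the JDK’s java.net.HttpCookie rather than HtmlUnit’s own Cookie class; the header value and the class name are made up for illustration.

```java
import java.net.HttpCookie;
import java.util.List;

final class CookieSketch {

    // Parses one Set-Cookie header value into JDK HttpCookie objects.
    static List<HttpCookie> parse(String header) {
        return HttpCookie.parse(header);
    }

    public static void main(String[] args) {
        // Hypothetical header value; a real server would send its own.
        for (HttpCookie cookie : parse("NID=abc123; Path=/")) {
            System.out.println(cookie.getName() + " = " + cookie.getValue());
        }
    }
}
```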
Response headers
WebClient webClient = new WebClient();
HtmlPage currentPage =
webClient.getPage("http://www.google.com/search?q=htmlunit");
List<NameValuePair> headers =
    currentPage.getWebResponse().getResponseHeaders();
for (NameValuePair header : headers) {
    System.out.println(header.getName() + " = " + header.getValue());
}
Request parameters
List<NameValuePair> parameters =
currentPage.getWebResponse().getRequestSettings().getRequestParameters();
for (NameValuePair parameter : parameters) {
    System.out.println(parameter.getName() + " = " + parameter.getValue());
}
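- For a GET request, the parameters are also visible in the query string of the URL itself. A minimal sketch, using only the JDK rather than HtmlUnit, of reading name/value pairs from there; the class and method names are hypothetical, and repeated parameter names are not handled (the last value wins).

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryParamsSketch {

    // Splits the query string of a URL into decoded name/value pairs, keeping their order.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params; // no query string at all
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        parse("http://www.google.com/search?q=htmlunit&hl=en")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```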
Making assertions
- HtmlUnit comes with a set of assertions, available as static methods of the WebAssert class:
assertTitleEquals(HtmlPage, String)
assertTitleContains(HtmlPage, String)
assertTitleMatches(HtmlPage, String)
assertElementPresent(HtmlPage, String)
assertElementPresentByXPath(HtmlPage, String)
assertElementNotPresent(HtmlPage, String)
assertElementNotPresentByXPath(HtmlPage, String)
assertTextPresent(HtmlPage, String)
assertTextPresentInElement(HtmlPage, String, String)
assertTextNotPresent(HtmlPage, String)
assertTextNotPresentInElement(HtmlPage, String, String)
assertLinkPresent(HtmlPage, String)
assertLinkNotPresent(HtmlPage, String)
assertLinkPresentWithText(HtmlPage, String)
assertLinkNotPresentWithText(HtmlPage, String)
assertFormPresent(HtmlPage, String)
assertFormNotPresent(HtmlPage, String)
assertInputPresent(HtmlPage, String)
assertInputNotPresent(HtmlPage, String)
assertInputContainsValue(HtmlPage, String, String)
assertInputDoesNotContainValue(HtmlPage, String, String)
- You can, of course, still use your test framework’s own assertions. For example, if you are using JUnit, you can still use assertTrue() and so on.
- Here are a few examples:
WebClient webClient = new WebClient();
HtmlPage currentPage =
webClient.getPage("http://www.google.com/search?q=htmlunit");
assertEquals(200,currentPage.getWebResponse().getStatusCode());
assertEquals("OK",currentPage.getWebResponse().getStatusMessage());
WebAssert.assertTextPresent(currentPage, "htmlunit");
WebAssert.assertTitleContains(currentPage, "htmlunit");
WebAssert.assertLinkPresentWithText(currentPage, "Advanced search");
assertTrue(currentPage.getByXPath("//h3").size()>0); //result number
assertNotNull(webClient.getCookieManager().getCookie("NID"));
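- The difference between the "contains" and "matches" title assertions is easy to trip over, because a regular-expression match made with Java’s String.matches() has to cover the whole string. The sketch below mimics the two checks on plain strings. The helper names and the title are hypothetical, this is not WebAssert’s implementation, and it assumes assertTitleMatches() applies its pattern to the entire title.

```java
final class TitleCheckSketch {

    // Hypothetical stand-in for a "title contains" check: a plain substring test.
    static boolean titleContains(String title, String text) {
        return title.contains(text);
    }

    // Hypothetical stand-in for a "title matches" check: the regex must cover the entire title.
    static boolean titleMatches(String title, String regex) {
        return title.matches(regex);
    }

    public static void main(String[] args) {
        String title = "htmlunit - Google Search"; // made-up page title
        System.out.println(titleContains(title, "htmlunit"));  // true: substring found
        System.out.println(titleMatches(title, "htmlunit"));   // false: pattern covers only part of the title
        System.out.println(titleMatches(title, "htmlunit.*")); // true: pattern covers the whole title
    }
}
```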
Whenever I try to do
(HtmlImage)currentPage.getHtmlElementById("logo");
it keeps saying I can’t cast HtmlDivision to HtmlImage. PLEASE HELP!
That’s because the HTML element with id “logo” is probably not an image but rather a “div” element.
The examples I have shown in the post are pretty old, so if you try them on the Google page they may not work.
Google is altogether a problematic site for HtmlUnit for some reason.
Try this code:
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/search?q=a");
HtmlAnchor element = (HtmlAnchor)currentPage.getHtmlElementById("logo");
HtmlImage imgElement = (HtmlImage)element.getChildNodes().get(1);
System.out.println(imgElement.getAttribute("src"));
Hey this page helped me a lot !
Thanks for your efforts !
Avi, I just wanted to thank you for a great, concise summary of how to get going with HtmlUnit.
I really appreciate the time you put in here – it took me much less time to get up to speed than with the HtmlUnit homepage, ironically!
🙂
Hi,
Well, I’m trying to code an app that posts to a web site that uses a captcha. The app is for Android. I was using Jsoup, but that framework doesn’t support click events, and I need to click a submit button.
Do you know how I can use HtmlUnit with Android?
thanks.
I am new to Java and have been searching everywhere for code to extract sections of text from a website. This looks good, but how do I target a specific detail, e.g. a stock price, and how do I display it on the screen?
Many thanks
———————————————————————————————–
WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("http://www.google.com/");
HtmlImage imgElement = (HtmlImage)currentPage.getHtmlElementById("logo");
System.out.println(imgElement.getAttribute("src"));
This isn’t doing anything, am I doing something wrong?
@Test
public void testGoogle(){
    WebClient webClient = new WebClient();
    HtmlPage currentPage = webClient.getPage("http://www.google.com/");
    assertEquals("Google", currentPage.getTitleText());
}
This might be helpful to start with…… http://www.bridgei2i.com/blog/extracting-data-from-webpages-in-java-with-help-of-htmlunit/
To Saddam: thank you, very useful. There is one thing, though, with the googleRes example: the method setValueAttribute(…) is not available in the context of form.getInputsByName("q").setValueAttribute(…).
I’m using htmlunit-2.19.jar
Did you use other version?