Tutorial - SimpleBrowserDotNet/SimpleBrowser GitHub Wiki

Tutorial SimpleBrowser

Navigate and inspect pages

Installing SimpleBrowser in your project

PM> Install-Package SimpleBrowser

Using SimpleBrowser to load a public page

The simplest use of SimpleBrowser is to download the contents of a known URL. In this case we take the English Wikipedia homepage.

var b = new Browser();
b.Navigate("http://en.wikipedia.org");

Console.WriteLine(b.Url);
// http://en.wikipedia.org/wiki/Main_Page

Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// etc..

Note that the URL is not exactly the one we requested, because Wikipedia redirected us (status 301) to another URL. SimpleBrowser follows the redirect, just as a regular browser with a user interface would.

Configuring SimpleBrowser

Memory Usage

The SimpleBrowser default configuration is primarily meant for development, not production. Full logging is enabled. All history is stored indefinitely. As a result, continuous use of SimpleBrowser (without destroying SimpleBrowser or tabs in SimpleBrowser) typically causes high memory usage. These settings help reduce SimpleBrowser's memory footprint.

  1. RetainLogs: If you do not require logging, turn it off by setting Browser.RetainLogs to false.
  2. (More detail to follow.)
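A minimal sketch of turning logging off. It assumes that RetainLogs is a settable instance property on Browser, as the list above implies; the variable name is only illustrative.

```csharp
using SimpleBrowser;

// Assumption: RetainLogs is an instance property on Browser.
var quietBrowser = new Browser();
quietBrowser.RetainLogs = false; // do not keep request logs in memory
```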

Accessing specific content from a page

If we want to interact with the page, we typically want to select a specific part of it, for example by using the ID of an element. The Find() method offers a number of different ways to search for elements in the page. The Wikipedia homepage always contains a featured article for today, so let's select that information:

var todaysFeaturedArticle = b.Find("div", FindBy.Id, "mp-tfa");
Console.WriteLine(todaysFeaturedArticle.Value);
// Full text from the element and its children. No markup.

In this case (but not always) the result represents a specific element from the page. You can read the textual value as in the sample, but you can also interact with the element using the Click() method or the Checked property. The Value can also be set, which is especially appropriate when the result is an input box or text area. You can also access more detailed information using the XElement property. This exposes the XML structure of the element and allows you to navigate the details of the structure of the page.
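The XElement property can be explored with ordinary LINQ to XML. The sketch below does not call SimpleBrowser at all: it parses a small hand-written fragment with XElement.Parse as a stand-in for what a result's XElement might expose, so the markup and variable names are illustrative assumptions rather than real Wikipedia content.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Stand-in for todaysFeaturedArticle.XElement (hypothetical markup).
var element = XElement.Parse(
    "<div id=\"mp-tfa\"><p><b><a href=\"/wiki/SMS_Bayern\">SMS Bayern</a></b> was a " +
    "<a href=\"/wiki/Battleship\">battleship</a>.</p></div>");

// Walk the descendants and collect the href attribute of every <a> element.
var hrefs = element.Descendants("a")
                   .Select(a => a.Attribute("href")?.Value)
                   .ToList();

// XElement.Value concatenates the text of the element and its children.
var text = element.Value;

Console.WriteLine(string.Join(", ", hrefs)); // /wiki/SMS_Bayern, /wiki/Battleship
Console.WriteLine(text);                     // SMS Bayern was a battleship.
```

Note that LINQ to XML matches element names case-sensitively, so the names used in a query have to match the casing that the parsed document actually contains.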

Multiple elements

When multiple elements match your specification, you can still use Find(). The return type, HtmlResult, can also serve as a collection of elements. When you use the properties and methods described above, they are applied to the first element found. HtmlResult also exposes properties like TotalElementsFound and implements IEnumerable. Let's loop through all links in the page:

var links = b.Find("a", new object { });
foreach (var link in links)
{
    Console.WriteLine("Found link with text '{0}' and title '{1}' to {2}", link.Value, link.GetAttribute("title"), link.GetAttribute("href"));
}
//Found link with text 'Sofia' and title 'Sofia' to /wiki/Sofia
//Found link with text 'Ottoman' and title 'Ottoman Empire' to /wiki/Ottoman_Empire
//Found link with text '1942' and title '1942' to /wiki/1942
//Found link with text 'World War II' and title 'World War II' to /wiki/World_War_II
//Found link with text 'Imperial Japanese Army' and title 'Imperial Japanese Army' to /wiki/Imperial_Japanese_Army
//Found link with text 'systematic extermination' and title 'Sook Ching' to /wiki/Sook_Ching
//Found link with text 'Chinese Singaporeans' and title 'Chinese Singaporean' to /wiki/Chinese_Singaporean
//...

Using Select

The Find() method offers a number of different ways to filter your elements (FindBy.Name, FindBy.Text, FindBy.PartialText, etc.). These methods were designed before jQuery made CSS selectors the de facto query language inside HTML documents. To let you use CSS selectors in SimpleBrowser as well, the Select() method was added. It takes a string as its single argument, but you should be able to express most of the queries you'll need with that. This is how we first loop over all links in the "Today's Featured Article" block and then click on the main article's link (which on Wikipedia is the first bold link).

var b = new Browser();
b.Navigate("http://en.wikipedia.org");
var links = b.Select("#mp-tfa a[href]"); // all links with a href inside #mp-tfa
foreach (var link in links)
{
    Console.WriteLine("Found link with text '{0}' and title '{1}' to {2}", link.Value, link.GetAttribute("title"), link.GetAttribute("href"));
}
var mainlink = b.Select("#mp-tfa b>a[href]"); // all <a href> links directly inside a <b> inside #mp-tfa
mainlink.Click();
Console.WriteLine("Url: {0}", b.Url);

// Found link with text 'SMS Bayern' and title 'SMS Bayern' to /wiki/SMS_Bayern
// Found link with text 'class' and title 'Ship class' to /wiki/Ship_class
// Found link with text 'battleships' and title 'Battleship' to /wiki/Battleship
// Found link with text 'German Imperial Navy' and title 'Kaiserliche Marine' to /wiki/Kaiserliche_Marine
// ...
// Url: http://en.wikipedia.org/wiki/SMS_Bayern

The Select() method can also be used in the scope of a single element. This allows you to search within a part of the page.
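A sketch of a scoped search, assuming that the result returned by Select() offers a Select() method of its own, as the sentence above suggests. The markup is hypothetical and is loaded with SetContent() so that the example does not depend on the live Wikipedia page.

```csharp
using System;
using SimpleBrowser;

var page = new Browser();

// Load hypothetical markup directly instead of navigating to a URL.
page.SetContent(
    "<html><body>" +
    "<div id=\"mp-tfa\"><b><a href=\"/wiki/SMS_Bayern\">SMS Bayern</a></b></div>" +
    "<a href=\"/wiki/Main_Page\">Main page</a>" +
    "</body></html>");

// First select the block, then search only inside it.
var block = page.Select("#mp-tfa");
var linksInBlock = block.Select("a[href]");

foreach (var link in linksInBlock)
{
    Console.WriteLine("{0} -> {1}", link.Value, link.GetAttribute("href"));
}
```

Only the link inside the #mp-tfa block should be reported; the "Main page" link lies outside the scope of the second query.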

Submitting forms

Now that you have learned how to Find() elements, let's look at using those elements to submit forms. The process is to first find the form element, change the form element's value, then, once all form elements in the form have values to submit, submit the form. The following example searches Wikipedia from the form on the Wikipedia home page:

var b = new Browser();
b.Navigate("http://en.wikipedia.org");

// Find the form element to change.
var searchInput = b.Find("searchInput");

// Optionally, you could do some error checking to see if you found what you were looking for
if(searchInput == null || searchInput.Exists == false)
{
    throw new Exception("Element not found");
}

// Assign the value to the form element.
searchInput.Value = "Mersenne twister";

// Submit the form
searchInput.SubmitForm();

Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// <meta charset="UTF-8" />
// <title>Mersenne twister - Wikipedia, the free encyclopedia</title>
// etc..

The Wikipedia search form is a very simple and well-behaved example. There is only one form element, and it is easy to find. The above sample code is the equivalent of typing in the search box and pressing Enter to submit the form. While this is completely acceptable to Wikipedia, some web sites insist that the search button be clicked. If this had been the case on Wikipedia, the code would look like this:

var b = new Browser();
b.Navigate("http://en.wikipedia.org");

// Find the form element to change.
var searchInput = b.Find("searchInput");

// Assign the value to the form element.
searchInput.Value = "Mersenne twister";

// Find the search button
var searchButton = b.Find("searchButton");

// Click the search button
searchButton.Click();

Console.WriteLine(b.CurrentHtml);
// <!DOCTYPE html>
// <html lang="en" dir="ltr" class="client-nojs">
// <head>
// <meta charset="UTF-8" />
// <title>Mersenne twister - Wikipedia, the free encyclopedia</title>
// etc..

Working with complex forms

Let's look at a more complex form, with a combination of text boxes, radio buttons, check boxes, and selects. For the purposes of this tutorial and, in general, for working with forms, it helps to load the page in a browser and view its source code. This is often the best and fastest way to learn what the form actually looks like and how you will need to approach interacting with it.

SimpleBrowser.Browser b = new SimpleBrowser.Browser();
b.Navigate("http://www.tizag.com/phpT/examples/formexample.php");

// Find a text input
var firstName = b.Find(ElementType.TextField, FindBy.Name, "Fname");
firstName.Value = "Michelangelina";

// Note: The HTML form had a maxlength attribute limiting the text to 12 characters. Therefore the value of the text input varies from what was assigned.
Console.WriteLine(firstName.Value);
// Michelangeli

// Find a radio button
var gender = b.Find(ElementType.RadioButton, FindBy.Value, "Female");
gender.Checked = true;

// This will also work to set the selected state of a radio button.
gender.Click();

// Find a check box
var food = b.Find("input", new { name = "food[]", value = "Pizza" });

// This will work to toggle the state of a check box ...
food.Click();

// ... but this will set it to a known value.
food.Checked = true;

// Find a textarea (note that a text input and textarea are both of type ElementType.TextField)
var quote = b.Find(ElementType.TextField, FindBy.Name, "quote");
quote.Value = "I love it when a plan comes together.";

// Find a select (drop-down box)
var education = b.Find(ElementType.SelectBox, FindBy.Name, "education");
education.Value = "College";

// Find a select (drop-down box)
var time = b.Find(ElementType.SelectBox, FindBy.Name, "TofD");
time.Value = "Day";

Every web developer codes their forms differently. When interacting with forms, you will need to familiarize yourself with the intricacies of the form and how the programmer created it. For example, many programmers use JavaScript to create drop down lists.

For example, a year drop down might render in a browser like this:

<select id="selectElementId">
    <option value="2015">2015</option>
    <option value="2016">2016</option>
    <option value="2017">2017</option>
    <option value="2018">2018</option>
    <option value="2019">2019</option>
</select>

When you view source on the page, however, it might just look like this:

<select id="selectElementId"></select>

Somewhere a chunk of JavaScript has created the option elements in the select. Since SimpleBrowser doesn't support JavaScript, the SimpleBrowser user is required to do the work of the JavaScript manually. For example:

// Create a new option element
System.Xml.Linq.XElement newOption = new System.Xml.Linq.XElement("option");
newOption.SetAttributeCI("value", "2018");

// Find a select (drop-down box)
var year = b.Find("selectElementId");

// Add the new option to the select
year.XElement.Add(newOption);

// Select the option just added
year.Value = "2018";

Note that you don't have to add all of the elements. You only have to add the element with the value you are selecting.
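The manual step above can be wrapped in a small helper that adds an option only when it is missing. This is a sketch using plain LINQ to XML rather than SimpleBrowser's own SetAttributeCI extension; the helper name EnsureOption is hypothetical, and a parsed fragment stands in for year.XElement.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Adds <option value="..."> to the select element only when no such option exists yet.
// Returns true if a new option was added.
bool EnsureOption(XElement selectElement, string value)
{
    bool exists = selectElement.Elements("option")
                               .Any(o => o.Attribute("value")?.Value == value);
    if (exists)
    {
        return false;
    }

    selectElement.Add(new XElement("option", new XAttribute("value", value), value));
    return true;
}

// Stand-in for year.XElement from the example above.
var yearSelect = XElement.Parse("<select id=\"selectElementId\"></select>");
Console.WriteLine(EnsureOption(yearSelect, "2018")); // True: option added
Console.WriteLine(EnsureOption(yearSelect, "2018")); // False: already present
```

The lookup here is case-sensitive, so it only recognizes lowercase option elements and value attributes.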

Working with Malformed HTML

You may be using SimpleBrowser to access public Internet sites. Not all sites are created equal. There are many web sites that look beautiful but are rendered by the browser from completely invalid HTML. All browsers do their best to handle malformed HTML. Occasionally, there are sites that are so poorly written that they will crash the browser. SimpleBrowser is no different from Chrome or Firefox in all these ways.

In cases where malformed HTML is causing a crash in SimpleBrowser, it is often necessary to modify the source HTML before the SimpleBrowser parser processes the HTML. The SimpleBrowser parser will not attempt to parse until the first call to Find() is made. Until that time, you have every opportunity to change the HTML to prevent a SimpleBrowser crash. For example, if there was a "badtoken" that caused the parser to crash, and all you needed to do to make it work was replace it with "goodtoken", just do that before you call Find():

var b = new Browser();
b.Navigate("http://www.example.com");

b.SetContent(b.CurrentHtml.Replace("badtoken", "goodtoken"));

b.Find("tokenId");
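When a page needs more than one fix, the replacements can be collected in one place and applied in order before the HTML is handed back to SetContent(). The sketch below is plain string handling and does not call SimpleBrowser; the helper name PatchHtml and the second replacement are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Applies each (bad, good) replacement in order and returns the patched HTML.
string PatchHtml(string html, IEnumerable<KeyValuePair<string, string>> replacements)
{
    foreach (var replacement in replacements)
    {
        html = html.Replace(replacement.Key, replacement.Value);
    }
    return html;
}

var knownFixes = new List<KeyValuePair<string, string>>
{
    new KeyValuePair<string, string>("badtoken", "goodtoken"),
    new KeyValuePair<string, string>("<br/ >", "<br />"), // hypothetical second fix
};

var patched = PatchHtml("<p id=\"tokenId\">badtoken<br/ ></p>", knownFixes);
Console.WriteLine(patched); // <p id="tokenId">goodtoken<br /></p>
```

With SimpleBrowser, the result would be passed on in the same way as in the example above, for instance b.SetContent(PatchHtml(b.CurrentHtml, knownFixes)).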

Login scenarios

TBD

Cookie based forms

TBD

Basic Authentication

Basic Authentication is part of the HTTP/1.0 specification. Basic Authentication uses HTTP headers to transmit the user's credentials to the server. When the user requests a page secured by basic authentication, the first request is typically made without supplying credentials. The server receives and denies the request, returning a 401 (Unauthorized) HTTP response with the WWW-Authenticate header. A browser with a user interface would then typically pop up a dialog box (i.e., a native browser-based dialog, NOT an HTML/JavaScript dialog box), prompting for a user name and password. After the user name and password are entered, the original request is resubmitted with the credentials in their proper HTTP headers. Assuming the credentials are correct, the user is served the requested page.

The SimpleBrowser process is slightly different. Since SimpleBrowser has no user interface with which to prompt for credentials, they must be supplied before the original request is sent. Credentials are supplied to SimpleBrowser using the Browser.BasicAuthenticationLogin() method:

Browser b = new Browser();
BasicAuthenticationToken.Timeout = 20; // Optional. Default is 15 minutes, matching IE.
b.BasicAuthenticationLogin("example.com", "username", "password");
b.Navigate("http://www.example.com/");
b.BasicAuthenticationLogout("example.com"); // Optional.

Calling the Browser.BasicAuthenticationLogin() method adds the user's credentials to an in-memory cache for use in subsequent calls to pages protected with Basic Authentication. Whenever a request for the associated domain is made, the credentials are automatically sent in the proper HTTP headers. Use the BasicAuthenticationLogout() method to remove the user's credentials from the cache. If BasicAuthenticationLogout() is not called, the credentials will remain in the cache for a number of minutes equal to the BasicAuthenticationToken.Timeout value. By default, BasicAuthenticationToken.Timeout is set to 15 minutes. Every time a call is made to a domain name in the cache, the timeout of the Basic Authentication Token for that domain name is refreshed. Therefore, so long as calls to the domain name are made more often than the length of the timeout, the token will never expire. Expired credentials are automatically removed from the cache.
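The sliding timeout described above can be illustrated with a small stand-alone model. This is not SimpleBrowser's implementation: the function names are hypothetical, only a timestamp is stored per domain, and the current time is passed in explicitly so that the behavior is easy to follow.

```csharp
using System;
using System.Collections.Generic;

var tokenTimeout = TimeSpan.FromMinutes(15); // mirrors the documented default
var lastUsedByDomain = new Dictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

// Caches a token for the domain, stamped with the current time.
void Login(string domain, DateTime now) => lastUsedByDomain[domain] = now;

// Removes the token for the domain, if any.
void Logout(string domain) => lastUsedByDomain.Remove(domain);

// Returns true when a non-expired token exists, and refreshes its timestamp.
bool TryUseToken(string domain, DateTime now)
{
    if (!lastUsedByDomain.TryGetValue(domain, out var lastUsed))
    {
        return false;                    // never logged in, or already removed
    }
    if (now - lastUsed > tokenTimeout)
    {
        lastUsedByDomain.Remove(domain); // expired: drop it from the cache
        return false;
    }
    lastUsedByDomain[domain] = now;      // each use restarts the timeout
    return true;
}

var start = new DateTime(2024, 1, 1, 12, 0, 0);
Login("example.com", start);
Console.WriteLine(TryUseToken("example.com", start.AddMinutes(10))); // True
Console.WriteLine(TryUseToken("example.com", start.AddMinutes(20))); // True: 10 minutes since last use
Console.WriteLine(TryUseToken("example.com", start.AddMinutes(40))); // False: 20 minutes since last use
```

The second call succeeds even though 20 minutes have passed since the login, because the first call refreshed the timestamp.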

The SimpleBrowser behavior above is the same as Internet Explorer's basic authentication token management. By comparison, Mozilla-based browsers do not expire tokens on a timer; they cache the basic authentication tokens until the browser is closed or the tokens are cleared in browser settings (assuming the browser offers the option to clear the cache). Leaving tokens in the cache is a potential security issue, should the internal memory cache become compromised. It is safer to remove them after a specified time of inactivity.

Note: BasicAuthenticationToken.Timeout is a global variable. Changing this value will affect all cached basic authentication tokens in all SimpleBrowser instances in the application.

Warning: When using basic authentication, the user's credentials are NOT encrypted. They are only base64 encoded. The safest way to secure basic authentication is to use SSL (HTTPS). That is, using basic authentication via unsecure HTTP is equivalent to sending the user's login credentials over the network in clear text. SimpleBrowser makes no attempt to block or restrict usage of Basic Authentication over unsecure HTTP. Unfortunately, this is really outside of the browser's control. If the server requires basic authentication but does not accept requests over HTTPS, and you want the page badly enough, you have no choice but to be vulnerable.
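To make the warning concrete, the sketch below builds a Basic Authentication header value by hand and then reverses it without any key. It is independent of SimpleBrowser; the function names are hypothetical, the credentials are the well-known example pair from the HTTP specifications, and the use of UTF-8 for the bytes is an assumption.

```csharp
using System;
using System.Text;

// Builds the value of the Authorization header for Basic Authentication.
string BuildBasicHeader(string user, string password)
{
    var raw = Encoding.UTF8.GetBytes(user + ":" + password);
    return "Basic " + Convert.ToBase64String(raw);
}

// Anyone who sees the header can reverse it; no secret is involved.
string DecodeBasicHeader(string headerValue)
{
    var encoded = headerValue.Substring("Basic ".Length);
    return Encoding.UTF8.GetString(Convert.FromBase64String(encoded));
}

var header = BuildBasicHeader("Aladdin", "open sesame");
Console.WriteLine(header);                    // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Console.WriteLine(DecodeBasicHeader(header)); // Aladdin:open sesame
```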

Using HTTPS/SSL

TBD

Navigating backward and forward

TBD

Using multiple windows

TBD

Debugging

TBD

Inspect current state

TBD

Log requests

For certain debugging scenarios and for very specific tests, it can be useful to keep track of the details of the HTTP requests that were performed. For this purpose, SimpleBrowser offers a number of different patterns.

Last Request

Often, you'll just want to inspect some detail of the very last request made by the browser. The Browser instance exposes the method RequestData() for this purpose. It returns an object of type HttpRequestLog with information about the last request. This type has the following members:

public class HttpRequestLog : LogItem
{
    public string Text;
    public string Method;
    public NameValueCollection PostData;
    public string PostBody;
    public NameValueCollection QueryStringData;
    public WebHeaderCollection RequestHeaders;
    public WebHeaderCollection ResponseHeaders;
    public int ResponseCode;
    public Uri Url;
}

Log full history

A common case is to log all of the requests in a test to file when the test fails. To do this, you have to do two things:

  1. Make sure the property RetainLogs is set to true
  2. When your test has failed, call RenderHtmlLogFile(). This will return a string of HTML containing information about all requests in the history of this Browser instance.
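The two steps above can be sketched as a small wrapper around a test body. The helper name RunWithRequestLog, the log location, and the simulated failure are hypothetical; the sketch also assumes that RenderHtmlLogFile() can be called without arguments, as described above.

```csharp
using System;
using System.IO;
using SimpleBrowser;

// Runs a test body; if it throws, saves the request log as HTML and rethrows.
void RunWithRequestLog(Browser browser, Action<Browser> testBody, string logPath)
{
    browser.RetainLogs = true; // step 1: keep the request history
    try
    {
        testBody(browser);
    }
    catch
    {
        // step 2: the test failed, so write the log for later inspection
        File.WriteAllText(logPath, browser.RenderHtmlLogFile());
        throw;
    }
}

// Hypothetical log location and a simulated failing test.
var htmlLogPath = Path.Combine(Path.GetTempPath(), "simplebrowser-log.html");
try
{
    RunWithRequestLog(new Browser(), b =>
    {
        b.SetContent("<html><body><p>Hello</p></body></html>");
        throw new InvalidOperationException("simulated test failure");
    }, htmlLogPath);
}
catch (InvalidOperationException)
{
    Console.WriteLine("Test failed; request log written to " + htmlLogPath);
}
```

Rethrowing after the log is written keeps the original failure visible to the test runner.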

The RequestLogged event

If you have more custom requirements, you can attach an event handler to the RequestLogged event. This will be raised after every request passing you a reference to the Browser instance and an HttpRequestLog. Use this to your very specific liking.

Browser b = new Browser();
b.RequestLogged += (br, l) =>
{
    // act on data in l
};

Logging errors

TBD
