12 February 2014

Google Search Appliance 7.2 New Features

Google announced the release of Google Search Appliance 7.2 today! Lots of point releases tend to be mostly bug fixes, but this is a major release - with a number of powerful new features! I've already had GSA customers ask me whether they should upgrade, so I thought it would be useful to describe some of the new features in detail.


New Version Manager

Since you have to download and install the new version, the first thing you'll notice is the new Version Manager. The process of applying upgrades isn't too exciting even on the GSA, but there are some nice new features here. For example, you can actually roll back to previous versions! This is handy for developers who have to support customers on 7.0 and 7.2 simultaneously. You can also easily open a SupportCall session from Version Manager, which is nice if you can't get into the admin console.

Version Manager showing multiple available versions

New admin console look and feel

Once you've upgraded, the next thing you'll notice is that the admin console has changed significantly. The UI is a lot cleaner, and menus have been rearranged - it may take you a while to learn where everything is in the new menus! The overall structure of the admin console hasn't changed that much, though, so with a little browsing you'll be able to find everything you need. The notification area in the upper right corner is a nice touch - warning messages will show up there and can then be dismissed.

New admin console

Wildcard searching

The biggest new feature, in my opinion, is wildcard searching. If you're not familiar with the GSA, it might be hard for you to believe that this feature wasn't already there, but it wasn't! Google Search Appliance searches have been a lot like searches on Google.com: the search user enters a few words or a phrase, and the back-end search algorithms do all of the heavy lifting to figure out what's really relevant for the user. In the enterprise though, wildcard searches can be a little more important than for public searches.

Wildcard searches are pretty simple. To use them, you first have to configure the index to support them. Within the 7.2 admin console, navigate to Index ... Index Settings, set "Wildcard Indexing Type" to "Complete", and save your changes. You'll need to recrawl or refeed your documents after doing this.

Wildcard Indexing Settings

Once you've done that, you're ready to run some wildcard searches! There are two wildcards: the asterisk (*) and the question mark (?). As in file-name globbing (SQL's LIKE uses % and _ instead), the asterisk matches zero or more characters, and the question mark matches exactly one character. By default, you can simply type in your search: the GSA will identify that you're using a wildcard and will insert a query operator in front of your search term. For example, if you searched for

salm*

the GSA would convert your query to

wildcard:salm*

Sample wildcard query

Pretty neat, huh? Don't worry, though - we're not going all the way down the road to a full query language like the ones many specialized search systems (e.g., LexisNexis and Westlaw) offer. GSA searches will remain very easy for the search user, who isn't expected to learn a special language just to search.
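To make the wildcard rules concrete, here's a small Python sketch. It is not how the GSA implements wildcards internally; it just mimics the behavior described above, using Python's fnmatch module as a stand-in for the matching and a hypothetical rewrite_term helper for the operator insertion. One caveat: fnmatch also treats square brackets specially, which goes beyond the two wildcards described here.

```python
from fnmatch import fnmatchcase


def has_wildcard(term: str) -> bool:
    """True if the term contains one of the two wildcards described above."""
    return "*" in term or "?" in term


def rewrite_term(term: str) -> str:
    """Mimic the rewriting described above: prefix wildcard terms with 'wildcard:'."""
    return f"wildcard:{term}" if has_wildcard(term) else term


def matches(pattern: str, word: str) -> bool:
    """Illustrative matcher: '*' is zero or more characters, '?' is exactly one."""
    return fnmatchcase(word.lower(), pattern.lower())


if __name__ == "__main__":
    print(rewrite_term("salm*"))
    for word in ("salm", "salmon", "salmonella", "psalm"):
        print(word, matches("salm*", word))
```

Note that "salm" itself matches salm* (zero extra characters), while "psalm" does not, because the pattern is anchored at the start of the word.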

Metadata sorting

Another great new feature in 7.2 is metadata sorting. Previous versions of the GSA could sort results on only two values: relevance and date. By default, results are sorted by relevance. Many of our customers have needed to sort on other fields, and the GSA simply couldn't do this by itself. To satisfy that requirement, we'd build a custom application that would fetch up to one thousand records from the GSA, build a recordset in memory, then provide sort capabilities for that recordset. This is a fairly significant amount of work if you just want to be able to sort by another field!
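For a sense of what that pre-7.2 workaround involves, here's a minimal Python sketch of the in-memory sorting step only. It assumes the results have already been fetched and parsed into plain dictionaries; the record layout and the sort_by_metadata name are my own for illustration, not part of any GSA API.

```python
from typing import Dict, List

Record = Dict[str, str]


def sort_by_metadata(records: List[Record], field: str,
                     descending: bool = False) -> List[Record]:
    """Order records by a metadata field; records lacking the field go last."""
    present = [r for r in records if r.get(field)]
    missing = [r for r in records if not r.get(field)]
    # Case-insensitive comparison so "adams" and "Adams" sort together.
    present.sort(key=lambda r: r[field].casefold(), reverse=descending)
    return present + missing


if __name__ == "__main__":
    results = [
        {"title": "b", "author": "Watts"},
        {"title": "a"},
        {"title": "c", "author": "adams"},
    ]
    for record in sort_by_metadata(results, "author"):
        print(record)
```

The sorting itself is the easy part; fetching, paging and parsing up to a thousand results is where the real work goes.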

To sort by metadata, you simply modify the "sort" URL parameter to include the name of the metadata field you want to use. In this release, you can only sort by one field, but I think it's pretty rare to want to sort search results by more than one field at one time - at least, our customers haven't expressed a desire to do that even with the custom sorting functionality we've built in the past. 

Here's an example of a search URL that gets a single metadata field "author". I'm using a customized XSLT stylesheet to display the metadata field:

/search?q=inmeta:author&btnG=Google+Search&access=p&client=show_meta&output=xml_no_dtd&proxystylesheet=show_meta&sort=date:D:L:d1&wc=200&wc_mc=1&oe=UTF-8&ie=UTF-8&ud=1&getfields=author&exclude_apps=1&site=default_collection&ulang=en&ip=192.168.140.64&entqr=3&entqrm=0&filter=0

And here's the search results display:

Search results sorted by relevance

Now, let's sort them by author. We'll need to change the search URL - the only difference is the sort parameter, which becomes sort=meta:Author:

/search?q=inmeta:author&btnG=Google+Search&access=p&client=show_meta&output=xml_no_dtd&proxystylesheet=show_meta&sort=meta:Author&wc=200&wc_mc=1&oe=UTF-8&ie=UTF-8&ud=1&getfields=author&exclude_apps=1&site=default_collection&ulang=en&ip=192.168.140.64&entqr=3&entqrm=0&filter=0

Here are the sorted results - you'll have to look carefully to compare the author values:

Search results sorted by metadata
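If you're generating search URLs from code rather than editing them by hand, swapping the sort parameter can be sketched like this in Python. The set_sort helper is hypothetical, and the URL in the example is a shortened version of the ones above; the only GSA-specific values are the ones already shown in this post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_sort(url: str, sort_value: str) -> str:
    """Return the URL with its 'sort' parameter replaced (or appended if absent)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if any(name == "sort" for name, _ in pairs):
        pairs = [(n, sort_value if n == "sort" else v) for n, v in pairs]
    else:
        pairs.append(("sort", sort_value))
    # safe=":" keeps values such as meta:Author readable instead of meta%3AAuthor.
    query = urlencode(pairs, safe=":")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


if __name__ == "__main__":
    original = "/search?q=inmeta:author&sort=date:D:L:d1&getfields=author"
    print(set_sort(original, "meta:Author"))
```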

Support for HTTP POST

Another new feature in 7.2 is support for HTTP POST. Until now, the only way to send a search query to the GSA was via HTTP GET, which basically means building a URL containing all of the query parameters you want to send. These search URLs can get quite long and unwieldy, and there are some length limitations you might run into. Using HTTP POST avoids these limitations, and allows you to build much longer query parameter values. There isn't much to show here - if you're building a search form, for example, you'd simply use METHOD="POST" instead of METHOD="GET" in your HTML FORM tag. This is not something I see myself using that often, but it's nice to have just in case.
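If you're sending searches from code rather than from an HTML form, the same idea looks roughly like this in Python. This is only a sketch: the host name gsa.example.com is a placeholder, build_post_search is a hypothetical helper, and the parameters are the same ones used in the GET URLs earlier in this post, moved into the request body.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_search(search_url: str, params: dict) -> Request:
    """Build (but do not send) a form-encoded POST request for a search."""
    body = urlencode(params).encode("ascii")
    return Request(
        search_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder host; substitute your own appliance's address.
request = build_post_search(
    "http://gsa.example.com/search",
    {"q": "salm*", "site": "default_collection",
     "client": "show_meta", "output": "xml_no_dtd"},
)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
    # To actually send it: urllib.request.urlopen(request)
```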

Trusted Applications

The last new feature I'll cover here is support for trusted applications. This is actually one of the most exciting features in my opinion - I do a decent amount of application development with the GSA, and handling security credentials has occasionally been problematic. It's always nicer to be able to have your application act as a security proxy for the GSA, and have it just tell the GSA who the user is, rather than having to pass the user to a GSA authentication process where the GSA needs to validate the user's credentials.

The idea here is very simple. First, you need to have a couple of things. You need to have content that is secured using either HTTP Basic or Forms Authentication. You also need to have ACLs for that content within the GSA to allow early binding - in other words, the GSA has to be able to authorize access to documents without using the default HEADREQUEST process. Then, configure your GSA to allow trusted applications. This involves enabling the feature itself, then defining one or more trusted users and/or groups - you're more likely to use groups here, of course. When defining a group, you also specify a credential group that it should be associated with, and optionally the domain that should be used if one is required. For example, if you're using Basic Authentication using domain accounts against an IIS server that's a member of a Windows domain, you may need to specify the domain.

Enabling trusted applications

Once that's done, your application will need to send two things when it makes HTTP requests to the GSA. The first thing it needs to send is the authentication token belonging to the user, which will either be an HTTP Basic Authorization header, or a cookie that contains the user's authentication token. Here's an example of an HTTP Basic Authorization request header:

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

The second thing your application needs to send is a pair of custom X- headers: one containing the username or group name corresponding to the trusted user or group you set in the GSA, and one naming the credential group:

X-GSA-USER: dwatts
X-GSA-CREDENTIAL-GROUP: Default

And that's it! Your application should then receive the secure search results filtered by ACL.
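Here's a minimal Python sketch of assembling those headers. The header names are the ones shown above; the function names and the sample credentials are placeholders of mine (the sample encodes to the same token as the Authorization example above). It covers only the HTTP Basic case, not the cookie case, and it doesn't send anything.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials as an HTTP Basic Authorization header value."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def trusted_app_headers(username: str, password: str, gsa_user: str,
                        credential_group: str = "Default") -> Dict[str, str]:
    """Collect the request headers described above into one dictionary."""
    return {
        "Authorization": basic_auth_header(username, password),
        "X-GSA-USER": gsa_user,
        "X-GSA-CREDENTIAL-GROUP": credential_group,
    }


if __name__ == "__main__":
    # Placeholder credentials for illustration only.
    for name, value in trusted_app_headers("Aladdin", "open sesame", "dwatts").items():
        print(f"{name}: {value}")
```

Keep in mind that Basic credentials are only base64-encoded, not encrypted, so you'd want these requests to go over HTTPS.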

Conclusion

I've been waiting for this upgrade for a long time, and I'm very excited about it! But don't think that this is all GSA 7.2 has - there are plenty of additional features I haven't covered here. For example, there's a new connector framework, and a new SharePoint connector, and much more!

If you're interested in trying this on your GSA, you can log into the Google Enterprise Support Portal and download it today! GSA 7.2 documentation is also available now. Google is also running 7.2 webinars starting next week - check out the schedule at learngsa.com and sign up today! In addition to the 7.2 webinars, Google and Fig Leaf Software are also running some joint webinars on document dates, metadata and crawl tuning - you can sign up for them on learngsa.com if you're interested!

And of course, if you have any GSA questions, please feel free to send them to google@figleaf.com and we'll be happy to respond!

[Note: cross-posted on the Fig Leaf Software blog]




2 comments:

  1. Great article Dave,
    Is there any way to browse and try these features? Is there a trial virtual version of the GSA, or a cloud server for evaluation purposes?

    1. No, there's no trial or virtual version of the GSA.

