Simon Green's Developer Blog
Developing .NET in the cold white north ...

NHibernate.Search using Lucene.NET Full Text Index (Part 3)

Saturday, 26 April 2008 13:07 by simon

In Part 1 we looked at how to create a full-text index of NHibernate persisted domain objects using the Lucene.NET project. Part 2 then looked at how to query the index complete with query-parsing and hit-highlighting of the results.

Now that we have a full-text index there are other things that we can use it for. The easiest and most useful is probably adding a 'similar items' feature where the system automatically displays related items based on the text that they have in common. It isn't exact but the results are often surprisingly good. A human editor could probably pick out some links with more finesse, but that quickly becomes an impossible task as the number of items grows - and the human will typically resort to searching for similar items using the index anyway, so why not automate it?!

This feature can be used to display related web pages or blog entries or, in this case, related books. It probably isn't too far off from the system that Amazon uses. The benefit is that as new content is being added, the top related items can constantly be updated - even for existing items in the system. So, for example, if a new Harry Potter book is released then the existing books can immediately start linking to it and vice-versa or if a company starts offering a new training course or product then any related pages will immediately start to link together.

While it sounds complicated, it is actually quite easy thanks to the contrib assemblies provided with Lucene.NET. In fact, it's so simple it's almost trivial so this won't be a long post!

First, we need to add a new reference to the SimilarityNet.dll assembly (part of Lucene.NET contrib). This provides a SimilarityQueries class which contains a FormSimilarQuery method. Calling this with a piece of text (from an existing field), an analyzer and the field name produces a boolean query using every unique word, where all words are optional. If we repeat this for each field, boosting the relevance of the most important ones (such as the title), then we end up with a query that will look for every word in each field of the original item.

To quote the Lucene documentation:

The philosophy behind this method is "two documents are similar if they share lots of words". Note that behind the scenes, Lucene's scoring algorithm will tend to give two documents a higher similarity score if they share more uncommon words.

What this means in practice is that the more unique a word is, the more likely it will be taken into account when ranking the similar items. So, if our original book has 'Agile' in the title and words such as 'scrum' and 'backlog' in the summary then chances are we will find other books that also have these more unique words ... and it's very likely that they will be related to our original book.
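To make the "uncommon words count for more" idea concrete, here is a small stand-alone sketch. It does not use Lucene at all: the SharedWordSimilarity class, its word splitting and its weighting are my own illustrative assumptions, loosely in the style of an inverse-document-frequency weight, and Lucene's real scoring takes more factors into account.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only - not part of Lucene.NET.
public static class SharedWordSimilarity
{
    // Lower-cases the text and returns its unique words
    // (a crude stand-in for what an analyzer would do).
    public static HashSet<string> UniqueWords(string text)
    {
        char[] separators = { ' ', ',', '.', ';', ':', '!', '?' };
        HashSet<string> words = new HashSet<string>();
        foreach (string word in text.ToLowerInvariant()
                                    .Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            words.Add(word);
        }
        return words;
    }

    // Adds up a weight for every word the two texts share.
    // The fewer documents in the corpus contain a word, the larger its weight.
    public static double Score(string source, string candidate, IList<string> corpus)
    {
        HashSet<string> sourceWords = UniqueWords(source);
        HashSet<string> candidateWords = UniqueWords(candidate);

        double score = 0;
        foreach (string word in sourceWords)
        {
            if (!candidateWords.Contains(word))
            {
                continue;
            }

            int docsWithWord = 0;
            foreach (string doc in corpus)
            {
                if (UniqueWords(doc).Contains(word))
                {
                    docsWithWord++;
                }
            }

            // Assumed weighting: rare words score higher than common ones.
            score += 1.0 + Math.Log((double)corpus.Count / (docsWithWord + 1));
        }
        return score;
    }
}
```

With a corpus where 'agile' appears in most titles but 'scrum' and 'backlog' appear in only a couple, a candidate sharing 'scrum' and 'backlog' with the source scores higher than one that only shares 'agile'.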

Of course, when we search our index for books with all these words there is going to be one obvious match - the original book! In fact, this should be the first result returned, so we could either skip it when creating the result-set (checking for the same unique Id rather than blindly dropping the first result, to be safe) or, as in the example below, use a boolean search and specifically exclude the Id of the source item from the query. I haven't experimented to see which one is quicker but I prefer to let Lucene do all the work - I trust it, and it saves me writing more code or getting results back that I am just going to discard, which feels wrong.

Here is the code to find the best 4 similar matches to any book passed in. Note that I include the Authors and Publisher fields when doing the comparison so it will tend to favour books by the same author or publisher - you will need to experiment to see what makes most sense for your application and usage.

/// <summary>
/// Gets similar books.
/// </summary>
/// <param name="book">The book.</param>
/// <returns></returns>
public override IList<IBook> GetSimilarBooks(IBook book)
{
    IFullTextSession session = (IFullTextSession)NHibernateHelper.GetCurrentSession();
    Analyzer analyzer = new StandardAnalyzer();
    BooleanQuery query = new BooleanQuery();

    Query title = Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Title, analyzer, "Title", null);
    title.SetBoost(10);
    query.Add(title, BooleanClause.Occur.SHOULD);

    if (book.Summary != null) {
        Query summary =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Summary, analyzer, "Summary", null);
        summary.SetBoost(5);
        query.Add(summary, BooleanClause.Occur.SHOULD);
    }

    if (book.Authors != null) {
        Query authors =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Authors, analyzer, "Authors", null);
        query.Add(authors, BooleanClause.Occur.SHOULD);
    }

    if (book.Publisher != null) {
        Query publisher =
            Similarity.Net.SimilarityQueries.FormSimilarQuery(book.Publisher, analyzer, "Publisher", null);
        query.Add(publisher, BooleanClause.Occur.SHOULD);
    }
    // avoid the book being similar to itself!
    query.Add(new TermQuery(new Term("Id", book.Id.ToString())), BooleanClause.Occur.MUST_NOT);

    IQuery nhQuery = session.CreateFullTextQuery(query, new Type[] { typeof(Book) })
                            .SetMaxResults(4);

    IList<IBook> books = nhQuery.List<IBook>();
    return books;
}

 

That about wraps it up for using NHibernate and Lucene. I'm expecting things to change when NHibernate 2.0 is released, so I'll probably post again with an update when it is. Also, there are a few other features available in Lucene which I may blog about, such as using Synonyms for the 'did you mean ...' type suggestions.

Please let me know if there is anything that I haven't explained particularly well or you would like to see more about.

Categories:   .NET

NHibernate.Search using Lucene.NET Full Text Index (Part 2)

Sunday, 30 March 2008 10:16 by simon

In NHibernate.Search using Lucene.NET Full Text Index (Part 1) we looked at setting up the NHibernate.Search extension to add full-text searching of NHibernate-persisted objects.

Next, we'll look at how we can perform Google-like searches using the index and some tips on displaying the results including highlighting the search-terms.

Our Book class has the Title, Summary, Authors and Publisher fields indexed so we'll allow searching in any of these fields. However, if a search-term exists in the title it is probably more relevant than if it just exists in the summary, so we want to give more priority to certain fields than to others. Likewise, we probably want to be able to specify which fields to search on; otherwise we would get books that merely mention "Martin Fowler" in the summary when we may only want to see books that have "Martin Fowler" as an author.

Also worth mentioning is the Summary field. In the Book class there is a SummaryHtml field which (you'll never guess) contains the Html summary retrieved from Amazon and also a Summary field which is the one that is actually indexed. In the full app this plain-text field is generated from the Html content. The reason we want a version of the Summary in plain text is to make indexing easier and more accurate (no HTML tags) and also to allow result fragments to be created: imagine if a section of the SummaryHtml was output - it could potentially split across an Html element or attribute (producing invalid markup) or include the opening tag but not the matching closing one (producing runaway bold-text for instance).

Back to our example though. To be able to show the highlighted search terms in the results I found it easier to create a special BookSearchResult class that I can return from the data provider - the highlighting is something Lucene.NET can do for us and avoids us having to write our own presentation code to handle it. Here is the class:

/// <summary>
/// A wrapper for a book object returned from a full text index query
/// with additional properties for highlighted segments
/// </summary>
public class BookSearchResult : IBookSearchResult
{
    private readonly IBook _book;
    private string _highlightedTitle;
    private string _highlightedSummary;
    private string _highlightedAuthors;
    private string _highlightedPublisher;

    /// <summary>
    /// Initializes a new instance of the <see cref="BookSearchResult"/> class.
    /// </summary>
    /// <param name="book">The book.</param>
    public BookSearchResult(IBook book)
    {
        _book = book;
    }

    /// <summary>
    /// Gets the book.
    /// </summary>
    /// <value>The book.</value>
    public IBook Book
    {
        get { return _book; }
    }

    /// <summary>
    /// Gets or sets the highlighted title.
    /// </summary>
    /// <value>The highlighted title.</value>
    public string HighlightedTitle
    {
        get
        {
            if (_highlightedTitle == null || _highlightedTitle.Length == 0) {
                return _book.Title;
            }
            return _highlightedTitle;
        }
        set { _highlightedTitle = value; }
    }

    /// <summary>
    /// Gets or sets the highlighted summary.
    /// </summary>
    /// <value>The highlighted summary.</value>
    public string HighlightedSummary
    {
        get
        {
            if (_highlightedSummary == null || _highlightedSummary.Length == 0) {
                if (_book.Summary == null || _book.Summary.Length < 300) {
                    return _book.Summary;
                }
                else {
                    return _book.Summary.Substring(0, 300) + " ...";
                }
            }
            return _highlightedSummary;
        }
        set { _highlightedSummary = value; }
    }

    /// <summary>
    /// Gets or sets the highlighted authors.
    /// </summary>
    /// <value>The highlighted authors.</value>
    public string HighlightedAuthors
    {
        get
        {
            if (_highlightedAuthors == null || _highlightedAuthors.Length == 0) {
                return _book.Authors;
            }
            return _highlightedAuthors;
        }
        set { _highlightedAuthors = value; }
    }

    /// <summary>
    /// Gets or sets the highlighted publisher.
    /// </summary>
    /// <value>The highlighted publisher.</value>
    public string HighlightedPublisher
    {
        get
        {
            if (_highlightedPublisher == null || _highlightedPublisher.Length == 0) {
                return _book.Publisher;
            }
            return _highlightedPublisher;
        }
        set { _highlightedPublisher = value; }
    }
}

 

You'll notice that the Highlighted... fields return the equivalent book field if the highlighted field does not exist. This just saves us having to check whether there is a highlighted term in each field when we're building the search result list.

Our data provider will accept a single string consisting of the entered search-terms and return a list of BookSearchResult objects that match. Here is the code and I'll then try and explain what it's doing:

/// <summary>
/// Finds the books.
/// </summary>
/// <param name="query">The query.</param>
/// <returns></returns>
public override IList<IBookSearchResult> FindBooks(string query)
{
    IList<IBookSearchResult> results = new List<IBookSearchResult>();

    Analyzer analyzer = new SimpleAnalyzer();
    MultiFieldQueryParser parser = new MultiFieldQueryParser(
        new string[] { "Title", "Summary", "Authors", "Publisher" },
        analyzer);

    Query queryObj;
    try {
        queryObj = parser.Parse(query);
    }
    catch (ParseException) {
        // TODO: provide feedback to user on failed search expressions
        return results;
    }

    IFullTextSession session = (IFullTextSession)NHibernateHelper.GetCurrentSession();
    IQuery nhQuery = session.CreateFullTextQuery(queryObj, new Type[] { typeof(Book) });
    IList<IBook> books = nhQuery.List<IBook>();

    IndexReader indexReader = IndexReader.Open(
        SearchFactory.GetSearchFactory(session)
                     .GetDirectoryProvider(typeof(Book)).Directory);
    Query simplifiedQuery = queryObj.Rewrite(indexReader);

    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='term'>", "</b>");

    Highlighter hTitle = GetHighlighter(simplifiedQuery, formatter, "Title", 100);
    Highlighter hSummary = GetHighlighter(simplifiedQuery, formatter, "Summary", 200);
    Highlighter hAuthors = GetHighlighter(simplifiedQuery, formatter, "Authors", 100);
    Highlighter hPublisher = GetHighlighter(simplifiedQuery, formatter, "Publisher", 100);

    foreach (IBook book in books) {
        IBookSearchResult result = new BookSearchResult(book);

        TokenStream tsTitle = analyzer.TokenStream("Title",
            new System.IO.StringReader(book.Title ?? string.Empty));
        result.HighlightedTitle = hTitle.GetBestFragment(tsTitle, book.Title);

        TokenStream tsAuthors = analyzer.TokenStream("Authors",
            new System.IO.StringReader(book.Authors ?? string.Empty));
        result.HighlightedAuthors = hAuthors.GetBestFragment(tsAuthors, book.Authors);

        TokenStream tsPublisher = analyzer.TokenStream("Publisher",
            new System.IO.StringReader(book.Publisher ?? string.Empty));
        result.HighlightedPublisher = hPublisher.GetBestFragment(tsPublisher, book.Publisher);

        TokenStream tsSummary = analyzer.TokenStream("Summary",
            new System.IO.StringReader(book.Summary ?? string.Empty));
        result.HighlightedSummary = hSummary.GetBestFragments(tsSummary,
            book.Summary, 3, " ... <br /><br /> ... ");

        results.Add(result);
    }

    return results;
}

/// <summary>
/// Gets the highlighter for the given field.
/// </summary>
/// <param name="query">The query.</param>
/// <param name="formatter">The formatter.</param>
/// <param name="field">The field.</param>
/// <param name="fragmentSize">Size of the fragment.</param>
/// <returns></returns>
private static Highlighter GetHighlighter(Query query, Formatter formatter,
                                          string field, int fragmentSize)
{
    // create a new query to contain the terms
    BooleanQuery termsQuery = new BooleanQuery();

    // extract terms for this field only
    WeightedTerm[] terms = QueryTermExtractor.GetTerms(query, true, field);
    foreach (WeightedTerm term in terms) {
        // create new term query and add to list
        TermQuery termQuery = new TermQuery(new Term(field, term.GetTerm()));
        termsQuery.Add(termQuery, BooleanClause.Occur.SHOULD);
    }

    // create query scorer based on term queries (field specific)
    QueryScorer scorer = new QueryScorer(termsQuery);
    Highlighter highlighter = new Highlighter(formatter, scorer);
    highlighter.SetTextFragmenter(new SimpleFragmenter(fragmentSize));
    return highlighter;
}
 

First, we parse the user-entered query string indicating that we want to match on the fields Title, Summary, Authors and Publisher using the MultiFieldQueryParser. This turns the user-entered search expression into Lucene-specific instructions. Most users when searching will enter a simple expression containing the words or phrase that they want to find. If the search term 'XML' is entered, for example, Lucene will convert this into the expression "Title:XML Summary:XML Authors:XML Publisher:XML" which effectively means "find any record where 'XML' exists in any of the fields".

The user can enter specific instructions directly such as "Title:Architecture Authors:Fowler" which means "Find any books that have 'Architecture' in the Title field or 'Fowler' in the Authors field". Boolean expressions can be used to control this further allowing "(Title:Architecture) AND (Authors:Fowler)" to find any books titled 'Architecture' authored by 'Fowler'. When specific searches like this have been entered then the MultiFieldQueryParser doesn't expand the search to include all fields (except for un-field-prefixed words and phrases).
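The expansion described above can be pictured with a few lines of plain C#. This is only a toy illustration of the idea and not how MultiFieldQueryParser is implemented: the FieldExpansionSketch class is a made-up name, and it ignores phrases, boolean operators, grouping and analysis, all of which the real parser handles.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only - not part of Lucene.NET.
public static class FieldExpansionSketch
{
    // Expands every un-prefixed word across all the given fields and
    // leaves words that already carry a "Field:" prefix untouched.
    public static string Expand(string userQuery, string[] fields)
    {
        List<string> clauses = new List<string>();
        foreach (string term in userQuery.Split(new char[] { ' ' },
                                                StringSplitOptions.RemoveEmptyEntries))
        {
            if (term.Contains(":"))
            {
                // already field-specific, so do not widen the search
                clauses.Add(term);
                continue;
            }

            foreach (string field in fields)
            {
                clauses.Add(field + ":" + term);
            }
        }
        return string.Join(" ", clauses.ToArray());
    }
}
```

Calling Expand("XML", new string[] { "Title", "Summary", "Authors", "Publisher" }) gives the same expression quoted earlier for the 'XML' search, while a term such as "Authors:Fowler" passes through unchanged.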

Incidentally, in the original Book class we included attributes to control the indexing such as [Boost(10)] for the Title. This boosts the relevance of searches on certain fields so a search for 'XML' in the Title and Summary of a document will rank books with 'XML' in the Title higher than books that have 'XML' in the summary - they are more likely to be what the user is searching for in this case.

Lucene does provide many other ways to define a query but this is simple and easy for this example.

Once we have our Lucene query object we use this to create an NHibernate.Search full-text query to return Book objects. This is where NHibernate and Lucene meet (from a querying point of view). It is possible to combine full-text-queries of Lucene with NHibernate queries of the database - NHibernate.Search handles the searching and returns the relevant objects.

So, we now have a list of Book objects just the same as if it had come directly from NHibernate except that the results are in order based on the rank provided by the Lucene search.

Now, we'll use another part of Lucene to highlight the matches. This is done using the SimpleHTMLFormatter, QueryScorer and Highlighter objects which combined allow us to get a fragment for each field with the search terms highlighted.

Note that the SimpleHTMLFormatter class is not in the main Lucene.Net.dll assembly but in a separate contrib assembly called Highlighter.Net.dll - there are also some other interesting utilities worth exploring in the contrib folder of the Lucene.NET distribution. Remember that in Part 1 I mentioned problems with assembly references and different versions of Lucene.Net.dll being used by NHibernate.Search. If you have problems building the solution after adding references to these contrib assemblies, consider rebuilding NHibernate.Search, making sure that it references the same Lucene.Net.dll that the Lucene contrib assemblies were built against.

The Highlighter object for each field has to be based on the query terms for that field only, so the original query is rewritten and split up so that only the terms searched for in that field are used. This isn't strictly necessary but I think it makes more sense: if you search for 'Microsoft' in the Title only, occurrences of 'Microsoft' in the Summary or Publisher fields are not highlighted, so the highlighted results show clearly which found terms influenced the results. I have split this functionality into a separate GetHighlighter() method.

For example, without doing this a search for 'Title:Microsoft' incorrectly highlights the occurrences of 'Microsoft' found within the Authors, Publisher and Summary fields even though they did not really contribute to the Book being included in the results or its rank within them:

[Screenshot: highlight_wrong]

By creating the proper Highlighter for each field based on the terms used to search it the search results can be shown correctly without highlighting the un-searched fields / terms:

[Screenshot: highlight_correct]

Also, note that the fragments produced for the Summary are different - if separate terms are used for the Title and Summary then having the Title terms highlighted in the Summary could produce incorrect or sub-standard fragments.

 

Having built our Highlighters we can then iterate over the results creating a BookSearchResult to wrap each book in the result set. The same analyzer used in the initial query is then used to get a TokenStream for each field which the Highlighter instance needs to create the highlighted fragment from.

For the Title, Authors and Publisher fields we return a single fragment which will normally be the field itself with the highlighted search terms wrapped in <b class='term'> ... </b> Html tags (courtesy of the SimpleHTMLFormatter class). The highlighted Summary is set to the best 3 fragments separated by ' ... <br /><br /> ... '. However big the summary is, this ensures that the results contain a similar sized chunk of text with the best fragments shown (those containing the most highlighted terms).
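If it helps to picture what the formatter's output looks like, the wrapping step on its own can be mimicked with a regular expression. This is a simplified stand-in and not the contrib Highlighter: HighlightSketch is a made-up name, and it does no tokenising, scoring or fragmenting.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only - not part of Lucene.NET.
public static class HighlightSketch
{
    // Wraps each whole-word, case-insensitive occurrence of the term
    // in the same tags that are passed to SimpleHTMLFormatter above.
    public static string Highlight(string text, string term)
    {
        string pattern = @"\b" + Regex.Escape(term) + @"\b";
        // $0 in the replacement stands for the matched text itself,
        // so the original casing of the word is preserved
        return Regex.Replace(text, pattern, "<b class='term'>$0</b>",
                             RegexOptions.IgnoreCase);
    }
}
```

For example, Highlight("Learning XML basics", "xml") returns "Learning <b class='term'>XML</b> basics".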

Here is an example of the results for 'Title:Software Summary:Requirements Authors:Steve' after formatting and CSS applied to show the highlighted terms in yellow:

[Screenshot: search_results]

 

Lucene.NET can do a lot more than I've shown here. I found the best resource for learning about how to use it is the 'Lucene in Action' book:

Lucene in Action (In Action series)
by Otis Gospodnetic, Erik Hatcher


Note that this covers the Java version but applies equally well to the .NET port which is practically identical.

 

I hope this has been useful. In Part 3 I'll try and demonstrate using the Lucene.NET index to find similar items based on the frequency of shared terms. This can be used to provide 'other books you may like' or 'blog posts like this one' type functionality.

Categories:   .NET

Use Aliases to develop against SQL Server on different machines

Thursday, 13 March 2008 08:11 by Simon

This is a little tip that I've found useful when working on projects on different machines.

If you have a desktop machine and a separate database server then you generally wouldn't need to have SQL Server running locally - either the full version or the SQL Express edition.

So, within your app the connection string would reference the name of the server, e.g.:

<connectionStrings>
  <add name="Library" connectionString="Data Source=MyServer;Initial Catalog=Library;Integrated Security=False;User ID=library;Password=secret;" providerName="System.Data.SqlClient"/>
 </connectionStrings>

 

The problem, of course, comes when you check out this code on another machine such as a laptop when working on the road (or just down in the basement while watching an episode of 'Lost').

Sure, you can just change the config to say '(local)' or '(local)\SQL2005' or whatever ... but you run into issues with the file being changed and then having to change it back if you check it in.

Urgh.

The simplest solution I've found is to set up an alias using the SQL Server Configuration Manager:

  • On the desktop machine, set up an alias called 'dbserver' pointing to the proper database server.
  • On the laptop machine, set up the same alias called 'dbserver', this time pointing to the local instance.

Now the same connection string (using 'Data Source=dbserver') works on both machines, so there is no need to edit the config after checking it out or to remember not to check in a machine-specific change.

NOTE: I usually generate the database schema from the C# classes and NHibernate mapping file and include a data-setup tool, so having two separate databases isn't an issue. Most of the work is normally on the application itself rather than on the database schema.

Categories:   .NET

NHibernate.Search using Lucene.NET Full Text Index (Part 1)

Monday, 10 March 2008 09:36 by Simon

Ayende added the NHibernate.Search last year but I've never seen a great deal of documentation or examples around it so hopefully this post will help others to get started with it.

Basically, this addition to NHibernate brings two of the best open source libraries together - NHibernate as the Object Relational Mapper that persists your objects to a database and Lucene.NET which provides full-text indexing and query support.

So how do you use it?

The first problem you will run into is actually finding it. Unfortunately the release of NHibernate does not include it in the \bin although it is there in the source. Download the latest version of the NHibernate source (1.2.1 GA as of writing) and compile it to produce the NHibernate.Search.dll assembly.

Before you do this though, you may also want to download the latest Lucene.NET release (2.0.004) and replace the Lucene.Net.dll assembly in the NHibernate \lib\net\2.0 folder (I'm assuming you are using .NET 2.0). While the copy shipped with NHibernate has the same version number and did work fine, the file sizes are different and I ran into some problems when trying to use some of the extra Lucene.NET assemblies for hit-highlighting and similarity matching.

The first step is of course to add a reference to NHibernate.Search.dll to your Visual Studio.NET Project.

Next, you need to add some additional properties to the session-factory element of the NHibernate configuration section (normally stored in your web.config file):

<property name="hibernate.search.default.directory_provider">NHibernate.Search.Storage.FSDirectoryProvider, NHibernate.Search</property>
<property name="hibernate.search.default.indexBase">~/Index</property>

 

If you've used Lucene.NET much you will know that it has the concept of different directory providers for storing the index, such as RAM or FS (File System). The entries above indicate that we want the Lucene index to be stored on the file system and located in the /Index folder of the website (it could of course be outside the website mapped folder). It's well worth reading a book such as Lucene in Action to get a good idea of how Lucene works and what it can do (it's for the Java version but is still excellent for learning the .NET implementation).

The next step requires that you decorate your C# class with some attributes to control the indexing operation. Personally, I don't like this as it means I need to start referencing NHibernate and Lucene assemblies from my otherwise nice, clean POCO (Plain Old CLR/C# Classes) project. It would have been much nicer IMO if this information could have been put in the NHibernate .hbm.xml mapping files but it's a small price to pay and some people already use the attribute approach for NHibernate anyway.

Here is an example of a Book class for a library application with the additional attributes:

[Indexed(Index = "Book")]
public class Book : IBook
{
    private Guid _id;
    private string _title;
    private string _summary;
    private string _summaryHtml;
    private string _authors;
    private string _url;
    private string _smallImageUrl;
    private string _mediumImageUrl;
    private string _largeImageUrl;
    private string _isbn;
    private string _published;
    private string _publisher;
    private string _binding;

    [DocumentId]
    [FieldBridge(typeof(GuidBridge))]
    public Guid Id
    {
        get { return _id; }
        set { _id = value; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(typeof(StandardAnalyzer))]
    [Boost(2)]
    public string Title
    {
        get { return _title; }
        set { _title = value; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(typeof(StandardAnalyzer))]
    public string Summary
    {
        get { return _summary; }
        set { _summary = value; }
    }

    public string SummaryHtml
    {
        get
        {
            if (_summaryHtml == null || _summaryHtml.Length == 0)
            {
                return _summary;
            }
            return _summaryHtml;
        }
        set { _summaryHtml = value; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(typeof(StandardAnalyzer))]
    public string Authors
    {
        get { return _authors; }
        set { _authors = value; }
    }

    public string Url
    {
        get { return _url; }
        set { _url = value; }
    }

    public string SmallImageUrl
    {
        get { return _smallImageUrl; }
        set { _smallImageUrl = value; }
    }

    public string MediumImageUrl
    {
        get { return _mediumImageUrl; }
        set { _mediumImageUrl = value; }
    }

    public string LargeImageUrl
    {
        get { return _largeImageUrl; }
        set { _largeImageUrl = value; }
    }

    [Field(Index.UnTokenized, Store = Store.Yes)]
    public string Isbn
    {
        get { return _isbn; }
        set { _isbn = value; }
    }

    [Field(Index.UnTokenized, Store = Store.No)]
    public string Published
    {
        get { return _published; }
        set { _published = value; }
    }

    [Field(Index.Tokenized, Store = Store.No)]
    [Analyzer(typeof(StandardAnalyzer))]
    public string Publisher
    {
        get { return _publisher; }
        set { _publisher = value; }
    }

    public string Binding
    {
        get { return _binding; }
        set { _binding = value; }
    }
}

Now we're ready to start using it from NHibernate. To do this we need to create a FullTextSession and use this instead of the regular NHibernate Session (which it wraps / extends):

ISession session = sessionFactory.OpenSession(new SearchInterceptor());
IFullTextSession fullTextSession = Search.CreateFullTextSession(session);

 

And that's it. You can use the IFullTextSession in place of the regular ISession (even casting it for places where you are just doing normal NHibernate operations). All the magic happens inside NHibernate.Search - when you add, update or delete records the 'documents' in the Lucene index are automatically updated which provides you with an excellent Full Text index without a Windows Service in sight!

You can check that it's working by looking in the Index folder - there should be a 'Book' folder containing the Lucene index files (with CFS extensions).

In the next post I'll demonstrate using the index to do some queries, including hit-highlighting for presenting the results, but for now you may want to download and try Luke - a Java program to browse Lucene index catalogs (the file format is identical between the two implementations).

Categories:   .NET

Re-associate .aspx and .ascx files with the codebehind file

Saturday, 26 January 2008 12:26 by Simon

I recently inherited an application to work on which for whatever reason didn't have the code-behind files linked with the corresponding .aspx and .ascx files.

Visual Studio looks for a CodeBehind="Class.aspx.cs" attribute (CodeFile in Web Site projects) in the page or control directive and this was missing.

So, when opening the application in Visual Studio the list of files was twice as long as it should have been and this prevented Visual Studio from working some of its usual magic.

There were hundreds of files and editing each one by hand to add the CodeBehind="..." attribute to each page or control directive would have taken far too long.

What is the quickest way of turning this:

<%@ page language="c#" inherits="Company.Application.Class, Assembly" theme="Default" %>

... into this:

<%@ page language="c#" inherits="Company.Application.Class, Assembly" theme="Default" CodeBehind="Class.aspx.cs" %>

? Regular expressions, of course!

Visual Studio can do search and replace and provides its own regular expression syntax (strangely, slightly different from most others, including the .NET Framework's own, but it works).

In case anyone needs to do this the syntax I used was:

Find what:
^{\<\%\@:bpage.+inherits=\"}{.+}\.{.+}\,{.+\"}{.+}$

Replace with:
\1\2.\3,\4 CodeBehind="\3.aspx.cs"\5

Not forgetting to make sure that the 'Use:' option has 'Regular Expressions' selected.

Now, one click of the 'Replace All' button and the files were re-associated (after saving and re-opening the project).

For controls, replace the 'page' part of the Find text with 'control' and the '.aspx.cs' in the Replace text with '.ascx.cs'.

If you don't use a namespace then you may need to tweak the regex's a little but hopefully this should save you some work.
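If you would rather script the change than use the Find and Replace dialog, the same idea can be written with the .NET Framework's own regex syntax. This is a sketch of my translation of the Visual Studio expression above: the CodeBehindFixer class is a made-up name, it only handles page directives, and it is worth trying on a copy of the files first.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CodeBehindFixer
{
    // Groups: 1 = directive up to the opening quote of inherits,
    //         2 = namespace, 3 = class name,
    //         4 = assembly and remaining attributes up to the last quote,
    //         5 = the rest of the line
    private static readonly Regex PageDirective = new Regex(
        @"^(<%@\s+page.+inherits="")(.+)\.(.+),(.+"")(.+)$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Returns the markup with a CodeBehind attribute added to each
    // matching page directive; other lines are left unchanged.
    public static string AddCodeBehind(string markup)
    {
        return PageDirective.Replace(
            markup,
            "${1}${2}.${3},${4} CodeBehind=\"${3}.aspx.cs\"${5}");
    }
}
```

Passing the example directive shown earlier through AddCodeBehind produces the version with CodeBehind="Class.aspx.cs" added before the closing %>.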


Box Shot 3D - or 'do one thing and do it well !'

Sunday, 13 January 2008 10:33 by Simon

Photoshop is an amazing program. You can do anything with it ... if 'you' are a Photoshop expert of course. I am not.

So, when I found myself wanting to spruce up a website and include some nice product box displays and screenshots that were more than just flat screen captures, I trawled through the countless tutorials and 'how to' instructions on how to work magic with Photoshop to create such a masterpiece (because I am a software developer and not a graphic designer or marketing maestro). This is what I wanted:

But why is it such hard work? Isn't Photoshop supposed to make it easier to do things like this? Well yes ... if you are an expert with it (is this where I tell the programmers' 'recursion joke'?*). The trouble with Photoshop is that it can do anything and everything, and so it is complex, with a myriad of options and ways you need to discover to make it do what you want.

I can follow instructions and did get something almost looking how I wanted but it was far too difficult, took far too long and was far too hit and miss.

Of course, my brain waited until then to remind me that I had already bought a little shareware app called www.boxshot3d.com a few years earlier (for a handful of dollars) which did this one thing. First stop was the website where there was a new version waiting to download - nice to see it was still being developed and there was no upgrade fee to pay so I re-installed it and started to tinker with it. The interface had been updated and there were a few nice new features but it still focused on doing it's single task of rendering boxes and screenshots in 3D (now with the addition of rendering books too).

What a difference! Literally within minutes I was generating better results than I had following several Photoshop tutorials and better yet it was easy to reproduce things consistently.

Why is it better? Because it only tries to do a single thing but what it does it does incredibly well.

The two box-shots above were created with the app (shame prevents me from showing my attempts with Photoshop) and I also did a few screenshots for a website revamp similar to the one below.

Notice the nice Web 2.0 / Apple style 'reflection in the glossy desktop look'. It can also do a lot of things with light sources and shadows but I went for a simple look. If you need something similar it's worth checking out the http://www.boxshot3d.com/ website and getting a copy - it is well worth the money.

Also, a good reminder to keep apps simple and focused and able to "scratch an itch".

* In case you haven't heard it: "To understand recursion you first have to understand recursion."


Stupid web trick - displaying an image without an image (Image2Html)

Saturday, 5 January 2008 13:00 by Simon

This is an old project I came across while searching for a different project called Html2Image (which, given a URL, would produce an image of the rendered page). I can only describe this as a 'stupid web trick', although there may be some uses for it. This one takes an image file and converts it to HTML.

Not to an HTML <img src="theimage.gif"/> tag though, to actual HTML - each pixel is an HTML element with the background color coming from a 'palette' of CSS styles. Using a palette of CSS classes, rather than spelling out the color on every element, helps reduce the size of the generated XHTML, and some RLE (Run Length Encoding) is also used to shrink repeated pixels of the same color to a single element entry in the output. For large images the output size will be prohibitive but for small images and icons it becomes more usable.
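In case RLE is new to you, here is a minimal sketch of the idea on a plain list of palette indexes rather than a real bitmap. The RunLength class and Encode method are made-up names for illustration and are not part of the Image2Html project.

```csharp
using System;
using System.Collections.Generic;

public static class RunLength
{
    // Collapses consecutive equal values into (value, count) pairs.
    public static List<KeyValuePair<int, int>> Encode(IList<int> row)
    {
        var runs = new List<KeyValuePair<int, int>>();
        int i = 0;
        while (i < row.Count)
        {
            int value = row[i];
            int count = 0;

            // Extend the run while the same value keeps repeating.
            while (i < row.Count && row[i] == value)
            {
                count++;
                i++;
            }
            runs.Add(new KeyValuePair<int, int>(value, count));
        }
        return runs;
    }

    public static void Main()
    {
        // Palette indexes for one row of six pixels.
        int[] row = { 3, 3, 3, 7, 3, 3 };
        foreach (KeyValuePair<int, int> run in Encode(row))
            Console.WriteLine("color {0} x {1}", run.Key, run.Value);
    }
}
```

Each pair maps onto one output element: the key picks the CSS class and a count greater than one becomes the element's width in pixels.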

The net result is that you see the same image on the screen in the browser (or in an email client?) even if images are turned off. I did some experiments (again, several years ago) and it was possible to have richer-looking emails without needing images but again, things may have moved on and it may not be viable anymore.

The original project (at least 3 or 4 years old) used <p> elements for each pixel, but browsers must have moved on (and XHTML rendering is different to ye-olde-HTML) so I had to change the element tags to make it work again. It runs ok on IE and Safari but Mozilla / Firefox doesn't render it as it is (I honestly really don't know why people rave about it). I'm sure a bit of playing around with different element types and CSS attributes (line-height, font-size etc...) will produce something that runs. It may require different element types on different browsers (to cater for browsers that refuse to render empty elements and such like) but the approach will be the same.

Here are some screenshots of an example page which shows the same image displayed as a regular XHTML <img> tag and also as XHTML ...

Internet Explorer:

Safari:

Source Code: 

The VS.NET 2008 project is downloadable from the bottom of this post but here is the actual class that does the work. It should be usable from .NET 2.0 onwards (it relies on generics and a static class):

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Net;
using System.Text;

namespace InteSoft.Web.Utility
{
    public static class Image2Html
    {
        private const string containerElement = "div";
        private const string containerClass = "image";
        private const string pixelElement = "img";
        private const string pixelClassPrefix = "p";

        public static string Convert(string url)
        {
            if (null == url)
                throw new ArgumentNullException("url");

            if (url.Length == 0)
                throw new ArgumentException("url required", "url");

            Uri uri = new Uri(url);
            return Convert(uri);
        }

        public static string Convert(Uri url)
        {
            if (null == url)
                throw new ArgumentNullException("url");

            if (!url.IsAbsoluteUri)
                throw new ArgumentException("absolute url required", "url");
           
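            // Download and decode the image. The stream has to stay open while the
            // bitmap is in use, so everything is disposed together after converting.
            using (WebClient wc = new WebClient())
            using (MemoryStream imgStream = new MemoryStream(wc.DownloadData(url)))
            using (Bitmap bitmap = (Bitmap)Image.FromStream(imgStream))
            {
                return Convert(bitmap);
            }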
        }

        public static string Convert(Bitmap bitmap)
        {
            if (null == bitmap)
                throw new ArgumentNullException("bitmap");

            StringBuilder sb = new StringBuilder();
            sb.AppendFormat("<style>{0}.{1}{{line-height:1px;}}{0}.{1} {2}{{margin:0;padding:0;border:0;width:1px;height:1px;}}", containerElement, containerClass, pixelElement);
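            // Note: the palette is only populated for indexed-color images (e.g. GIF);
            // other pixel formats have an empty palette and the lookups below will fail.
            ColorPalette colorPalette = bitmap.Palette;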
            IDictionary<Color, int> paletteClassMap = new Dictionary<Color, int>(colorPalette.Entries.Length);
            for (int idx = 0; idx < colorPalette.Entries.Length; idx++)
            {
                Color color = colorPalette.Entries[idx];
                if (!paletteClassMap.ContainsKey(color))
                {
                    paletteClassMap.Add(color, idx);
                    sb.AppendFormat(".{0}{1:X2}{{background-color:#{2:X2}{3:X2}{4:X2};}}", pixelClassPrefix, idx, color.R, color.G, color.B);
                }
            }
            sb.AppendFormat("</style><{0} class=\"{1}\">", containerElement, containerClass);
            for (int y = 0; y < bitmap.Height; y++)
            {
                // Run-length encode the row: count identical neighbouring pixels
                // and emit a single element for each run.
                int count = 0;

                for (int x = 0; x < bitmap.Width; x++)
                {
                    Color color = bitmap.GetPixel(x, y);
                    count++;

                    // The run ends at the last pixel of the row or when the next pixel differs.
                    if (x == bitmap.Width - 1 || bitmap.GetPixel(x + 1, y) != color)
                    {
                        if (count == 1)
                        {
                            sb.AppendFormat("<{0} class=\"{1}{2:X2}\"/>", pixelElement, pixelClassPrefix, paletteClassMap[color]);
                        }
                        else
                        {
                            sb.AppendFormat("<{0} class=\"{1}{2:X2}\" style=\"width:{3}px\"/>", pixelElement, pixelClassPrefix, paletteClassMap[color], count);
                        }

                        count = 0;
                    }
                }
                sb.Append("<br/>");
            }

            sb.AppendFormat("</{0}>", containerElement);
           
            return sb.ToString();
        }
    }
}

Enjoy!

Download source: Image2Html.zip (8.23 kb)

Categories:   .NET

Running .NET Apps in 32-bit mode on 64-bit Windows

Monday, 17 December 2007 12:37 by Simon

The normal behavior for .NET 2.0 applications compiled with the default 'Any CPU' platform is to run as 32-bit on x86 (32-bit) Windows and as 64-bit on x64 (64-bit) Windows.

Occasionally, some apps won't run correctly - I've recently run into this with CCNetConfig (a CruiseControl.NET configuration tool) and have seen it before with other tools. Another obscure scenario where it shows up is if you try to use the JET OleDB driver, which will fail in 64-bit mode because there isn't a 64-bit version (it has to be 32-bit).

Rather than have to recompile the app or even worse, run a 32-bit Virtual Machine, there is an easy way to force .NET to run an app in 32-bit mode using the 'CorFlags.exe' tool.

Depending on your system this may be installed in different places. These are the locations I've seen on XP64 and Vista X64:

  • C:\Program Files\Microsoft SDKs\Windows\v6.0\Bin\x64\CorFlags.exe
  • C:\Program Files (x86)\Microsoft Visual Studio 8\SDK\v2.0\Bin\CorFlags.exe

Run it from the command line with the path / filename of the app you want to change and the switch /32BIT+ to turn on 32-bit mode, e.g.:

   CorFlags.exe TheApp.exe /32BIT+

If that fixes the problem then you know that it is a 64-bit issue. You can re-enable 64-bit operation for the app by turning off the 32-bit switch with the parameter /32BIT-, e.g.:

   CorFlags.exe TheApp.exe /32BIT-

Voila ... control over 32-bit and 64-bit execution without doing a recompile! I'm not 100% certain but I think that this switch sets the same flag that the 'x86' and 'Any CPU' targets set in Visual Studio.
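If you want to confirm which mode an app actually ends up running in, one simple check (a sketch only, with a made-up Bitness class) is to look at IntPtr.Size from inside the process. It is 4 bytes in a 32-bit process and 8 bytes in a 64-bit one.

```csharp
using System;

public static class Bitness
{
    // Maps a pointer size in bytes to a readable label.
    public static string Describe(int pointerSize)
    {
        return pointerSize == 8 ? "64-bit" : "32-bit";
    }

    public static void Main()
    {
        // IntPtr.Size reflects the bitness of the current process, not of the OS.
        Console.WriteLine("Running as a {0} process", Describe(IntPtr.Size));
    }
}
```

Dropping a line like this into the app's startup logging makes it easy to see whether the flag change took effect.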

Categories:   .NET

Bootstrap your build - good practice for CruiseControl.NET

Sunday, 16 December 2007 03:43 by Simon

If you are reading this (a developer's blog) then chances are you are already in the 20% of developers who know there is more to development than 'developing' (code). So, you are probably using version control such as Subversion, SourceGear Vault or heck, even SourceSafe (for all people knock it, it is infinitely better than nothing at all!) and maybe have a continuous integration system such as CruiseControl.NET or Draco.NET set up to automatically do a build when the source code changes.

What many people leave out, though, is making sure that the build scripts themselves and the whole source control configuration are also protected by version control, so that in an emergency a new build machine can be set up quickly and easily without having to develop everything from scratch or remember all the triggers, rules and targets that were in the original system.

Here is a technique that I've found useful. It helps to keep the build configuration for each project with the project that it belongs to (i.e. in its repository) rather than off somewhere else. It also enables you to make all changes to your build system via the normal Subversion working-copy / commit process.

For this example, let's assume that we have 3 Subversion repositories:

  • build
  • accelerator
  • redirector

The build repository would contain the default CruiseControl.NET files including the ccnet.config file in the server folder which is where the configuration for each project normally resides. We're going to move pieces of that file into each separate project repository.

The CruiseControl.NET wiki describes how to Configure CruiseControl.Net to Automatically Update its Config File which is the bootstrap part of the process - this allows us to make changes to the build config file from another machine without having to get onto the build server and also ensures that all the configuration is stored in a repository.

My ccnet.config file is slightly different to the one shown in the wiki, to enable the individual project configurations to be stored in their own repositories, so each project has its own folder:

<!DOCTYPE cruisecontrol [
  <!ENTITY accelerator SYSTEM "file:accelerator\ccnet.config">
  <!ENTITY redirector SYSTEM "file:redirector\ccnet.config">
]>
<cruisecontrol>
  <project name="ccnet">
    <sourcecontrol type="svn">
      <trunkUrl>http://buildserver/svn/build/trunk/CruiseControl.NET/server/config</trunkUrl>
      <workingDirectory>C:\Program Files (x86)\CruiseControl.NET\server\config</workingDirectory>
      <executable>C:\Program Files (x86)\VisualSVN Server\bin\svn.exe</executable>
    </sourcecontrol>
    <triggers>
      <intervalTrigger name="ci" seconds="60" buildCondition="IfModificationExists" />
    </triggers>
  </project>
  &accelerator;
  &redirector;
</cruisecontrol>

 

Remember that the ccservice.exe.config and / or ccnet.exe.config file will need to be changed to tell CruiseControl.NET where to look for the ccnet.config file now that it isn't in the original place:

 <appSettings>
  <!-- Without this appSetting ccservice will look for ccnet.config in its own directory. -->
  <add key="ccnet.config" value=".\config\ccnet.config"/>
  <add key="service.name" value="CCService"/>
  <add key="remoting" value="on"/>
  <add key="ServerLogFilePath" value="ccnet.log"/>
  <!-- Used by the WebDashboard ServerLog plugin to locate the log file produced by the LogFileTraceListener (above) -->
  <add key="ServerLogFileLines" value="100"/>
  <!-- Used by the WebDashboard ServerLog plugin to determine how many lines from the log file should be read -->
  <add key="WatchConfigFile" value="true"/>
  <!-- Turns on or off the file watcher used to monitor the ccnet.config file -->
 </appSettings>

 

The next step is to supply the ccnet.config files for each project which will come from the repositories for those projects.

I've created a 'build' folder in each ...

... and inside that is the ccnet.config project snippet for that project:

  <project name="redirector">
    <workingDirectory>E:\ccnet\redirector</workingDirectory>
    <sourcecontrol type="svn">
      <trunkUrl>http://buildserver/svn/redirector/trunk</trunkUrl>
      <executable>C:\Program Files (x86)\VisualSVN Server\bin\svn.exe</executable>
    </sourcecontrol>
  </project>

 

(the real-life project contains a lot more - labellers, targets etc...)

The final piece to tie everything together is to make these project-specific files part of the config folder that the build ccnet.config file is in above and for this we use the svn:externals facility.

I used TortoiseSVN to add the property to a checked-out copy of the config folder:

NOTE: The screenshot above only shows the first entry. An additional entry would be made for the accelerator project.

After committing all these files the build server will automatically update its own ccnet.config and get the separate snippet of the file for each project from the separate project repositories because of the svn:externals. The folder structure will then appear as below:

Everything is stored in source control which makes moving to a new build machine easier and the configuration for each project is stored in the same repository as the project itself.

Adding or removing projects from the continuous integration / build system is simply a case of editing the root ccnet.config file and setting the appropriate svn:externals property on the config folder, which can be done on a working copy of the build repository. An additional advantage is that all changes can now be done via Subversion itself without needing direct access to the build machine, so normal repository security can be used.
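To make the two edits per project concrete, here is a small sketch that prints the entity declarations for the root ccnet.config and the matching svn:externals lines for a list of projects. The BuildBootstrap class is made up for this example, and the externals URL layout (the 'build' folder under each project's trunk) is my assumption based on the repository URLs used above, so adjust it to your own layout.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BuildBootstrap
{
    // One <!ENTITY> declaration per project, for the DOCTYPE block of the root ccnet.config.
    public static string EntityDeclarations(IEnumerable<string> projects)
    {
        var sb = new StringBuilder();
        foreach (string project in projects)
            sb.AppendFormat("  <!ENTITY {0} SYSTEM \"file:{0}\\ccnet.config\">\n", project);
        return sb.ToString();
    }

    // One svn:externals line per project in the form "<local folder> <URL>".
    // The URL layout (<root>/<project>/trunk/build) is an assumption for illustration.
    public static string ExternalsProperty(string svnRoot, IEnumerable<string> projects)
    {
        var sb = new StringBuilder();
        foreach (string project in projects)
            sb.AppendFormat("{0} {1}/{0}/trunk/build\n", project, svnRoot);
        return sb.ToString();
    }

    public static void Main()
    {
        string[] projects = { "accelerator", "redirector" };
        Console.Write(EntityDeclarations(projects));
        Console.Write(ExternalsProperty("http://buildserver/svn", projects));
    }
}
```

Each project also needs its &name; reference inside the <cruisecontrol> element, as in the root file shown earlier.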

Incidentally, in case you are wondering what happens if a non-working configuration is committed, don't worry - CruiseControl.NET is smart enough to avoid using something that won't load and keeps the previous version in memory, so all you need to do is correct it and commit a new working version and it will then pick that up and carry on.

Ode to the MVC Framework

Sunday, 9 December 2007 13:36 by Simon

We've all been there ... thought we were going to get some code out of the door but then for one reason or another we couldn't manage it. I hope ScottGu et al who worked so hard to try and get it out this weekend don't take this the wrong way but here is a modified version of the lyrics to a song from the musical 'Les Miserables' for all of us who are sooo keen to get our hands on the new framework and waited all weekend for it ...

MVC framework (to 'Les Miserables - Empty Chairs At Empty Tables' music)

[A developer who got his hands on the MVC framework and blogged about it]

There's a grief that can't be spoken
There's a pain goes on and on
Empty pages, empty tables
Now my sites are dead and gone

Here they talked of view-controllers
Here it was they lit the flame
Here they sang about the framework
And the framework never came.

From a blog page in Seattle
They could see a world reborn
And they rose with voices ringing
I can hear them now!
The very code that they had posted
Became their last communion
On the lowly CTP...
At dawn.

Oh my friends, my friends forgive me.

[The ghosts of those who died waiting for the MVC framework appear.]

That I have code and you have none
There's a grief that can't be spoken
There's a pain goes on and on

Phantom frameworks run on windows
Mock assemblies on the floor
Databases with no tables
Where my view will run no more.

[The ghosts fade away.]

Oh my friends, my friends, don't ask me
What your sacrifice was for
Empty projects with no tables
Where my code will run no more...

Ah well, at least I have something to look forward to next week!!

Categories:   .NET | MVC