I’ve now crossed the international date line (giving up a big portion of my weekend, but that’s life) and landed in Tokyo. Tomorrow I head on to Seoul and then to Beijing for the end of the week. In many ways a change of pace from the week in Vegas, but in other ways it’s more of the same (fun, that is :-).
In the previous post we looked at some code to retrieve and process RSS information from various blogs using an agent-based, message-passing architecture. That code wasn’t completely asynchronous or parallelised, though, as the loop fired off each message to start a download synchronously (although the “processing” would have launched quite quickly, yielding control back to the loop, which would then launch the other downloads). This post shows an even more asynchronous approach, as it fires off the downloads in parallel. We’ll see if it makes much difference. :-)
Here’s the updated F# code, with the new/modified lines in red. If you’d like a complete version of the application, with the four different implementations we’ve seen, you can get it from here.
// Declare a specific namespace and module name

module AgentAsyncRssReader.Commands

// Import managed namespaces

open Autodesk.AutoCAD.Runtime
open Autodesk.AutoCAD.ApplicationServices
open Autodesk.AutoCAD.DatabaseServices
open Autodesk.AutoCAD.Geometry
open System.Xml
open System.IO
open System.Net
open Microsoft.FSharp.Control.WebExtensions

// The RSS feeds we wish to get. The first two values are
// only used if our code is not able to parse the feed's XML

let feeds =
  [ ("Through the Interface",
     "http://blogs.autodesk.com/through-the-interface",
     "http://through-the-interface.typepad.com/" +
     "through_the_interface/rss.xml");

    ("Don Syme's F# blog",
     "http://blogs.msdn.com/dsyme/",
     "http://blogs.msdn.com/dsyme/rss.xml");

    ("Shaan Hurley's Between the Lines",
     "http://autodesk.blogs.com/between_the_lines",
     "http://autodesk.blogs.com/between_the_lines/rss.xml");

    ("Scott Sheppard's It's Alive in the Lab",
     "http://blogs.autodesk.com/labs",
     "http://labs.blogs.com/its_alive_in_the_lab/rss.xml");

    ("Volker Joseph's Beyond the Paper",
     "http://blogs.autodesk.com/beyond_the_paper",
     "http://dwf.blogs.com/beyond_the_paper/rss.xml") ]

// Fetch the contents of a web page, asynchronously

let httpAsync(url:string) =
  async { let req = WebRequest.Create(url)
          use! resp = req.AsyncGetResponse()
          use stream = resp.GetResponseStream()
          use reader = new StreamReader(stream)
          return reader.ReadToEnd() }

// Load an RSS feed's contents into an XML document object
// and use it to extract the titles and their links
// Hopefully these always match (this could be coded more
// defensively)

let titlesAndLinks (name, url, xml) =
  try
    let xdoc = new XmlDocument()
    xdoc.LoadXml(xml)

    let titles =
      [ for n in xdoc.SelectNodes("//*[name()='title']")
          -> n.InnerText ]
    let links =
      [ for n in xdoc.SelectNodes("//*[name()='link']") ->
          let inn = n.InnerText
          if inn.Length > 0 then
            inn
          else
            let href = n.Attributes.GetNamedItem("href").Value
            let rel = n.Attributes.GetNamedItem("rel").Value
            if List.exists
                 (fun x -> href.Contains(x))
                 ["feedburner";"feedproxy";"hubbub"] then
              ""
            else
              href ]

    let descs =
      [ for n in xdoc.SelectNodes
          ("//*[name()='description' or name()='subtitle'" +
           " or name()='summary']")
          -> n.InnerText ]

    // A local function to filter out duplicate entries in
    // a list, maintaining their current order.
    // Another way would be to use:
    //   Set.of_list lst |> Set.to_list
    // but that results in a sorted (probably reordered) list.

    let rec nub lst =
      match lst with
      | a::[] -> [a]
      | a::b ->
        if a = List.head b then
          nub b
        else
          a::nub b
      | [] -> []

    // Filter the links to get (hopefully) the same number
    // and order as the titles and descriptions

    let real = List.filter (fun (x:string) -> x.Length > 0)
    let lnks = real links |> nub

    // Return a link to the overall blog, if we don't have
    // the same numbers of titles, links and descriptions

    let lnum = List.length lnks
    let tnum = List.length titles
    let dnum = List.length descs

    if tnum = 0 || lnum = 0 || lnum <> tnum ||
       dnum <> tnum then
      [(name,url,url)]
    else
      List.zip3 titles lnks descs
  with _ -> []

// For a particular (name,url) pair,
// create an AutoCAD HyperLink object

let hyperlink (name,url,desc) =
  let hl = new HyperLink()
  hl.Name <- url
  hl.Description <- desc
  (name, hl)

// Use asynchronous workflows in F# to download
// an RSS feed and return AutoCAD HyperLinks
// corresponding to its posts

let hyperlinksAsync (name, url, feed) =
  async { let! xml = httpAsync feed
          let tl = titlesAndLinks (name, url, xml)
          return List.map hyperlink tl }

// Now we declare our command

[<CommandMethod("amrss")>]
let createHyperlinksFromRssAsyncViaMailbox() =

  let starttime = System.DateTime.Now

  // Let's get the usual helpful AutoCAD objects

  let doc =
    Application.DocumentManager.MdiActiveDocument
  let ed = doc.Editor
  let db = doc.Database

  // "use" has the same effect as "using" in C#

  use tr =
    db.TransactionManager.StartTransaction()

  // Get appropriately-typed BlockTable and BTRs

  let bt =
    tr.GetObject
      (db.BlockTableId,OpenMode.ForRead)
    :?> BlockTable
  let ms =
    tr.GetObject
      (bt.[BlockTableRecord.ModelSpace],
       OpenMode.ForWrite)
    :?> BlockTableRecord

  // Add text objects linking to the provided list of
  // HyperLinks, starting at the specified location

  // Note the valid use of tr and ms, as they are in scope

  let addTextObjects (pt : Point3d) lst =
    // Use a for loop, as we care about the index to
    // position the various text items

    let len = List.length lst
    for index = 0 to len - 1 do
      let txt = new DBText()
      let (name:string,hl:HyperLink) = List.nth lst index
      txt.TextString <- name
      let offset =
        if index = 0 then
          0.0
        else
          1.0

      // This is where you can adjust:
      //  the initial outdent (x value)
      //  and the line spacing (y value)

      let vec =
        new Vector3d
          (1.0 * offset,
           -0.5 * (float index),
           0.0)
      let pt2 = pt + vec
      txt.Position <- pt2
      ms.AppendEntity(txt) |> ignore
      tr.AddNewlyCreatedDBObject(txt,true)
      txt.Hyperlinks.Add(hl) |> ignore

  // Define our agent to process messages regarding
  // hyperlinks to gather and process

  let agent =
    MailboxProcessor.Start(fun inbox ->
      let rec loop() = async {

        // An asynchronous operation to receive the message

        let! (i, tup, reply :
                AsyncReplyChannel<(string * HyperLink) list>) =
          inbox.Receive()

        // And another to collect the hyperlinks for a feed

        let! res = hyperlinksAsync tup

        // And then we reply with the results
        // (the list of hyperlinks)

        reply.Reply(res)

        // Recurse to process more messages

        return! loop()
      }

      // Start the loop

      loop()
    )

  // Map our list of feeds to a set of asynchronous tasks
  // that we then execute in parallel

  feeds
  |> List.mapi (fun i item ->
    async {
      let! res =
        agent.PostAndAsyncReply(fun rep -> (i, item, rep))

      // Once we have the response (asynchronously), create
      // the corresponding AutoCAD text objects

      let pt =
        new Point3d
          (15.0 * (float i),
           30.0,
           0.0)
      addTextObjects pt res |> ignore
    }
  )
  |> Async.Parallel
  |> Async.RunSynchronously
  |> ignore

  tr.Commit()

  let elapsed =
    System.DateTime.op_Subtraction
      (System.DateTime.Now, starttime)

  ed.WriteMessage("\nElapsed time: " + elapsed.ToString())
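The message-passing pattern used in the command above can also be tried outside AutoCAD. What follows is only a minimal sketch under stated assumptions, not the code from this post: `fakeFetch` is a hypothetical stand-in that simulates a download with `Async.Sleep`, and the feed names are placeholders. It shows an agent replying over an `AsyncReplyChannel`, with the requests posted via `PostAndAsyncReply` and gathered by `Async.Parallel`.

```fsharp
// A minimal sketch (assumption: the download is simulated, so this
// runs without AutoCAD or a network connection)

// Hypothetical stand-in for downloading and parsing one feed
let fakeFetch (name : string) =
  async {
    do! Async.Sleep 100
    return [ name + " post 1"; name + " post 2" ]
  }

// The agent receives (name, reply channel) messages and answers
// each one with the simulated results before reading the next
let agent =
  MailboxProcessor<string * AsyncReplyChannel<string list>>.Start(fun inbox ->
    let rec loop () =
      async {
        let! (name, reply) = inbox.Receive()
        let! res = fakeFetch name
        reply.Reply res
        return! loop ()
      }
    loop ())

// Post every request at once, then wait for all of the replies
let results =
  [ "A"; "B"; "C" ]
  |> List.map (fun name ->
      agent.PostAndAsyncReply(fun rep -> (name, rep)))
  |> Async.Parallel
  |> Async.RunSynchronously
```

In this sketch the agent waits for each fetch to finish before it receives the next message, so the requests are still handled one at a time even though they were all posted together. Where that holds, it would limit how much posting in parallel can gain.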
I ran each of the four versions of the command ten times and averaged the timings to get the following results:
| Implementation approach (command) | Average time (secs) |
|---|---|
| Synchronous (RSS) | 16.47 |
| Asynchronous (ARSS) | 8.45 |
| Message-passing synchronous (MRSS) | 15.19 |
| Message-passing asynchronous (AMRSS) | 14.14 |
These values come from running the code in our Tokyo office: I remember getting quite different results when running it from home in Switzerland, for instance. As the network lag here appears larger (probably not helped by the fact I’m connecting via WiFi), the differences are more pronounced, which definitely helps us when comparing and contrasting the results.
We see that the complexity of this particular problem probably doesn’t merit the overhead of implementing a message-passing approach, although if we do choose to do so, the network lag might make it worth the additional overhead of launching the tasks asynchronously rather than one after the other. This balance will shift depending on the local network lag, the processing time and the synchronization overhead: in general, doing some kind of benchmark on a simplified version of your problem is likely to be a worthwhile investment before implementing a complete, complex system.
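As a rough illustration of such a simplified benchmark, here is a minimal sketch that times the same set of simulated downloads run one after the other and then in parallel. Everything in it is an assumption made for illustration: `simulatedDownload`, `time`, the 200 ms latency and the job count are hypothetical, and `Async.Sleep` stands in for real network lag.

```fsharp
open System.Diagnostics

// Hypothetical stand-in for one download: waits for the given
// latency and then returns its index
let simulatedDownload (latencyMs : int) (i : int) =
  async {
    do! Async.Sleep latencyMs
    return i
  }

// Run a function, returning its result and the elapsed milliseconds
let time f =
  let sw = Stopwatch.StartNew()
  let res = f ()
  sw.Stop()
  (res, sw.ElapsedMilliseconds)

// Assumed values: adjust to resemble your own network and workload
let latencyMs = 200
let jobs = [ for i in 1 .. 5 -> simulatedDownload latencyMs i ]

// One after the other
let (seqRes, seqMs) =
  time (fun () -> jobs |> List.map Async.RunSynchronously)

// All at once
let (parRes, parMs) =
  time (fun () ->
    jobs |> Async.Parallel |> Async.RunSynchronously |> List.ofArray)

printfn "Sequential: %d ms, parallel: %d ms" seqMs parMs
```

Changing `latencyMs` and the number of jobs gives a quick feel for how the balance moves before committing to a more complex design. Real downloads will of course add processing time and synchronization costs that this sketch leaves out.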