Archive for the ‘Infrastructure’ Category

Learning from THE WEB: Making things simple enough that one can harness Moore’s law in parallel

Friday, April 21st, 2006

Adam Bosworth reviews the unintuitive lessons of THE WEB in ACM Queue: “Learning from THE WEB.”

His second and third points:

2. It is worth making things simple enough that one can harness Moore’s law in parallel. This means that a system can scale more or less linearly both to handle the volume of the requests coming in and, ideally, to handle the complexity. For example, Google is able to handle huge numbers of requests that ask for quite complex work (find all the documents out of billions that are “most relevant” to the following words) largely because it can scale linearly with respect to both the size of the underlying corpus and the volume of queries. DNS is, of course, the best example of this, but so is caching, which brings me to point 3.

3. Have databases enabled people to harness Moore’s law in parallel? This would mean that databases could scale more or less linearly to handle both the volume of the requests coming in and even the complexity. The answer is no. Things like ORDER BY, joins, subqueries, and many others make it almost impossible to push the query logic down to an arbitrary number of leaf nodes and simply sort/merge/aggregate the results. The easier way to limit queries to avoid this would be to limit all predicates to ones that can be computed against a single row at a time, at least where efficiency and scale are paramount.
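Point 3 is easier to see with a toy scatter/gather query. This Python sketch is only an illustration, assuming partitions are in-memory lists of hypothetical (doc_id, score, text) rows; leaf_query and scatter_gather are made-up names, not any database's API.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Tuple

# Hypothetical row shape for illustration: (doc_id, relevance_score, text)
Row = Tuple[int, float, str]


def leaf_query(partition: Iterable[Row], predicate: Callable[[Row], bool], k: int) -> List[Row]:
    """Runs independently on one leaf node: test each row on its own, keep the local top-k."""
    matches = (row for row in partition if predicate(row))
    return heapq.nlargest(k, matches, key=lambda row: row[1])


def scatter_gather(partitions: List[List[Row]], predicate: Callable[[Row], bool], k: int) -> List[Row]:
    """Push the same per-row predicate to every leaf, then merge the small sorted partials."""
    # Sequential here for clarity; each leaf_query call could run on a separate machine.
    partials = [leaf_query(p, predicate, k) for p in partitions]
    # Each partial is already sorted by descending score, so a streaming merge suffices.
    merged = heapq.merge(*partials, key=lambda row: row[1], reverse=True)
    return list(itertools.islice(merged, k))


partitions = [
    [(1, 0.9, "moore law"), (2, 0.2, "dns cache")],
    [(3, 0.7, "moore parallel"), (4, 0.95, "xquery")],
]
top = scatter_gather(partitions, lambda row: "moore" in row[2], k=2)
```

Because the predicate inspects one row at a time, no leaf needs data from any other leaf, and the coordinator merges at most k rows per leaf. A join or subquery would need rows from different partitions to meet before filtering, which is exactly what breaks this pattern.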

My thesis adviser Phil Windley poses the following question: Does XQuery suffer from these same limitations?

If you look at XQuery as just a SQL replacement with support for XML data types, then yes, it definitely suffers from the same limitation, namely query complexity that cannot be parallelized.
But XQuery is not just a SQL replacement. It was designed with several key points in mind.

XQuery can be seen as a superset of XSLT functionality. As results come out of a Google-style search data store, XQuery can be used to format or transform them much like XSLT. XQuery makes sorting transformed results trivial, while sorting in XSLT is unnecessarily difficult to implement.

XQuery is computationally complete (it is a Turing-complete functional language) and supports built-in and user-defined, including recursive, functions. Google-style queries are executed using a wide range of underlying utility and specialized functions. Theoretically, XQuery could be used as the glue or processing code that calls the low-level search functions. XQuery, or a derivative dialect, could be the standard for describing a Google-style search process or query. A new kind of language is needed to describe the process of executing embarrassingly parallel, Google-style queries.

Offhand, BPEL is one example of such a new orchestration language that drives parallel processes.
Because of its focus on business operations and interdependencies, BPEL is not a solution for Google-style attention or relevance queries. But in many ways BPEL is a step forward: an attempt to declare the problem and let specialized engines handle the work.
In some ways I think this is what Adam Bosworth is looking for, a way to describe and execute a huge quantity of unrelated relevance queries against all available knowledge.

I believe that making the connection between computation and data more seamless is a step in the right direction. The right answer will be highly declarative, and in that respect it should be somewhat like SQL, minus non-parallel or barrier operations such as joins, subselects, huge ordering operations, etc. SQL lacks the computational power, user-defined functions, etc., to even be a starting point down this road. Adam himself has said that he doesn’t believe the relational model applies, at least at the front end.

This is where XQuery brings power. XQuery is functional in nature and can be used in a purely functional manner. What does this mean? First, no side effects: every operation is repeatable. Second, functions can be easily composed without worrying about unexplainable behavior. XQuery is also declarative. It allows the programmer to describe complex problems as higher-order processes without getting bogged down in the nitty-gritty details of implementation.
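The payoff of "no side effects" can be shown with a rough Python analogue (not XQuery). The tokenize and count_hits steps are hypothetical pieces of a relevance query; Python does not enforce purity, so they are pure here only by construction: no globals, no I/O, output depends only on input. That is what lets them compose cleanly and run over documents sequentially or in parallel with the same result.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable


def compose(*fns: Callable) -> Callable:
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns, lambda x: x)


# Hypothetical pure steps: each result depends only on its argument.
def tokenize(doc: str) -> list:
    return doc.lower().split()


def count_hits(tokens: list) -> int:
    return sum(1 for t in tokens if t in {"moore", "parallel"})


score = compose(count_hits, tokenize)

docs = ["Moore parallel", "XQuery functional", "parallel parallel moore"]

sequential = [score(d) for d in docs]
# Order of evaluation cannot change the answer, so the work can be farmed out.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(score, docs))
```

If count_hits instead updated a shared counter, the parallel run would need locking and the composed function would no longer be safely repeatable.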

It is all about abstraction. Assembly language has become largely declarative as we abstract away the fine details of microprocessor computation. Assembly language tells the computer what to do, not how to do it. Programmers do not generally specify which processor their code runs on, when or how data moves through the memory/caching pipeline, which execution unit does the work, or in which order execution will occur. Don’t get me wrong, there are experts who spend all day every day worrying about microprocessor execution. But the powerful abstractions of programming languages allow the majority of software developers to focus their attention up the stack.
I think Linus Torvalds’ latest comment has some application here:

“I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.”

Adam Bosworth wants the same story for data query.
We should stop playing games with complex object-relational mappings and look towards more generalizable and scalable solutions.

XQuery is not the solution to Adam’s problem, nor is it a silver bullet answer to Phil’s questions, but I think it pushes us in the right direction. It forces us to start to reconcile and solve the perceived differences between object orientation, relational data and XML.

XQuery shows that data query and computation can be tightly coupled to enhance programmer productivity without giving up power or optimization in either realm.

Bonus Links:
The Gillmor Gang with guest Adam Bosworth

Utah Asterisk User Group Meeting January 11

Saturday, January 14th, 2006

I attended the utaug meeting Wednesday evening.
A great group of people showed up. In fact, there wasn’t room for everyone in the small conference room where we met.
Here are my notes from the meeting.

Asterisk Utah Users Group
Meeting
Presenter Jared Smith – Asterisk Evangelist/Hacker

* Business
1st Tuesday of every month.
SLC Public Library – provides free locations for holding meetings; no food allowed, though, unless you put down a deposit and cleaning fee.
Dave Pacham is in charge of a mailing list.

* Possible Future Meeting format
A 20 minute intro presentation
A 30 minute in depth presentation.

* Quick Ideas For Future Meetings
Echo debugging
Hardware/Gear review: FXS, FXO, ATAs, etc.
Clustering/HA
VOIP Service Providers
Presence – Dave Pacham
AGI Programming
User interfaces/ web interfaces
GXP speaker functions don’t work
- AstBill on Drupal
- PhoneCall on ??
Terminology

* What is wrong with Asterisk
- Config file channels – SIP channels and objects.
- Multi-tenanting
- call parking
- Dial plan
- Bluetooth channel
- Cheap ATAs
- DLink, IAX 1402s 2 vs 6 extensions, 8 port devices are needed
Voicemail interface is good, but needs some TLC.
Voicemail is low hanging fruit.
Voicemail goals -
Under maintenance – listening to the current greeting.

* Presentation
Broadband makes Asterisk.
Business Applications, Apache, Linux have completely changed businesses.
End users understand features per price.
Asterisk is flexibility.
Try to set up hunt groups on a via.

Read Mark Spencer’s Asterisk is everything presentation.
Telephony is suddenly something cool.
Linux telephony market in 10 years will be bigger than the Linux market.
Vonage does all its voicemail through Asterisk.
Conference bridging – Asterisk is cool. It isn’t perfect, but for the price you can’t beat it.
Every day it gets better with each CVS and SVN check-in.
15 – 20k subscriptions to the Asterisk mailing list.
IRC channel

Jared’s Goals
1) To share knowledge. Knowledge doesn’t do you any good unless you share it.
Sharing makes it stronger.
2) It is a place where we can all come and grow and learn.
3) Help make Asterisk better.

We want Asterisk to be as ubiquitous as Apache.

What can Asterisk do for me?
Asterisk is a platform, not a product.
Asterisk is a tool not a solution.
How do we use and sharpen that tool to the point where we know how to use it effectively?
Good GUIs and documentation, and even an O’Reilly book, have appeared recently.
The initial learning curve isn’t as
Asterisk is flexible if you need a feature
1 – Scratch your own itch.
2 – Convince a core developer that it is his itch also.

What can we do as members of the Asterisk community?
Ask questions in a smart way.
Choose the right forum. Developers are not a proxy for Google. Do your own homework first.
Be quick to say thank you and slow to express anger (flame).
Fill out good bug reports, add detail and be precise.
Help with testing.

Jared’s story of how he became an Asterisk participant.
I got started with a single FXO card and a little USB FXS.
Three call centers
1.5 Mbit T1 line – 23 or 24 analog channels.
Only GSM codecs, not G.723 or G.729.
Send packets every 40 ms instead of 20 ms to get up to 29-30 simultaneous calls.
3 to 4 10 hours day looking to increase simultaneous calls.
Put all concurrent calls inside a single packet.
IAX Trunking.
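Both tricks in Jared's story, longer packetization and trunking, come down to header overhead. This back-of-the-envelope Python sketch assumes standard IPv4/UDP/RTP header sizes and GSM 06.10 framing, ignores layer-2 framing, and does not try to reproduce the 29-30 call figure from the talk, which depends on details the notes don't record. The trunking function's header sizes are hypothetical inputs.

```python
# Assumptions for illustration: IPv4 (20) + UDP (8) + RTP (12) = 40 header bytes per packet,
# GSM 06.10 payload of 33 bytes per 20 ms frame, and no layer-2 framing overhead.
HEADER_BYTES = 40
GSM_FRAME_BYTES = 33
GSM_FRAME_MS = 20


def per_call_bps(packet_ms: int, header_bytes: int = HEADER_BYTES) -> float:
    """Bits per second for one call when each packet carries packet_ms of GSM audio."""
    frames_per_packet = packet_ms // GSM_FRAME_MS
    packets_per_second = 1000 / packet_ms
    packet_bytes = header_bytes + frames_per_packet * GSM_FRAME_BYTES
    return packet_bytes * 8 * packets_per_second


def trunked_per_call_bps(packet_ms: int, calls: int, shared_header: int, per_call_header: int) -> float:
    """Per-call bits per second when `calls` calls share one packet (the trunking idea).

    shared_header and per_call_header are hypothetical inputs, not values from the IAX2 spec.
    """
    frames_per_packet = packet_ms // GSM_FRAME_MS
    packets_per_second = 1000 / packet_ms
    per_call_bytes = per_call_header + frames_per_packet * GSM_FRAME_BYTES
    packet_bytes = shared_header + calls * per_call_bytes
    return packet_bytes * 8 * packets_per_second / calls
```

Under these assumptions, per_call_bps(20) is 29,200 bit/s and per_call_bps(40) is 21,200 bit/s: doubling the interval halves the packets per second and therefore the header bytes. Sharing one header across 10 calls, as in trunked_per_call_bps(20, 10, 40, 4), drops the per-call cost to 16,400 bit/s with the illustrative 4-byte per-call entry.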

Hence Jared learned how to work with user groups and developers.
The idea of open source involvement and participation clicked.

Phil Windley’s Enterprise Computing Class

Friday, November 4th, 2005

Quote from class:

Service Oriented Architectures are much like Object Oriented Design.
Both are a way of thinking and working, a methodology.
Contrary to what vendors would have you believe, they are not something you buy.

Sidenote:
Another must read: From Scoble: Twelve Reasons not to trust/use Microsoft Products

Phil Windley’s CTO Breakfast Recap

Friday, September 30th, 2005

These are my notes from the CTO Breakfast this morning.
As usual, things fly by a mile a minute there, but a lot of information gets disseminated and a lot of learning occurs.

Accelerating Change – Review by Scott Lemon
http://www.accelerationwatch.com/
John Smart - a series of substrate transitions.
Ray Kurzweil – says that if something doesn’t have a log curve it won’t survive; it may be an indicator of future developments, but if it doesn’t double it will be replaced.
Calculated Moore’s law based on electromechanical relays since 1900. He has abstracted all the way back to the abacus.
Law of accelerating returns, a network law that we use more powerful systems to implement more powerful systems.
Moore’s law isn’t linear in doubling it is actually
Vernor Vinge – Mathematician by training, coined the term Singularity.
Asked the question: at what point, which he calls the singularity, will changes in the earth and our experience happen faster than our senses’ ability to detect change?
Soft takeoff versus hard takeoff.
Soft takeoff- will we even notice when changes happen faster
Hard takeoff – we will walk into the office one morning and say wow, the world has changed.
What are the metrics to measure soft versus hard takeoffs?
Does your liver cell even know what it is a part of?
Computers cause humans to do work without any real interaction with humans.
UPS drivers are just actuators at the edge of a computer network.
Similarity to Dell’s server factory.
The Dell server line workers and Dell suppliers are driven by Dell’s order system.
Dell’s suppliers have 90 minutes to comply with an order or dell goes with a different supplier.
Dell’s parts room consists of semi trucks from suppliers that backup to the assembly plant to be unloaded.
As one truck becomes depleted, the next truck moves into position.
Walmart accounts for over 1% of China’s GNP.
We create advanced tools and use those tools to create the next generation of advanced tools.
Google has already lost control of the social impact they have on the world, and they will never regain it again.
Jeff Barr came and talked at BYU yesterday, and in the same way as Google, Amazon Web Services has lost, or will lose, control of the social impact of its web services.

Synthetic blood supersaturated by Oxygen.
In 2010 Intel will have a chip that will support 512 threads. Event driven programming will be the norm.

Utopia – DynamicCity presentation.
Joel Sybrowsky, Jeff Fishburn, and Ken Mormon
Singularity is like magic, and we (the geeks) all have understood how magic works.
In fiber-optics the next bits to come off the fiber will be cheaper than the bits currently coming off.
Government should put fiber in the ground and light it up and get out of the way and let innovation run.
The world would be very different if the AT&T breakup hadn’t required that AT&T allow any device to connect to its telephone network.
Ken Mormon is the architect of Utopia.
The Utopia project is being copied all over the place in all the western states and as far east as Virginia.
In a year to a year and a half Utopia will be joined by other markets such as Seattle.

How did DynamicCity choose where to start construction of Utopia?
Cities that co-signed the financial loan are the first to get deployed.
What is the governor on how fast Utopia builds out?
It is largely financial.
Competitors such as Qwest and Comcast motivated the legislature to limit the participation of cities in financially backing this deployment.

In urban areas elsewhere, trenching is a million dollars a mile, and there are seven legal negotiations per mile.
In Utah, laying fiber costs $30,000-40,000 per mile overhead and $130,000-160,000 per mile underground.
In the heart of Utah cities it may be as much as $300,000 per mile underground.
In Manhattan, which has no dirt, just concrete, it is going to cost at least a million dollars a mile.

All cities and developers should be putting in conduit in all new construction area.
Underground deployment is 3 times the cost of overhead deployment but all new construction areas are deploying utilities underground.
So the cities should lay conduit in

In construction, do you buy it, lease it, or build it?
Fiber is plentiful along I-15, so Utopia leases it for long hauls.

Future topics include the importance of Q&A in projects such as Utopia.
The POTS network was/is “carrier grade”; how do we ensure that Utopia is carrier grade?
Especially since the service providers and the “servers” they run on are so insulated and isolated from each other in a network such as Utopia.

Jeff Barr: Speaks at BYU

Thursday, September 29th, 2005

Jeff Barr from Amazon did a great presentation on Amazon Web Services (AWS).
He showed how developers such as www.tvmojo.com are making a living just off Amazon commissions, by selling Amazon-indexed products on custom web pages.

Some other cool examples Jeff demoed.

Developers who are interested should read Jeff’s AWS blog at aws.typepad.com.

Jeff continued to explain REST and SOAP and made the interesting observation that SOAP matches well with statically typed languages such as Java and C#, while REST tends to match well with dynamic languages such as Perl, Python, PHP, and Ruby. 80% of Amazon’s AWS traffic occurs over REST.
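One way to see Jeff's observation: a REST request is an ordinary URL, and a dynamic language can pick apart the XML reply by element name on the fly. In this minimal Python sketch the endpoint, the Operation/Keywords parameters, and the response format are all hypothetical, not Amazon's actual API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only; not Amazon's actual API.
BASE_URL = "http://api.example.com/search"


def build_rest_url(**params: str) -> str:
    """A REST request here is just an HTTP GET whose parameters live in the query string."""
    return BASE_URL + "?" + urlencode(sorted(params.items()))


def item_titles(xml_text: str) -> list:
    """Walk the response by element name; no classes generated from a WSDL are required."""
    root = ET.fromstring(xml_text)
    return [item.findtext("Title") for item in root.iter("Item")]


# A made-up response body shaped loosely like a product search result.
SAMPLE_RESPONSE = """<SearchResponse>
  <Item><Title>Book A</Title><Price>29.99</Price></Item>
  <Item><Title>Book B</Title><Price>39.99</Price></Item>
</SearchResponse>"""

url = build_rest_url(Operation="ItemSearch", Keywords="xquery")
titles = item_titles(SAMPLE_RESPONSE)
```

With SOAP, a statically typed client would typically generate typed proxy classes from the service's WSDL before making a call, which suits Java or C# tooling; the sketch skips that step entirely.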

Amazon also owns the Alexa web engine. If you haven’t heard of Alexa, go take a look.

Good Presentation.

Web App Grids

Monday, September 26th, 2005

I totally agree with Peter Yared’s Musings: Why Grids Make Sense. Peter basically says that webservers and appservers often are, and in many cases should be, executed on the same machine, or at least the same type of machine. This allows you to pull webservers and appservers from the same cluster/resource pool. What remains is the database layer, where many bottlenecks occur. Database administrators have used techniques such as data partitioning to mitigate the database bottleneck. Adam Bosworth has some interesting ideas about scaling databases based on
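Data partitioning can take several forms; hash partitioning is one of the simplest. This minimal Python sketch assumes rows keyed by a string such as a user ID; shard_for and partition are hypothetical helpers, and the post doesn't say which scheme the database administrators it mentions actually use.

```python
import hashlib
from collections import defaultdict
from typing import Dict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash.

    Python's built-in hash() for str is randomized per process by default, so a
    cryptographic digest is used here to get the same answer on every machine.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(rows: Dict[str, object], num_shards: int) -> Dict[int, Dict[str, object]]:
    """Split rows keyed by a string into per-shard dictionaries."""
    shards: Dict[int, Dict[str, object]] = defaultdict(dict)
    for key, value in rows.items():
        shards[shard_for(key, num_shards)][key] = value
    return dict(shards)


rows = {f"user{i}": {"visits": i} for i in range(100)}
shards = partition(rows, num_shards=4)
```

Any lookup by key touches exactly one shard, which is what relieves the single-database bottleneck. With plain modulo hashing, though, changing num_shards moves most keys to a different shard; consistent hashing is a common remedy, and range partitioning is an alternative when queries scan key ranges.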

Links