[GUFSC] Interview with Sleepycat President and CEO, Michael Olson

Friday, April 4 16:24:59 GMT+3 2003

[ Although the interview is a bit old, I found it interesting.  --rro ]

[http://www.winterspeak.com/columns/102901.html]

Interview with Sleepycat President and CEO, Michael Olson

Monday, Oct 29, 2001
Zimran Ahmed

How to make money with the GPL. How to promote and spread free
software. How open source's experience advantage with developers gives
companies a competitive edge. Sleepycat President and CEO Michael
Olson shows us what happens when free software meets intelligent
business strategy.

Could you tell us a little about Sleepycat?

Sleepycat Software was founded in 1996 to develop, maintain and
support the open source Berkeley DB product. Our approach to business
has been very different from that of many other software companies
that started during the past several years. We've always been funded
by our revenues, and have never taken any capital from outside
investors. We've been profitable since inception.

We have a dual licensing strategy that permits us to distribute an
open source product but still make a living off of software licensing
for that same product. Open source licensing has given us an enormous
installed base and a large pool of developers who know and like our
product. Our for-pay licensing strategy has allowed us to hire
developers, salespeople and marketing staff, and to promote and
support Berkeley DB.

We've doubled revenues annually since we started Sleepycat. Despite
the tech downturn in 2001, we expect to record substantial growth in
revenues this year as well. The company was started by two people,
Margo Seltzer and Keith Bostic. Today we have thirteen employees,
mostly in Boston and the SF Bay Area, with a few elsewhere.

Nearly all of our customers are original equipment manufacturers, or
OEMs. They embed Berkeley DB in the products that they build, and then
ship those through to end users. We have a very small direct sales
force to reach our OEM customers.

We have a couple of hundred paying customers, and an unbelievable
number of non-paying users under the open source license. I did some
work last year to quantify our installed base. Counting all the
projects and products that bundle and redistribute Berkeley DB, we
estimate that there are more than 200 million copies deployed
worldwide. We get about 1200 copies downloaded daily from our Web
site. That doesn't count copies from mirrors, copies bundled with
other open source distributions, or copies shipped by proprietary
vendors.

If you surf the Web, send email, or shop on-line, the chances are that
you use our software. Berkeley DB is embedded in network
infrastructure products like routers and switches, DNS and Web content
caches, email servers and clients, and is used by an enormous number
of ISPs and ASPs for Web content delivery or back office services.
Companies like Cisco, Sun, HP, IONA, Amazon and Sendmail use Berkeley
DB. Open source projects like Cyrus, Squid, RPM, Postfix, and MySQL
include it.

We're proud of the success we've had. It's due to the quality of the
people that we've managed to attract and retain. We're small, but
everyone here is very smart. We have a very senior team of technical
people working on an established, mature software product.

We don't look like a typical technology company. In a male-dominated
field, we're about evenly split on gender lines. In a
work-all-the-time industry, we emphasize the importance of families
and interests outside the company. We've always earned more money than
we've spent, so we have never had to do layoffs. Lots of companies
talk about empowering employees -- at Sleepycat, everybody makes
important decisions every day. We have a remarkably liberal benefits
policy for a company our size.

Sleepycat is the best place I've ever worked.

How did the Berkeley DB code base come into existence originally?

In 1991, Keith Bostic, Margo Seltzer and I were all at UC Berkeley.
Keith was working for the Computer Systems Research Group, which
produced the Berkeley Software Distributions (BSD, popularly known as
Berkeley UNIX). At that time, the CSRG was trying hard to produce a
version of the distribution that included no AT&T copyright code, so
that people could get the source code for Berkeley UNIX without having
to buy a source license from AT&T.

Margo and I were doing graduate research in database systems. Keith
approached us and asked us to produce a version of the dbm library,
which is still a widely-used single-user data store on UNIX systems,
that was unencumbered by an AT&T copyright. We thought it was an
interesting project, and agreed to work on it.

The result was eventually shipped with the 4.4 BSD release as
"Berkeley DB". That version had a dbm-compatible interface and
supported two storage structures, hash tables and btrees. It was
distributed under the same BSD licensing terms as the rest of the BSD
software. Not long after it shipped, I went off to do other things.
Keith and Margo continued to maintain it, and eventually got to
release number 1.85 under that license.
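
As an illustration of what a dbm-style interface looks like to a
programmer, here is a minimal sketch using Python's standard-library
dbm.dumb module. This is not Berkeley DB itself; it is only assumed
here as a stand-in with the same general shape: a persistent store of
byte-string keys and values, opened by file name. The function name
dbm_roundtrip and the sample keys are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def dbm_roundtrip(directory):
    """Store and fetch a few key/value pairs through a dbm-style interface."""
    path = os.path.join(directory, "example")
    # "c" opens the database read/write, creating it if it does not exist.
    with dbm.dumb.open(path, "c") as db:
        db[b"olson"] = b"Michael Olson"
        db[b"bostic"] = b"Keith Bostic"
        db[b"seltzer"] = b"Margo Seltzer"
    # Reopen read-only to show that the pairs were persisted to disk.
    with dbm.dumb.open(path, "r") as db:
        return db[b"bostic"], sorted(db.keys())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        value, keys = dbm_roundtrip(tmp)
        print(value)
        print(keys)
```

The whole interface is a lookup by key, with no query language and no
server process, which is roughly the model the original dbm library
offered.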

The 1.x code was picked up by a lot of different open source and
proprietary developers. Notable among those, for later business
reasons, were Sendmail, the SLAPD group at University of Michigan, and
the Cyrus project.

In 1996, Netscape Communications decided to build a suite of server
tools. That suite was to include an LDAP server, and Netscape
recruited a number of the core team from University of Michigan that
had done the LDAP work there. With guidance from those developers,
Netscape approached Keith and Margo to ask them to add some new
features, including support for multiple users and for transactions
and disaster recovery, to the 1.85 version of Berkeley DB.

Margo and Keith agreed, and on the strength of that deal founded
Sleepycat. The agreement with Netscape left ownership of the new
intellectual property with Sleepycat. Margo and Keith wrote a lot of
code, hired an attorney, crafted a new license for the 2.x release of
Berkeley DB, and filed incorporation papers in Massachusetts.

Since the release of 2.0 in 1997, we've done about three releases per
year. We're currently at release 3.3. We'll ship version 4.0 late this
year. Our staff of 13 includes nine software developers. That's the
team that is doing the engineering work on new releases, the testing,
and the software support. The 3.3 release is about 150K lines of C
code, with fairly thin API layers for C++ and Java. There are Perl,
Python, Tcl, and PHP bindings as well.

How does Sleepycat's dual licensing model work?

The original version of Berkeley DB was, as I said above, released
under a BSD license. When Margo and Keith formed Sleepycat in 1996,
they wanted a license that would encourage open source projects to use
the library, but would allow them to make money from proprietary
vendors. They crafted a new license, called the "Sleepycat license,"
and used that for version 2 (and, later, versions 3 and 4) of Berkeley
DB. Version 1.85, the last of the pre-Sleepycat releases, is still
available (you can even get it off of our Web site), and it's still
under the BSD license. However, the multi-user transactional engine is
only available under the Sleepycat license.

The Sleepycat license says that you may download and use Berkeley DB
at no charge, provided that

- you do not redistribute your application code off of a single
physical site; or

- you make the complete source code for your application freely
available at no charge.

These are, effectively, the same terms as the GPL. We didn't use the
GPL for historical reasons -- carrying the BSD license and copyrights
from 1.85 would not have been possible under a straight GPL. However,
the license was designed to work exactly the way the GPL does.

Proprietary software vendors generally can't agree to these terms.
They can't afford to give away the source code to products they sell.
If a company wants to redistribute Berkeley DB as a part of a
proprietary product, they can come to Sleepycat and pay us a fee to
purchase different license terms from us. In that case, we sign a
pretty conventional license agreement permitting use and
redistribution in binary form, without forcing them to ship source. We
make the usual representations and warranties, indemnify the customer
against certain damages, and so on.

In effect, Sleepycat's dual licensing strategy says that

- if you're open source, so are we; but

- if you're a proprietary software vendor, we look exactly like all of
your other proprietary suppliers.

This works for two very important reasons.

First, Berkeley DB is a library. In order to use it, developers must
link it with their applications. That gives us leverage over the terms
under which the embedding application is distributed. We can force
them to use an open source license or to pay us money. This strategy
doesn't work for standalone applications like Web servers, relational
database servers, or mail servers, because the end user doesn't change
those or link directly with them. Note also that this wouldn't work if
we applied something like the LGPL to Berkeley DB -- it's only the
full-blown GPL-style license we have that gives us the leverage to
charge money.

Second, Sleepycat owns the intellectual property in Berkeley DB.
Unlike many other projects, there's no developer community outside the
company that's contributing code to Berkeley DB. We do the
development. In some rare cases, we do get code contributed from a
customer. When that happens, we require that ownership of that code be
transferred to Sleepycat before we'll incorporate it into our source
tree. If we allowed third party contributions that we didn't own, we
would not have the standing we need to cut proprietary licenses for
our paying customers.

So for example, if MegaISPCorp downloads Berkeley DB and uses it to
build the authentication and user database for their Web site, but it
runs only inside their data center, then they don't have to release
their source code or pay us any money. They're not shipping our code.
None of the users who visit MegaISPCorp's Web site need to release
anything, because they're not redistributing our software either.

The restrictions apply only to people who actually ship Berkeley DB.
That's the action that requires either payment or release of source
code. Building a Web service on top of Berkeley DB and making it
available via HTTP doesn't require payment or release of code.

It's not quite right to say that "under the Sleepycat license" they
can ship closed source. They can't do that at all under the Sleepycat
license -- it's effectively the GPL. If they want to ship closed
source, they need to pay us for a different license. That license
looks just like all the other agreements that vendors sign with each
other. The "dual licenses" are the Sleepycat license and a separate
license agreement for proprietary use and redistribution.
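
The rule described in this answer can be sketched as a small decision
function. This is a simplified reading of the interview, not the text
of the Sleepycat license; the names Obligation and obligation are
hypothetical, and the two boolean inputs are an assumption about how
the cases divide.

```python
from enum import Enum


class Obligation(Enum):
    NONE = "no payment and no source release required"
    OPEN_SOURCE = "allowed under the Sleepycat license; application source is released"
    PAID_LICENSE = "needs a separate paid license from Sleepycat"


def obligation(redistributes: bool, releases_source: bool) -> Obligation:
    """Classify a Berkeley DB user according to the rule described above."""
    # Use confined to one site (for example, a Web service running in a
    # data center) does not ship the library, so nothing is owed.
    if not redistributes:
        return Obligation.NONE
    # Shipping the library is fine if the application's source is released.
    if releases_source:
        return Obligation.OPEN_SOURCE
    # Shipping closed source requires the separate proprietary agreement.
    return Obligation.PAID_LICENSE


if __name__ == "__main__":
    # The MegaISPCorp case: runs only inside their own data center.
    print(obligation(redistributes=False, releases_source=False).value)
    # A proprietary vendor embedding the library in a shipped product.
    print(obligation(redistributes=True, releases_source=False).value)
```

Note that the first check is on redistribution alone: whether the
source is open only matters once the library actually ships.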

How does bug fixing work at Sleepycat? That's a big draw of many open
source models.

We don't have a large number of third party developers posting bug
fixes. Occasionally we'll get a proposed patch from the field. Most
often, we get very good bug reports: "At line X in file foo.c, you
release a mutex that you already released on line Y in file bar.c
because you're not checking condition baz." Customers use the source
to investigate problems thoroughly. We generally produce the patch,
integrate it into the source tree, and run it through our regression
and coverage suites prior to the next release.

We still have many eyes making all bugs shallow, but we don't have
many hands making the patches.

One important reason for this is that, in Berkeley DB at least and
likely in other database engines, you can't make changes to (say) the
locking subsystem unless you understand the assumptions behind
recovery processing. People who have been building database servers
for a long time understand how all the pieces fit together, but it's
hard for a casual programmer to join a project like Berkeley DB and
make contributions quickly. There's just too much state to absorb. By
contrast, a casual contributor can get up to speed quickly on projects
like Apache or Linux, where you can work in an area that's entirely
independent of the bulk of the system.

Interestingly, the places where third party contributions *do* happen
in Berkeley DB are completely outside the core library. For example,
Robin Dunn does a fantastic job on the Python language bindings for
Berkeley DB. Likewise, Paul Marquess keeps the CPAN archive up to date
with the Perl bindings for the latest release of Berkeley DB. The API
bindings don't depend on library internals in any way, and that's a
place where we do get some leverage from developers in the open source
community. We don't own these, and we can't charge money for them, but
we don't need to. It's good for us that people writing code in those
languages can use our software.

Do customers come to Sleepycat asking for custom services often? Does
this dual license allow Sleepycat to continue developing Berkeley DB
successfully?

Customers do tell us what new features to put into the product. When
we do our release planning, we look at the customer requests we've
gotten, decide which ones are interesting to our customer base
generally, and include those.

We very seldom get requests for custom development, however. We really
don't like those. They take expensive engineering talent and put it on
a project that only matters to a single customer, and we can't charge
very much for the work. I can count maybe four instances in the last
three years where we've done any amount of custom development at all,
and all of those were very small projects. Even in those cases, we
owned the changes and they got rolled into our main code line, even
though the majority of our customers won't take advantage of them.

We vastly prefer to make a living off of software licensing, not
services. In fact, three quarters of the money we make comes from
licensing, and only a quarter from support and related services. Given
that, it's much more in Sleepycat's interest to have our high-powered
developers working on features that we can go sell to lots of
customers, rather than projects that we can sell to just one or two.

Could Sleepycat exist if Berkeley DB were under the GPL? Do you think
the work Sleepycat has done would be (commercially) possible if the
original code had been GPL'd?

Sleepycat could absolutely exist if Berkeley DB were under the GPL.
Our business model depends on our ownership of the intellectual
property in Berkeley DB, and on our ability to use dual licensing for
companies that don't want to comply with the open source terms of the
Sleepycat license. The GPL would permit this in the same way that the
Sleepycat license does.

Both Sleepycat and the Free Software Foundation have looked hard at
the two licenses, and we agree that the Sleepycat license is
compatible with the GPL. This means that GPL'ed projects can use
Berkeley DB under the Sleepycat license, because the GPL meets the
"open source" requirement of the Sleepycat license and the Sleepycat
license imposes no additional restrictions beyond those in the GPL.

A big reason for Sleepycat's success has been the widespread adoption
of the 1.85 Berkeley DB code under the BSD license, dating back to
1991. Kirk McKusick has an apt characterization of the BSD license:
There's copyleft, which in some sense requires broad distribution of
copies, and there's copyright, which is intended to limit the
distribution of copies. Then, according to Kirk, there's copy center,
as in, "Take it down to the copy center and make all the copies you
want." BSD is a copy center license. You can make copies and use them
for whatever you want without paying anyone any money.

It's hard to say how Berkeley DB would have fared under the GPL in the
early 1990s. Certainly it was well-written and useful, and it would
have had some success. However, I can't say whether it would have been
picked up by the projects, like SLAPD, that directly led to the
formation of the company.

I will say this, though: Sleepycat couldn't exist if the current
release of Berkeley DB were under the BSD license. I'm not taking a
political stance, here -- I think that open source licenses like the
GPL and the BSD license are valuable, and that both have created
enormous value. As a business matter, though, the BSD license wouldn't
allow Sleepycat to pursue the dual licensing strategy that we have
with Berkeley DB.

The business lesson here is that you need to consider your product
strategy, your business model, and your licensing terms as a coherent
whole. Our answers are embedded storage management, revenues from
product licensing, and dual GPL/proprietary terms. If you change any
one of those three, the business doesn't work anymore.

Why aren't dual licenses more common among free software businesses?

Most free software projects are standalone utilities. Unless you can
impose restrictions on the end user's application code, you don't have
the leverage you need for dual licensing. This is simplest for
libraries that are released under GPL-style terms, like ours. There
are a few cases besides us that I know about. For example, MySQL AB in
Sweden has GPL'ed their client-side library, but they'll sell
customers proprietary licenses to build MySQL clients using exactly
the same dual licensing strategy that we have.

And, as noted earlier, ownership of the IP is crucial. If you've got
ownership shared among developers all over the globe, there's no
single entity that customers can approach for a closed-source
redistribution license.

Was relaxing the GPL's redistribution requirements valuable to some
customers?

There are GPL'ed packages -- like Linux and the GCC toolchain -- that
have enormous installed bases, but there aren't so many *libraries*
that are widely redistributed under the GPL. The FSF created the LGPL
to address exactly this problem: proprietary vendors can't use
libraries under the GPL in their closed source products, but the LGPL
allows that.

We weren't really being calculating when we released Berkeley DB 1.0
under the BSD license. All Berkeley software was under the BSD
license. We just did what the rest of the people in our building were
doing. If we'd chosen the LGPL instead, it likely wouldn't have made
any difference to how broadly our software got picked up and used by
other projects and by proprietary vendors.

I can't say what the difference would have been if Berkeley DB 1.0 had
been GPL'ed instead. I can't point to any single early user who would
have declined to use Berkeley DB under the GPL.

That said, starting with a BSD license and switching to the Sleepycat
license certainly worked for us. It's ironic, really. You often hear
that the BSD license is business-friendly, and that the GPL is the
great destroyer of intellectual property. Well, in Sleepycat's case,
switching to a BSD license would kill our company. Our ability to
charge money for our intellectual property depends entirely on a
license that's just like the GPL.

Do people ever break licensing terms? How do you manage that?

It happens. Generally it's an accident -- no real company wants to be
in violation of another company's intellectual property rights, so it
very seldom happens intentionally. When we find out about a case like
this, we contact the person or company involved, explain the terms of
the Sleepycat license, and point out the violation. Almost every
single time, the other party has gotten under a paid license quickly. In
one or two cases, when they understood the problem, they stopped using
our software.

The most common way that we find out about these cases is that someone
contacts us for technical support on the product, but we have no
record of them in our sales database.

How do businesses feel about using open source software? Does it give
you a competitive advantage or disadvantage?

We compete with proprietary database vendors on a lot of fronts -- I'd
argue that we generally win on performance, reliability, and
scalability. Other factors, including open source, play a role in
helping us win deals, but the major impact of open source for us is
that it gets us into the deal in the first place.

Certainly companies care about control and visibility into the
development process. Because they get the complete source code for
Berkeley DB, our customers know they don't need to talk to us about
new ports or custom features. Whether or not they ever do their own
ports or custom features, having the option matters for planning
reasons. During development, the
fact that they've got our source code means that writing to the APIs,
figuring out how they work, and debugging problems is much simpler.
That speeds up development, and that's valuable to customers.

Most importantly, though, developers can come to our Web site and
download the complete source for our product quickly and easily.
There's no charge for developer licenses and no feature-crippled
evaluation version. They get the actual product they'll ship, and they
can try it out and integrate it into their products. This is much
easier and faster for our customers, and it's good for us, as well: By
the time they've decided we have a good solution, we're pretty well
entrenched in their product. That makes it harder for our competitors
to dislodge us.

This last issue -- ease of access for developers -- is a big
competitive advantage for Berkeley DB over proprietary products, which
have various problems with open-ended no-cost full-version evals. It
helps us win business.

One last comment on this point: The market is generally much smarter
about open source licensing than it used to be. Most of our customers
at least know the term, and have heard of the GPL. That's both good
and bad -- many have heard some of the more polarizing claims about
open source, and need to be educated about our license and the
business value it confers on them. There's more fear, uncertainty and
doubt among customers than there was a year or two ago, when the words
"open source" never entered our conversations with many proprietary
vendors.

Why the embedded market? Plans to go elsewhere?

Sleepycat's core strength has always been high-end Internet
infrastructure applications -- we dominate the messaging and directory
server markets, and we're deployed at the big ISPs and portal sites.
We continue to increase our sales across the board in this horizontal
market. We think we've got an outstanding product for these
applications.

In the last year or so, we've begun to do substantial new business
among vendors building "embedded systems." This term gets abused, but
it generally means some special-purpose device, usually without a
desktop-style UI, providing a single service. Examples range from the
fuel mixture sensor in your car's engine, to a palmtop computer, to a
set-top box, to an eight-way multiprocessor providing storage
virtualization services. It's a *very* broad market.

The companies using Berkeley DB in this market are generally building
appliances that need to scale to moderate numbers of users (say, in
the thousands), and that need very fast predictable response times.
Examples include network file servers, wireless network gateways, and
optical switches. The particulars of each of these are very different
from the others, but all of them need fast, reliable data management.
Most importantly, you're not allowed to ship a relational database
administrator with every box you sell.

Berkeley DB's an ideal storage engine for products like that. There
are two reasons that we're so excited about this emerging market.

First, it's growing explosively. Storage virtualization is $8B today,
headed for $37.5B in three years. Telco and datacom, despite the poor
performance of the public players today, have a CAGR of 23% through
2005. New companies are forming, getting funding, and buying tools
like Berkeley DB for the products they're building.
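
To put the two growth figures on the same footing, the arithmetic can
be sketched as below. The dollar amounts and the 23% rate come from
the paragraph above; the function names are hypothetical, and treating
"through 2005" as four years of growth starting from 2001 is an
assumption made only for illustration.

```python
def implied_cagr(start, end, years):
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def growth_multiple(rate, years):
    """Total growth factor after compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Storage virtualization: $8B growing to $37.5B over three years.
    storage_rate = implied_cagr(8.0, 37.5, 3)
    print(f"storage virtualization implied CAGR: {storage_rate:.1%}")

    # Telco and datacom: 23% per year, assumed to compound over four years.
    telco_multiple = growth_multiple(0.23, 4)
    print(f"telco/datacom growth multiple: {telco_multiple:.2f}x")
```

Under these assumptions the storage figure works out to roughly 67%
per year, and the telco and datacom figure to a bit more than a
doubling over the period.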

Second, there is no established leader selling databases in this
market today. There's simply no Oracle here yet, dominating the market
and booking most of the business. We believe that because of the
unique technical characteristics of our product, our strong track
record in the business, and our clear focus on the opportunity, we can
be that leader. There's a lot of money to be made.