How many lines of open source code are hosted at the Eclipse Foundation?

Spoiler alert: 162 million!

That’s right, as of August 1st, there are 330 active open-source projects hosted at the Eclipse Foundation and if you look across the 1120 Git repositories that this represents, you will find over 162 million physical source lines of code. But beyond this number, let’s look at how it was obtained, and what it really means.

I’ve blogged several times about the importance of using metrics to monitor the health (and hopefully, growth!) of an open source project/community, and lines of code are just one. You should always have other metrics on your radar like the number of contributors, diversity, etc.

There are many ways, and many tools available out there, to count source lines of code. Open Hub (previously known as Ohloh) used to be a really good tool, but it doesn’t seem to be actively maintained. For a few years now, I’ve been relying on a home-made script to analyze Eclipse IoT projects, and it’s only recently that I realized I should probably run it against the entire codebase!

In this blog post, I will briefly talk about how the aforementioned script works, why you should make sure to take these metrics with a pinch of salt and finally, go through some noteworthy findings.

Line counting process

The script used to count the number of lines of code is available on GitHub. It takes a list of Eclipse project identifiers (e.g. ‘iot.paho’) and a time range as input, and outputs a consolidated CSV file.

The main script (main.js) uses the Eclipse Project Management Infrastructure (PMI) API to retrieve the list of Git repositories for the requested projects and then proceeds to clone the repos and run the cloc command-line tool against each repo. The script also allows computing the statistics for a given time period, in which case it looks at the state of each repository at the beginning of each month for that period.
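
To make the process above more concrete, here is a minimal Python sketch of its two core steps: finding the state of a repository at the beginning of each month, and running cloc against it. This is not the actual main.js script. The function names are made up for illustration, the PMI API call and the cloning are left out, and the sketch assumes that git and cloc are installed and that cloc's JSON report has one entry per language next to its "header" and "SUM" entries.

```python
import json
import subprocess
from datetime import date


def month_starts(first: date, last: date) -> list[date]:
    """Return the first day of each month from first's month to last's month."""
    starts = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        starts.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return starts


def commit_before(repo_dir: str, day: date) -> str:
    """Return the hash of the last commit reachable from HEAD made before `day`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "-n", "1",
         f"--before={day.isoformat()}", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def run_cloc(repo_dir: str) -> dict:
    """Run cloc on a checked-out repository and return its parsed JSON report."""
    result = subprocess.run(
        ["cloc", "--json", "--quiet", repo_dir],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def code_lines_by_language(report: dict) -> dict[str, int]:
    """Extract {language: lines of code} from a cloc JSON report."""
    return {
        language: stats["code"]
        for language, stats in report.items()
        if language not in ("header", "SUM")  # bookkeeping entries, not languages
    }
```

For each date returned by month_starts, one would check out the commit returned by commit_before and then call run_cloc on the working tree. An empty result from commit_before means the repository had no commits yet at that date.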

Once the main script has completed (and it can obviously take quite some time), the csv-concat.js script can be used to consolidate all the produced metric files into one single CSV file that will contain the detailed breakout of lines of code per project and per programming language, the affiliation of the project to a particular top-level project, the number of blank or comment lines, etc. It is then pretty easy to feed this CSV into Excel or Google Spreadsheets, and use it as the source for building pivot tables for specific breakouts.
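
The consolidation step can be sketched in a few lines of Python. This is an illustration only, not a port of csv-concat.js: it assumes one CSV file per project in a single directory, assumes the file name (minus the extension) is the project identifier, and the function name and the added "project" column are hypothetical.

```python
import csv
from pathlib import Path


def concat_csv(input_dir: str, output_file: str) -> int:
    """Merge every *.csv in input_dir into output_file; return the row count."""
    rows = []
    fieldnames = ["project"]  # hypothetical extra column holding the project id
    target = Path(output_file).resolve()
    for path in sorted(Path(input_dir).glob("*.csv")):
        if path.resolve() == target:
            continue  # never read a previous consolidated file back in
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                row["project"] = path.stem  # assumption: file name == project id
                for name in row:
                    if name not in fieldnames:
                        fieldnames.append(name)
                rows.append(row)
    with open(output_file, "w", newline="") as handle:
        # restval fills columns that some input files do not have
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Collecting the column names across all files, rather than trusting the first file's header, keeps the output usable even if some metric files have slightly different columns.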


Just like virtually any KPI, you want to take the number of lines of code in your project with a grain of salt. Here are a few things to keep in mind:

All lines of code are not created equal

There is an incredible diversity of projects at Eclipse, and while a majority use Java as their main programming language, there’s also a lot of C, C++, Python, JavaScript, … 10M lines of Java code probably don’t carry the same value (i.e. how much effort was needed to produce them) as 10M lines of C code.

Trends are more important than snapshots

It is nice to know that as of today there are 162 million lines of code in the Eclipse repositories, but it is, in my opinion, more important to look at trends over time. Is a particular programming language becoming more popular? Are all the top-level projects equally active?

I didn’t have a chance to run the scripts for a longer time period yet, but I will make sure to share the results when I get a chance!

Generated code, should it count?

There is a fair amount of generated code in some projects (in the Modeling top-level project in particular, of course), which certainly accounts for a few million lines of code. However, generated code is often customized, so I think it doesn’t necessarily skew the numbers as much as one would think.

Development does not always happen in a single branch

My script just looks at the code stored in the main (HEAD) branch of the Git repository. Some projects may have more than one development stream and may e.g. have a “develop” branch that is ahead of the main stable branch. Therefore, there is very likely more code in our repositories than what this quick analysis shows.

Additional findings

As my script outputs pretty detailed statistics, it is interesting to have a quick look at e.g. how the different top-level projects and programming languages compare.
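
If you would rather not use a spreadsheet, the same kind of breakout can be computed with a short Python sketch like the one below. The column names ("language", "code") are assumptions about the consolidated CSV, not its documented layout, so adjust them to the actual header.

```python
import csv
from collections import Counter


def total_by(csv_path: str, key: str, value: str = "code") -> list[tuple[str, int]]:
    """Sum the `value` column for each distinct `key`, largest total first."""
    totals = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            totals[row[key]] += int(row[value])
    return totals.most_common()
```

Calling total_by("all.csv", "language") would rank the programming languages by lines of code. Passing the name of the column that holds the top-level project instead would rank the top-level projects.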

Top 3 top-level projects: Runtime, Technology & Modeling

| Top-level project | Physical SLOC |
| --- | --- |

Top programming language: Java

| Programming language | Physical SLOC |
| --- | --- |
| ANTLR Grammar | 3,161,339 |
| C/C++ Header | 1,019,368 |
| Bourne Shell | 214,142 |
| Gencat NLS | 125,251 |
| Windows Module Definition | 64,843 |
| TITAN Project File Information | 64,014 |
| Bourne Again Shell | 11,978 |
| DOS Batch | 11,675 |
| Windows Resource File | 10,278 |
| Fortran 90 | 7,211 |
| Vuejs Component | 6,281 |
| Visualforce Component | 5,047 |
| MSBuild script | 4,538 |
| Freemarker Template | 4,077 |
| Velocity Template Language | 3,649 |
| Oracle PL/SQL | 1,778 |
| Objective C | 1,469 |
| Visual Basic | 1,365 |
| Korn Shell | 1,023 |
| Objective C++ | 589 |
| Fortran 77 | 588 |
| Arduino Sketch | 480 |
| Protocol Buffers | 454 |
| WiX source | 446 |
| JavaServer Faces | 440 |
| Qt Project | 176 |
| Windows Message File | 139 |
| NAnt script | 110 |
| Qt Linguist | 61 |
| WiX include | 52 |
| PO File | 29 |
| C Shell | 7 |

If you end up using my script and have any questions, please let me know in the comments or directly on GitHub!

Eclipse Kura on Steroids with UPM and Eclipse OpenJ9

So it’s been a while since the last time I blogged about a cool IoT demo… Sorry about that! On the bright side, this post covers a couple of projects that are really, really neat, so hopefully this will help you forgive me for the wait! 🙃

UP Squared Grove IoT Development Kit

At the end of last year, a new high-performance IoT developer kit was announced. Built on top of the UP Squared board, it features an Intel Apollo Lake x86-64 processor, plenty of GPIOs, two Ethernet interfaces, USB 3.0 ports, an Altera MAX 10 FPGA, and more. You can get the kit from Seeed Studio for USD 249.

The UP Squared Grove IoT Development Kit

Of course, it wouldn’t be a Grove kit without the Grove shield that can be attached on top of the board to simplify the connection to a wide variety of sensors and actuators (and there are actually a few of them in the kit).

Running Eclipse Kura on the UP Squared board

Enough with the hardware! With all this horsepower, it is of course very tempting to run Eclipse Kura on this. Since the UP Squared is based on an Intel x86-64 processor, it is incredibly easy to start by replacing the default OpenJDK JVM with Eclipse OpenJ9. Here’s your two-step tutorial to get Eclipse OpenJ9 and Eclipse Kura running on your board:

In case you are wondering how much faster OpenJ9 is compared to OpenJDK or Oracle’s JVMs, here’s a quick comparison of the startup time of Eclipse Kura on the UP Squared:

Eclipse Kura start-up time on Intel UP Squared Grove kit



UPM is a set of libraries for interacting with sensors and actuators in a cross-platform, cross-OS, language-agnostic way.

There are over 400 sensors & actuators supported in UPM. Virtually all the “DIY” sensors you can get from Seeed Studio, Adafruit, etc. are supported, but beyond that, UPM also provides support for a wide variety of industrial sensors.

Thanks to Eclipse Kura Wires and the underlying concept of “Drivers” and “Assets”, Kura provides a way to access physical assets in a generic way.

In the next section, we will see a proof-of-concept of UPM libraries being wrapped as Kura “drivers” in order to make it really simple to interact with the 400+ kinds of sensors/actuators supported by UPM.

Integrating UPM in Kura Wires

UPM drivers are small native C/C++ libraries that expose bindings in several programming languages, including Java, and therefore calling UPM drivers from Kura is pretty simple.

The only things you need are a few JARs for UPM itself (and for MRAA, the framework that supports it), the JARs for the driver(s) of the particular sensor(s) you want to use, and the associated native libraries (.so files) for all of the above. As you may know, OSGi makes it pretty easy to package native libraries alongside Java/JNI libraries, so there is really no difficulty there.

In order for the UPM drivers to be accessible from Kura Wires, and to expose “channels” corresponding to the methods available on them, they need to be bundled as Kura Drivers. This is also a pretty straightforward task, and while I created the driver for only a few sensor types out of the 400+ supported in UPM, I am pretty confident that Kura drivers can be automatically generated from UPM drivers.

You can find the final result on my GitHub:

See it in action!

So what do we end up getting, and why should you care? Just check out the video below and see for yourself!

Key Trends from the IoT Developer Survey 2018

Executive Summary

The IoT Developer Survey 2018 collected feedback from 502 individuals between January and March 2018.

The key findings in this year’s edition of the survey include the following:

  • Amazon AWS and Microsoft Azure are the top 2 cloud services for IoT. Google Cloud Platform is failing to get traction.
  • MQTT remains the standard of choice for IoT messaging, while AMQP is becoming more and more popular as companies scale their IoT deployments and backend systems.
  • 93% of the databases and data stores used for IoT are open source software. Data collected and used in IoT applications is incredibly diverse, from time series sensor data to device information to logs.


For the past four years, the IoT Developer Survey has been a great way to look at the IoT landscape, from understanding the key challenges for people building IoT solutions, to identifying relevant open source technology or standards.

Just like in previous years (see results from the 2017, 2016 and 2015 surveys), the Eclipse IoT Working Group has collaborated with a number of organizations to promote the survey to different IoT developer communities: Agile-IoT H2020 Project, IEEE, and the Open Mobile Alliance (now OMA SpecWorks).

We had a total of 502 individual responses. You will find a link to the complete report at the end of this blog post, as well as pointers to download the raw survey data.

Here are the key trends that we identified this year:

Amazon and Azure get traction, Google slips behind

For the past few years, we’ve asked people what cloud platform they use or plan on using for building their IoT solution.

IoT Developer Survey 2018: IoT Cloud Platforms Adoption – Amazon vs. Microsoft vs. Google

Since 2016, Amazon AWS has always come up as the platform of choice for the respondents, followed by Microsoft Azure and Google Cloud Platform.

📎 The use of AWS for building IoT solutions increased by 21% since 2017. 

Looking at this year’s results, there is a clear upward trend in terms of adoption for Amazon AWS (51.8%, a 21% increase from last year) and Microsoft Azure (31.21%, a 17% increase from 2017). In the meantime, Google Cloud Platform is struggling to get adoption from IoT developers (18.8%, an 8% year-to-year decrease).

📎 Google Cloud Platform struggles, with an 8% decrease in market share for IoT deployments since 2017. 

Seeing AWS ahead of the pack is no surprise. It seems to be the public cloud platform of choice for developers, according to the recent Stack Overflow Developer Survey, and one of the most loved platforms for development in general. And looking at the same survey, it seems Google is not really doing great with their Cloud Platform (it is used by 8.0% of the respondents vs. 24.1% for AWS).

IoT Developer Survey 2018: IoT Cloud Platforms Adoption – Trends

It will be interesting to see how, and if, Google catches up in the IoT cloud race, and whether we will see more acquisitions similar to Xively’s in February to help beef up their IoT offering in 2018. Since Microsoft is planning to invest $5 billion in IoT over the next four years, the IoT cloud competition will definitely be interesting to follow…

IoT Data is finally getting attention

While IoT has been around for a while now, it looks like developers are starting to realize that beyond the “cool” factor of building connected devices, the real motivation and business opportunity for IoT is in collecting data and making sense out of it.

📎 Collecting and analyzing data becomes #2 concern for #IoT developers. 

This year, 18% of the respondents identified Data Collection & Analytics as one of their top concerns for developing IoT solutions. This is a 50% increase from last year, and makes this topic the #2 concern. Security remains #1, and Connectivity shares third place with Integration with Hardware.

IoT Developer Survey 2018: Key IoT Concerns

Unsurprisingly, industries such as Industrial Automation or Smart Cities tend to care about IoT data collection and analytics even more—23% of the respondents working in those industries consider data collection & analytics to be a key concern.

IoT Developer Survey 2018: Key IoT Concerns - Trends

On a side note, it is great to get confirmation of a trend we identified last year, with Interoperability clearly becoming less of a concern for IoT developers. It had been ranking #2 since we started doing the survey in 2015, and is now relegated to 5th place.

As someone working with IoT open source communities on a day-to-day basis, I can’t help but think about the crucial role open standards and open source IoT platforms have had in making IoT interoperability a reality.

Consolidation in IoT messaging protocols

📎 MQTT is used in 62% of IoT solutions and remains the IoT messaging protocol of choice. 

An area I particularly like to observe year-over-year is the evolution of IoT messaging protocols. For many years now, MQTT has established itself as a protocol of choice for IoT, and this year’s survey is just confirming this: MQTT is used by over 62% of our respondents, followed by HTTP (54.1%).

Six years after IBM and Eurotech open sourced their implementations of the MQTT protocol (see the Eclipse Paho project), and with the ever-increasing popularity of the Eclipse Mosquitto project (and many other open MQTT-based projects out there of course), this is once again a demonstration that open wins. With MQTT 5 around the corner and several of the identified “limitations” of the protocol gone, MQTT will possibly become even more clearly THE IoT messaging standard in the future.

IoT Developer Survey 2018: Consolidation in IoT Messaging Protocols

It would appear that the use of HTTP is declining (54.1%), perhaps to the benefit of the more lightweight and versatile HTTP/2 (24.9% vs. 16.8% last year). XMPP (4.3%) is one of the protocols that seems to be losing the protocol consolidation battle, with a continued decline since 2016.

📎 Adoption of AMQP increased by over 30% since 2017 as people scale their IoT deployments. 

More and more people are starting to scale their IoT deployments, which is likely one reason for the significant increase in the adoption of AMQP (18.2%, up from 13.9% last year), a core element of many IoT backends.
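
A note on how to read these year-over-year figures: the increases quoted in this post are relative changes, not differences in percentage points. The small Python sketch below (the function names are mine, for illustration) shows the arithmetic using the AMQP and HTTP/2 adoption numbers quoted above.

```python
def relative_change(previous: float, current: float) -> float:
    """Change of `current` relative to `previous`, expressed in percent."""
    return (current - previous) / previous * 100


def point_change(previous: float, current: float) -> float:
    """Plain difference between two shares, in percentage points."""
    return current - previous


# AMQP: 13.9% of respondents in 2017, 18.2% in 2018
amqp_relative = relative_change(13.9, 18.2)   # about 30.9 (percent)
amqp_points = point_change(13.9, 18.2)        # about 4.3 (percentage points)

# HTTP/2: 16.8% of respondents in 2017, 24.9% in 2018
http2_relative = relative_change(16.8, 24.9)  # about 48.2 (percent)
```

So "over 30%" for AMQP means that its share of respondents grew by roughly a third, which corresponds to a gain of about 4.3 percentage points.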

The use of proprietary vendor protocols and in-house protocols is steadily decreasing, confirming that the industry at large tends to favor open standards over closed solutions.

It will be interesting to watch how the adoption of DDS (4.9%) evolves over time. It already seems to be getting some traction in domains such as Automotive, where 10% of the respondents said they are using it.

IoT Developer Survey 2018: IoT Messaging Protocols – Trends

Focus on security increases

It is always interesting to watch how developers approach security in the context of IoT, and it has always been mentioned as the #1 concern for IoT developers since we started doing the survey in 2015.

However, it is no secret that security is hard, and there is unfortunately still only a limited set of security-related practices that are on the front burner of IoT developers. Communication-layer security (e.g. the use of TLS or DTLS) and data encryption remain the two most popular practices, used by 57.3% and 45.1% of the respondents, respectively.

IoT Developer Survey 2018: IoT Security Technologies

For the first time in the history of this survey, we explicitly asked respondents if they were using blockchain or distributed ledger technology (DLT) in their IoT solutions. I was frankly surprised to see that it would appear to be the case for 11% of the respondents. As the technology matures, and as some of the barriers making it sometimes impractical for constrained/embedded devices slowly disappear, I am expecting blockchain & DLT to be used more and more for securing IoT solutions (and probably in combination with data monetization use cases).

📎 Adoption of over-the-air updates to keep IoT applications up-to-date and secure increased by almost 50% since last year. 

To end on a positive note, it is pretty clear that developers are starting to bake security into their IoT products, as an increasing number of developers indicated they implement security techniques compared to 2017. Over-the-air updates appear to be used more and more (27.3%, a 47% increase from 2017). Open device management standards such as LWM2M, together with open source implementations such as Eclipse Wakaama and Eclipse Leshan, are certainly making it easier for developers to implement OTA in their solutions.

IoT Data is multifaceted and open source databases dominate the market

This year we added a few questions to the survey aimed at understanding better the kind of IoT data being collected, and how it is being stored.

It is interesting to see that across all industries, IoT data is equally multifaceted, and a wide variety of data is being collected by today’s IoT applications. 61.9% of the data collected is time series data (e.g. sensor data), but almost equally important are device information (60.4%) and log data (54.1%). This is not really surprising as collecting sensor data is only half of the IoT operational equation: one also needs to be able to track and manage their fleet of devices.

IoT Developer Survey 2018: Types of IoT Data

Keeping that in mind, it is interesting to look at the landscape of databases and data stores used for IoT applications. While time series data is the most common form of data that IoT applications collect, traditional relational databases (namely, MySQL, with a clear leading position at 44.6%) are still widely used. This likely reflects the importance of storing all kinds of device metadata or legacy enterprise data in addition to sensor data.

IoT Developer Survey 2018: IoT Databases

With regard to NoSQL and time series databases, MongoDB (29.8%) and InfluxDB (15.7%) seem to be the two platforms of choice for storing non-relational IoT data (e.g. time series).

📎 93% of databases used in IoT are open source. 

It is worth highlighting that an astounding majority (93%) of the databases used for IoT are open source, with Amazon DynamoDB (6.9%) being the only notable exception. With something as critical and sensitive as IoT data, it seems that solution developers tend to favor technology that is not only easy and free to access, but more importantly that allows them to really “own” their data.

Linux remains the undisputed IoT operating system

Once again, Linux (71.8%) remains the leading operating system across IoT devices, gateways, and cloud backends.

IoT Developer Survey 2018: Top IoT Operating Systems & Distros

Although Amazon’s acquisition of FreeRTOS occurred just a few months before the survey opened, it might partially explain the significant increase in its reported adoption. Going from 13% in 2016 to 20% this year, it becomes the leading embedded IoT operating system, followed by Arm Mbed (9%) and Contiki (7%).

📎 FreeRTOS becomes the leading embedded #IoT operating system, followed by Arm Mbed and Contiki OS. 

In terms of Linux distributions, and as Raspberry Pi stays a very popular platform for IoT prototyping, Raspbian (43.3%) remains the top Linux distribution followed by Ubuntu (40.2%).

IoT Developer Survey 2018: IoT Linux Distributions – Trends

You can find the complete report on Slideshare.

Should you want to play with the raw data yourself, we made it available as a Google Spreadsheet here – feel free to export it as whatever format suits you best.

Mike Milinkovich and I will be doing a webinar on Thursday, April 19, to go through the results and discuss our findings. Don’t forget to RSVP!

Thanks to everyone who took the time to fill out this survey, and thanks again to IEEE, OMA SpecWorks and the Agile-IoT project for their help with the promotion.

I am very interested in hearing your thoughts and feedback about this year’s findings in the comments of this post. And, of course, we are always open to suggestions on how to improve the survey in the future!
