Category Archives: Eclipse

How many lines of open source code are hosted at the Eclipse Foundation?

Spoiler alert: 162 million!

That’s right, as of August 1st, there are 330 active open-source projects hosted at the Eclipse Foundation and if you look across the 1120 Git repositories that this represents, you will find over 162 million physical source lines of code. But beyond this number, let’s look at how it was obtained, and what it really means.

I’ve blogged several times about the importance of using metrics to monitor the health (and hopefully, growth!) of an open source project/community, and lines of code are just one. You should always have other metrics on your radar like the number of contributors, diversity, etc.

There are many ways, and many tools available out there, to count source lines of code. Openhub (previously known as ohloh) used to be a really good tool, but it doesn’t seem to be actively maintained. For a few years now, I’ve been relying on a home-made script to analyze Eclipse IoT projects, and it’s only recently that I realized I should probably run it against the entire eclipse.org codebase!

In this blog post, I will briefly talk about how the aforementioned script works, why you should make sure to take these metrics with a pinch of salt and finally, go through some noteworthy findings.

Line counting process

The script used to count the number of lines of code is available on Github. It takes a list of Eclipse projects’ identifiers (e.g ‘iot.paho’) and a given time range as an input and outputs a consolidated CSV file.

The main script (main.js) uses the Eclipse Project Management Infrastructure (PMI) API to retrieve the list of Git repositories for the requested projects and then proceeds to clone the repos and run the cloc command-line tool against each repo. The script also allows computing the statistics for a given time period, in which case it looks at the state of each repository at the beginning of each month for that period.

Once the main script has completed (and it can obviously take quite some time), thecsv-concat.js script can be used to consolidate all the produced metric files into one single CSV file that will contain the detailed breakout of lines of code per project and per programming language, the affiliation of the project to a particular top-level projects, the number of blanks or comment lines, etc.. It is pretty easy to then feed this CSV into Excel or Google Spreadsheets, and use it as the source for building pivot tables for specific breakouts.

Caveats

Just like virtually any KPI, you want to take the number of lines of code in your project with a grain of salt. Here are a few things to keep in mind:

All lines of code are not created equal

There is an incredible diversity of projects at Eclipse, and while a majority is using Java as their main programming language, there’s also a lot of C, C++, Python, Javascript, … 10M lines of Java code probably don’t carry the same value (i.e. how much effort has been needed to produce them) as 10M lines of C code.

Trends are more important than snapshots

It is nice to know that as of today there are 162 million lines of code in the Eclipse repositories, but it is, in my opinion, more important to look at trends over time. Is a particular programming language becoming more popular? Are all the top-level projects equally active?

I didn’t have a chance to run the scripts for a longer time period yet, but I will make sure to share the results when I get a chance!

Generated code, should it count?

There is a fair amount of generated code in some projects (in the Modeling top-level project in particular, of course), which certainly accounts for a few million lines of code. However, generated code often is customized, so I think it doesn’t necessarily skew the numbers as much as one would think.

Development does not always happen in a single branch

My script just looks at the code stored in the main (HEAD) branch of the Git repository. Some projects may have more than one development stream and may e.g. have a “develop” branch that is ahead of the main stable branch. Therefore, there is very likely more code in our repositories than what this quick analysis shows.

Additional findings

As my script outputs pretty detailed statistics, it is interesting to have a quick look at e.g. how the different top-level projects and programming languages compare.

Top 3 top-level projects: Runtime, Technology & Modeling

Top-level projectPhysical SLOC
rt54,961,728
technology28,887,621
modeling27,140,344
tools14,214,182
webtools9,651,900
eclipse6,401,518
ee4j5,809,126
ecd3,114,768
polarsys3,105,229
iot2,930,217
birt2,235,624
science1,670,051
datatools939,424
mylyn767,652
soa752,774

Top programming language: Java

Programming languagePhysical SLOC
Java72,349,870
HTML61,119,106
XML7,543,689
ANTLR Grammar3,161,339
JSON2,313,556
JavaScript2,251,418
C++2,245,759
C1,446,013
XMI1,355,914
C/C++ Header1,019,368
TTCN923,098
Maven884,271
CSS805,073
Assembly717,771
XSD688,764
PHP459,237
Python316,553
Markdown304,421
XSLT256,857
Scala229,560
Bourne Shell214,142
Go184,306
SWIG152,062
JSP142,190
Gencat NLS125,251
Ant113,133
TypeScript108,217
AsciiDoc105,552
Windows Module Definition64,843
TITAN Project File Information64,014
Groovy55,261
Sass53,915
XQuery51,432
XHTML51,166
DTD51,052
make48,021
Perl43,643
DITA42,526
yacc39,876
TeX36,400
m434,438
AspectJ33,717
Ruby28,355
Scheme27,484
YAML26,348
CMake25,182
Lua23,646
LESS18,712
SQL16,070
Cucumber15,454
IDL12,564
INI12,171
Bourne Again Shell11,978
Pascal11,915
lex11,795
DOS Batch11,675
Windows Resource File10,278
Blade8,295
C#7,983
Tcl/Tk7,611
Stylus7,477
Fortran 907,211
ERB7,048
Vuejs Component6,281
Visualforce Component5,047
MSBuild script4,538
Freemarker Template4,077
Dockerfile3,696
Velocity Template Language3,649
awk3,068
Rust2,903
Qt2,772
CUDA2,533
Puppet2,084
diff1,880
Haml1,819
Oracle PL/SQL1,778
ProGuard1,739
Objective C1,469
ActionScript1,459
Visual Basic1,365
Mathematica1,247
RobotFramework1,074
Korn Shell1,023
D1,007
Smalltalk911
R887
TOML826
Ada668
Lisp618
Objective C++589
Fortran 77588
Arduino Sketch480
MATLAB476
sed461
Protocol Buffers454
WiX source446
JavaServer Faces440
PowerShell284
Qt Project176
Windows Message File139
Expect120
NAnt script110
Smarty109
HCL78
CoffeeScript78
Skylark74
Forth69
Qt Linguist61
WiX include52
XAML49
QML48
Handlebars46
Clojure38
Prolog37
Razor32
PO File29
Haskell27
JSX24
ASP.NET21
HLSL15
F#11
Swift10
GLSL8
Kotlin7
C Shell7
Mustache1

If you end up using my script and have any question, please let me know in the comments or directly on Github!

Eclipse Kura on Steroids with UPM and Eclipse OpenJ9

So it’s been a while since the last time I blogged about a cool IoT demo… Sorry about that! On the bright side, this post covers a couple projects that are really, really, neat so hopefully, this will help you forgive me for the wait! 🙃

UP Squared Grove IoT Development Kit

At the end of last year, a new high-performance IoT developer kit was announced. Built on top of the UP Squared board, it features an Intel Apollo lake x86-64 processor, plenty of GPIOs, two Ethernet interfaces, USB 3.0 ports, an Altera MAX 10 FPGA, and more. You can get the kit from Seeed Studio for USD 249.

The UP Squared Grove IoT Development Kit

Of course, it wouldn’t be a Grove kit without the Grove shield that can be attached on top of the board to simplify the connection to a wide variety of sensors and actuators (and there’s actually a few of them in the kit).

Running Eclipse Kura on the UP Squared board

Enough with the hardware! With all this horsepower, it is of course very tempting to run Eclipse Kura on this. The UP Squared being based on an Intel x86-64 processor, it is incredibly easy to start by replacing the default OpenJDK JVM by Eclipse OpenJ9. Here’s your two-step tutorial to get Eclipse OpenJ9 and Eclipse Kura running on your board:

In case you are wondering how much faster OpenJ9 is compared to OpenJDK or Oracle’s JVMs, here’s a quick comparison of the startup time of Eclipse Kura on the UP Squared:

Eclipse Kura start-up time on Intel UP Squared Grove kit

UPM

UPM logo

UPM is a set of libraries for interacting with sensors and actuators in a cross-platform, cross-OS, language-agnostic, way.

There are over 400 sensors & actuators supported in UPM. Virtually all the “DIY” sensors you can get from SeeedStudio, Adafruit, etc. are supported, but beyond that, UPM also provides support for a wide variety of industrial sensors.

Thanks to Eclipse Kura Wires and the underlying concept of “Drivers” and “Assets”, Kura provides a way to access physical assets in a generic way.

In the next section, we will see a proof-of-concept of UPM libraries being wrapped as Kura “drivers” in order to make it really simple to interact with the 400+ kind of sensors/actuators supported by UPM.

Integrating UPM in Kura Wires

UPM drivers are small native C/C++ libraries that expose bindings in several programming languages, including Java, and therefore calling UPM drivers from Kura is pretty simple.

The only thing you need is a few JARs for UPM itself (and for MRAA, the framework that is supporting it), the JARs for the driver(s) of the particular sensor(s) you want to use, and the associated native libraries (.so files) for the above. As you may know, OSGi makes it pretty easy to package native libraries that may go alongside Java/JNI libraries, so there is really no difficulty there.

In order for the UPM drivers to be accessible from Kura Wires, and to expose “channels” corresponding to the methods available on them, they need to be bundled as Kura Drivers. This is also a pretty straightforward task, and while I created the driver for only a few sensor types out of the 400+ supported in UPM, I am pretty confident that Kura drivers can be automatically generated from UPM drivers.

You can find the final result on my Github: https://github.com/kartben/org.intellabs.upm.

See it in action!

So what do we end up getting, and why should you care? Just check out the video below and see for yourself!

Key Trends from the IoT Developer Survey 2018

Executive Summary

The IoT Developer Survey 2018 collected feedback from 502 individuals between January and March 2018.

The key findings in this year’s edition of the survey include the following:

  • Amazon AWS and Microsoft Azure are the top 2 cloud services for IoT. Google Cloud Platform is failing to get traction.
  • MQTT remains the standard of choice for IoT messaging, while AMQP is becoming more and more popular as companies scale their IoT deployments and backend systems.
  • 93% of the databases and data stores used for IoT are open source software. Data collected and used in IoT applications is incredibly diverse, from time series sensor data to device information to logs.

Introduction

For the past four years, the IoT Developer Survey has been a great way to look at the IoT landscape, from understanding the key challenges for people building IoT solutions, to identifying relevant open source technology or standards.

Just like in previous years (see results from 2017, 2016 and 2015 survey), the Eclipse IoT Working Group has collaborated with a number of organizations to promote the survey to different IoT developer communities: Agile-IoT H2020 Project, IEEE, and the Open Mobile Alliance (now OMA SpecWorks).

We had a total of 502 individual responses. You will find a link to the complete report at the end of this blog post, as well as pointers to download the raw survey data.

Here are the key trends that we identified this year:

Amazon and Azure get traction, Google slips behind

For the past few years, we’ve asked people what cloud platform they use or plan on using for building their IoT solution.

IoT Developer Survey 2018: IoT Cloud Platforms Adoption – Amazon vs. Microsoft vs. Google

Since 2016, Amazon AWS has always come up as the platform of choice for the respondents, followed by Microsoft Azure and Google Cloud Platform.

📎 The use of AWS for building IoT solutions increased by 21% since 2017. 

Looking at this year’s results, there is a clear upward trend in terms of adoption for Amazon AWS (51.8%, a 21% increase from last year) and Microsoft Azure (31.21%, a 17% increase from 2017). In the meantime, Google Cloud Platform is struggling to get adoption from IoT developers (18.8%, an 8% year-to-year decrease).

📎 Google Cloud Platform struggles, with an 8% decrease in market share for IoT deployments since 2017. 

Seeing AWS ahead of the pack is no surprise. It seems to be the public cloud platform of choice for developers, according to the recent Stack Overflow Developer Survey, and one of the most loved platforms for development in general. And looking at the same survey, it seems Google is not really doing great with their Cloud Platform (it is used by 8.0% of the respondents vs. 24.1% for AWS).

IoT Developer Survey 2018: IoT Cloud Platforms Adoption – Trends

It will be interesting to see how, and if, Google catches up in the IoT cloud race, and whether we will see more acquisitions similar to Xively’s in February to help beef up their IoT offering in 2018. Since Microsoft is planning to invest $5 billion in IoT over the next four years, the IoT cloud competition will definitely be interesting to follow…

IoT Data is finally getting attention

While IoT has been around for a while now, it looks like developers are starting to realize that beyond the “cool” factor of building connected devices, the real motivation and business opportunity for IoT is in collecting data and making sense out of it.

📎 Collecting and analyzing data becomes #2 concern for #IoT developers. 

This year, 18% of the respondents identified Data Collection & Analytics as one of their top concerns for developing IoT solutions. This is a 50% increase from last year, and puts this topic as #2 concern—Security remains #1, and Connectivity is sharing the third place with Integration with Hardware.

IoT Developer Survey 2018: Key IoT Concerns

Unsurprisingly, industries such as Industrial Automation or Smart Cities tend to care about IoT data collection and analytics even more—23% of the respondents working in those industries consider data collection & analytics to be a key concern.

IoT Developer Survey 2018: Key IoT Concerns - Trends

On a side note, it is great to get the confirmation of a trend we identified last year, with Interoperability clearing becoming less of a concern for IoT developers. It’s been ranking #2 since we started doing the survey in 2015, and is now relegated to the 5th place.

As someone working with IoT open source communities on a day-to-day basis, I can’t help but think about the crucial role open standards and open source IoT platforms have had in making IoT interoperability a reality.

Consolidation in IoT messaging protocols

📎 MQTT is used in 62% of IoT solutions and remains the IoT messaging protocol of choice. 

An area I particularly like to observe year-over-year is the evolution of IoT messaging protocols. For many years now, MQTT has established itself as a protocol of choice for IoT, and this year’s survey is just confirming this: MQTT is used by over 62% of our respondents, followed by HTTP (54.1%).

Six years after IBM and Eurotech open sourced their implementations of the MQTT protocol (see the Eclipse Paho project), and with the ever-increasing popularity of the Eclipse Mosquitto project (and many other open MQTT-based projects out there of course), this is once again a demonstration that open wins. With MQTT 5 around the corner and several of the identified “limitations” of the protocol gone, MQTT will possibly become even more clearly THE IoT messaging standard in the future.

IoT Developer Survey 2018: Consolidation in IoT Messaging Protocols

It would appear that the use of HTTP is declining (54.1%), perhaps to the benefit of the more lightweight and versatile HTTP/2 (24.9% vs. 16.8% last year). XMPP (4.3%) is one of the protocols that seems to be losing the protocol consolidation battle, with a continued decline since 2016.

📎 Adoption of AMQP increased by over 30% since 2017 as people scale their IoT deployments. 

Since more and more people start scaling their IoT deployments, it is likely a reason for the significant increase in AMQP’s adoption (18.2%, from 13.9% last year), which is a core element of many IoT backends.

The use of proprietary vendor protocols and in-house protocols is steadily decreasing, confirming that the industry at large tends to favor open standards over closed solutions.

It will be interesting to watch how the adoption of DDS (4.9%) evolves over time. It already seems to be getting some traction in domains such as Automotive, where 10% of the respondents said they are using it.

IoT Developer Survey 2018: IoT Messaging Protocols – Trends

Focus on security increases

It is always interesting to watch how developers approach security in the context of IoT, and it has always been mentioned as the #1 concern for IoT developers since we started doing the survey in 2015.

However, it is no secret that security is hard, and there is unfortunately still only a limited set of security-related practices that are on the front burner of IoT developers. Communication-layer security (e.g the use of TLS or DTLS) and data encryption remain the two most popular practices, used by respectively 57.3% and 45.1% of the respondents.

IoT Developer Survey 2018: IoT Security Technologies

For the first time in the history of this survey, we explicitly asked respondents if they were using blockchain or distributed ledger technology (DLT) in their IoT solutions. I was frankly surprised to see that it would appear to be the case for 11% of the respondents. As the technology matures, and as some of the barriers making it sometimes impractical for constrained/embedded devices slowly disappear, I am expecting blockchain & DLT to be used more and more for securing IoT solutions (and probably in combination with data monetization use cases).

📎 Adoption of over-the-air updates to keep IoT applications up-to-date and secure increased by almost 50% since last year. 

To end on a positive note, it is pretty clear that developers are starting to bake security into their IoT products, as an increasing number of developers indicated they implement security techniques compared to 2017. Over-the-air updates appear to be used more and more (27.3%, a 47% increase from 2017). Open device management standards such as LWM2M, together with open source implementations such as Eclipse Wakaama and Eclipse Leshan, are certainly making it easier for developers to implement OTA in their solutions.

IoT Data is multifaceted and open source databases dominate the market

This year we added a few questions to the survey aimed at understanding better the kind of IoT data being collected, and how it is being stored.

It is interesting to see that across all industries, IoT data is equally multifaceted, and a wide variety of data is being collected by today’s IoT applications. 61.9% of the data collected is time series data (e.g sensor data), but almost equally important are device information (60.4%) and log data (54.1%). This is not really surprising as collecting sensor data is only half of the IoT operational equation: one also needs to be able to track and manage their fleet of devices.

IoT Developer Survey 2018: Types of IoT Data

Keeping that in mind, it is interesting to look at the landscape of databases and data stores used for IoT applications. While time series data is the most common form of data that IoT applications collect, traditional relational databases (namely, MySQL, with a clear leading position at 44.6%) are still widely used. It is likely reflecting the importance of storing all kinds of device metadata or legacy enterprise data in addition to sensor data.

IoT Developer Survey 2018: IoT Databases

With regards to NoSQL and time series databases, MongoDB (29.8%) and InfluxDB (15.7%) seem to be the two platforms of choice for storing non-relational IoT data (e.g time series).

📎 93% of databases used in IoT are open source. 

It is worth highlighting that an astounding majority (93%) of the databases used for IoT are open source, with Amazon DynamoDB (6.9%) being the only notable exception. With something as critical and sensitive as IoT data, it seems that solution developers tend to favor technology that is not only easy and free to access, but more importantly that allows them to really “own” their data.

Linux remains the undisputed IoT operating system

Once again, Linux (71.8%) remains the leading operating system across IoT devices, gateways, and cloud backends.

IoT Developer Survey 2018: Top IoT Operating Systems & Distros

Although Amazon’s acquisition of FreeRTOS occurred just a few months before the survey opened, it might partially explain the significant increase in its reported adoption. Going from 13% in 2016 to 20% this year, it becomes the leading embedded IoT operating system, followed by Arm Mbed (9%) and Contiki (7%).

📎 FreeRTOS becomes the leading embedded #IoT operating system, followed by Arm Mbed and Contiki OS. 

In terms of Linux distributions, and as Raspberry Pi stays a very popular platform for IoT prototyping, Raspbian (43.3%) remains the top Linux distribution followed by Ubuntu (40.2%).

IoT Developer Survey 2018: IoT Linux Distributions – Trends


You can find the complete report on Slideshare.

Should you want to play with the raw data yourself, we made it available as a Google Spreadsheet here – feel free to export it as whatever format suits you best.

Mike Milinkovich and I will be doing a webinar on Thursday, April 19, to go through the results and discuss our findings. Don’t forget to RSVP!

Thanks to everyone who took the time to fill out this survey, and thanks again to IEEE, OMA SpecWorks and the Agile-IoT project for their help with the promotion.

I am very interested in hearing your thoughts and feedback about this year’s findings in the comments of this post. And, of course, we are always open to suggestions on how to improve the survey in the future!