Getting new data into the MELODA web site

Getting large amounts of data into MELODA is not as easy as I thought.

First I tried portals based on CKAN, because it is the most common platform for open data portals in smart cities. Such portals are easy to find because there is a list of CKAN instances here. Of course it is not a complete list, but it works.

Thanks to Felix Ontañon and his program (attached at the end of this post, or available on this link), you can retrieve the main data from a CKAN instance.

I first tried the Berlin and Santander open data portals, but the connection to Berlin's CKAN API does not work, and Santander's portal (I am not sure) does not seem to be a CKAN instance after all.
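A quick way to find out whether a portal answers like a CKAN instance before running the full program could look like the sketch below. It is written for Python 3 and only assumes the same package_list endpoint that the attached program uses. The function name looks_like_ckan and the fetch parameter are hypothetical, introduced here for illustration and to allow testing without a network connection.

```python
import json
import urllib.request


def looks_like_ckan(host, fetch=None, timeout=10):
    """Return True if `host` answers the CKAN action API with success=true.

    `fetch` is a hypothetical hook: a callable that takes a URL and returns
    the response body as text. By default it performs a real HTTP request.
    """
    url = "http://" + host + "/api/3/action/package_list"
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
    try:
        payload = json.loads(fetch(url))
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors,
        # ValueError covers replies that are not valid JSON (e.g. an HTML page).
        return False
    return isinstance(payload, dict) and payload.get("success") is True
```

Calling looks_like_ckan("demo.ckan.org") would perform a real request. A portal that returns an HTML page or refuses the connection simply yields False, which would match the two failure cases described above.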

However, Opendata Caceres works fine, and this file is the result. Because the program retrieves every format (rdf, csv, xls, etc.), I decided to keep only csv, which seems to be the most common.
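Keeping only the csv resources can also be done in code rather than by hand. The following is a minimal Python 3 sketch, assuming each resource is a dictionary with a "format" key, as in the "resources" list that the attached program reads. The helper name csv_resources is hypothetical.

```python
def csv_resources(resources):
    """Keep only the resources whose declared format is CSV.

    Portals are not consistent about case or padding ("CSV", "csv", " Csv "),
    so the comparison is case-insensitive and ignores surrounding spaces.
    A missing or empty format is treated as "not csv".
    """
    return [
        r for r in resources
        if (r.get("format") or "").strip().lower() == "csv"
    ]
```

This relies on the format that the portal declares, not on the actual file content, so a mislabelled resource would still slip through or be dropped.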

Then I created a LibreOffice spreadsheet to transform the data from the csv file into a set of SQL statements, in order to load it into the main database.
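The same transformation could be scripted instead of using a spreadsheet. The sketch below is not what I actually used. It is a Python 3 illustration that assumes the csv has a header line like the one the attached program prints, and it uses a hypothetical table name, pending_datasets, with columns named after the csv header. The real MELODA schema is not shown in this post.

```python
import csv
import io


def sql_quote(value):
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rows_to_inserts(csv_text, table="pending_datasets"):
    """Turn csv text (header line first) into a list of INSERT statements.

    `table` is a hypothetical name; the column names come from the header.
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    statements = []
    for row in reader:
        if len(row) != len(header):
            continue  # skip blank or malformed lines
        values = ", ".join(sql_quote(cell.strip()) for cell in row)
        statements.append(
            "INSERT INTO %s (%s) VALUES (%s);"
            % (table, ", ".join(header), values)
        )
    return statements
```

Generating SQL text like this is fine for a one-off import file. When inserting directly from a program, parameterized queries from the database driver are usually the safer choice.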

You can see the pending datasets at this link: Meloda -> Smart cities -> Assess a registered dataset

 

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# Authors : J. Félix Ontañón <felixonta@gmail.com>
#
###

# Note: this script targets Python 2 (urllib2 module, print statement).
import urllib2
import json

# Host name of the CKAN portal to query
#CKAN_URI = "demo.ckan.org"
CKAN_URI = "datosabiertos.malaga.eu"

# CKAN action API endpoints and the base URL of a dataset page
PACKAGE_LIST_REQ = "http://"+CKAN_URI+"/api/3/action/package_list"
PACKAGE_SHOW_REQ_BASE = "http://"+CKAN_URI+"/api/3/action/package_show?id="
PACKAGE_ENTRY_URL_BASE = "http://"+CKAN_URI+"/dataset/"

# Fetch the list of dataset names published on the portal
dataset_catalogue = json.load(urllib2.urlopen(PACKAGE_LIST_REQ))

# Print a csv header, then one line per resource of each dataset
print "dataset_name, dataset_entry, resource_format, resource_url"
for dataset_name in dataset_catalogue["result"]:
    dataset = json.load(urllib2.urlopen(PACKAGE_SHOW_REQ_BASE + dataset_name))

    for resource in dataset["result"]["resources"]:
        print dataset_name, ',', PACKAGE_ENTRY_URL_BASE+dataset_name, ',', resource["format"], ',', resource["url"]
