[erlang-questions] Riak Search Production Configuration

Shawn Debnath shawn@REDACTED
Fri Feb 6 17:29:16 CET 2015

Whoops, sorry, this was meant for riak-users.

On 2/6/15, 8:22 AM, "Shawn Debnath" <shawn@REDACTED> wrote:

I am assuming these are part of a map or some collection for each key. Riak flattens the namespace, so if you have a key KEY1 storing a map called Cyc with fields name, environment, and build, the flattened fields will look similar to Cyc.name, Cyc.environment, Cyc.build. Your schema definition says the field names will be exactly "name", which is almost never the case. You may want to prefix the name with a '*', something like:

<field name="*name" type="_yz_str" indexed="true" stored="true" multiValued="true" required="true"/>

This is only for simple non-CRDT types. Depending on how you are storing your data, you may need to build an extractor for it. For defined CRDT types, Riak adds its own internal suffixes (_set, _counter, _flag). For CRDTs, I would recommend using the default schema and grep'ing the data files to see what the field names look like (unless someone can recommend a better way); once you have identified the names, build a custom schema for yourself. Otherwise, use the default schema again with your custom field definitions above, but add an asterisk as both prefix and suffix ("*name*") to see what Riak thinks the full name for that field is.
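To illustrate why a field declared strictly as "name" misses a flattened field such as Cyc.name, here is a minimal Java sketch. It is only a model of the idea: the class FieldFlattener and its methods are hypothetical names, not part of Riak, Solr, or the Riak Java client, and the matching rule is a simplified stand-in for leading/trailing wildcard matching rather than Solr's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not a Riak or Solr API).
final class FieldFlattener {

    // Flatten nested maps into dotted field names, e.g. {Cyc={name=..}} -> "Cyc.name".
    static Map<String, Object> flatten(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", doc, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                flattenInto(name, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(name, e.getValue());
            }
        }
    }

    // Simplified model: a pattern may carry one leading or one trailing '*';
    // anything else must match the field name exactly.
    static boolean matches(String pattern, String field) {
        if (pattern.startsWith("*")) {
            return field.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return field.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(field);
    }

    public static void main(String[] args) {
        Map<String, Object> cyc = new LinkedHashMap<>();
        cyc.put("name", "demo");
        cyc.put("environment", "prod");
        cyc.put("build", "42");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("Cyc", cyc);

        System.out.println(flatten(doc).keySet());        // [Cyc.name, Cyc.environment, Cyc.build]
        System.out.println(matches("name", "Cyc.name"));  // false
        System.out.println(matches("*name", "Cyc.name")); // true
    }
}
```

In this model the exact name "name" never matches the flattened field, while the wildcard form does, which is the mismatch described above.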

Adding erlang-questions back on; hopefully someone can recommend something for AAE and/or correct anything I have said above.

On 2/6/15, 7:41 AM, "Nirav Shah" <niravishah@REDACTED> wrote:

Hi Shawn,
Here is the schema I am using. Any advice you could share would be great. Also, are there any specific AAE settings that are recommended for production systems?


From: Shawn Debnath <shawn@REDACTED>
To: Nirav Shah <niravishah@REDACTED>; Luc Perkins <lperkins@REDACTED>
Cc: riak-users@REDACTED
Sent: Thursday, February 5, 2015 2:14 PM
Subject: Re: Riak Search Production Configuration

The only thing I can think of is that your flattened full name is not being matched. Also, looking at Basho's default schema, the type should be "_yz_str" and not "yz_str", and that only works if you actually have it defined as:

    <!-- YZ String: Used for non-analyzed fields -->
    <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true" />

in your schema file. The data types for Solr are all class names, as in solr.*. I would double-check your mappings for name and type, and whether the fields are being indexed and stored. If stored is false, you won't get the data back, but you can still query on it.
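One way to check whether a particular field was indexed under the name you expect is to query it directly through Riak's Solr-compatible HTTP endpoint (/search/query/<index>). The Java sketch below only builds such a URL. The host and port (localhost:8098), the class name SearchQueryUrl, and the sample value 12345 are assumptions for illustration; the index name test_idx and the field orderId come from this thread. It needs Java 10 or later for the URLEncoder overload that takes a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of the Riak Java client).
final class SearchQueryUrl {

    // Build a query URL for the given index; the Solr query string is URL-encoded.
    static String build(String baseUrl, String index, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/search/query/" + index + "?wt=json&q=" + q;
    }

    public static void main(String[] args) {
        // Assumed host/port; adjust to your node's HTTP listener.
        System.out.println(build("http://localhost:8098", "test_idx", "orderId:12345"));
        // http://localhost:8098/search/query/test_idx?wt=json&q=orderId%3A12345
    }
}
```

If the JSON response reports numFound as 0 for a value you know is stored in Riak, the field was most likely not indexed under that name.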

On 2/5/15, 2:01 PM, "Nirav Shah" <niravishah@REDACTED> wrote:

Hi Shawn,
I am using plain old types. Some of the fields we index end with Id, like execId, verId, and orderId, and are defined as long. Others hold random strings and are defined as yz_str. Do you think these fields can cause issues?

Surprisingly, some of the data is present in Solr and some is not. I could understand if no data at all were found for a field, but the fact is that I am seeing some of it.

Also, are there any suggestions for AAE configuration on a production cluster?


From: Shawn Debnath <shawn@REDACTED>
To: Nirav Shah <niravishah@REDACTED>; Luc Perkins <lperkins@REDACTED>
Cc: riak-users@REDACTED
Sent: Thursday, February 5, 2015 1:49 PM
Subject: Re: Riak Search Production Configuration

Nirav, are you using CRDTs or plain old types with Riak? The definition of field names makes a big difference in what gets indexed: Solr will not complain if it can't find matching fields, it just won't index them. You can take a peek at the data dir on the Riak instance to see what, if anything, is being indexed.

On 2/5/15, 12:18 PM, "Nirav Shah" <niravishah@REDACTED> wrote:

Hi Luc,
Thanks for the response.

Here are the steps I performed at application startup. I am using the default bucket type in my application.

1. Create Custom Schema

        // inputStream comes from a schema file that I read
        String schema = Source.fromInputStream(inputStream, "UTF-8").mkString();

        YokozunaSchema yokozunaSchema = new YokozunaSchema("test", schema.toString());
        StoreSchema storeSchema = new StoreSchema.Builder(yokozunaSchema).build();

2. Post creation of Schema

    I created an index for my object based on the required fields mentioned in the Riak Search docs, plus the other fields mapped to my object. Only a few fields from my object are set to be indexed and stored, as I only want to search on them.

        YokozunaIndex yokozunaIndex = new YokozunaIndex("test_idx", "test");
        StoreIndex storeIndex = new StoreIndex.Builder(yokozunaIndex).build();

3. Set Bucket Properties

    I then associate my bucket with the index as part of the same application startup. Before I attach the index to my bucket, I verify whether it has already been attached, using a fetch:
        StoreBucketProperties sbp =
                new StoreBucketProperties.Builder(namespace)


From: Luc Perkins <lperkins@REDACTED>
To: Nirav Shah <niravishah@REDACTED>
Cc: Shawn Debnath <shawn@REDACTED>; riak-users@REDACTED
Sent: Thursday, February 5, 2015 11:04 AM
Subject: Re: Riak Search Production Configuration


Could you possibly detail the steps you used to upload the schema, adjust the bucket properties, etc.? That would help us identify the issue.


On Thu, Feb 5, 2015 at 9:42 AM, Nirav Shah <niravishah@REDACTED> wrote:

Hi Shawn,
Thanks for the response. To give you some background:

1. We are using a custom schema with the default bucket type.
2. I have search set to on :)
3. I have associated BucketProperties/Index with my buckets.
4. I am getting data back, but for some reason not the entire set. When I query Riak I see the data; however, when I query the Solr indexes, that data is missing. At this point, I don't know what can cause this and am looking for people who might have faced similar issues.

My config is the default apart from setting search=on and changing the JVM settings for Solr in riak.conf.

I would appreciate any pointers and best practices on Riak Search and AAE settings that folks are running in production clusters and that I should add.


From: Shawn Debnath <shawn@REDACTED>
To: Nirav Shah <niravishah@REDACTED>; riak-users@REDACTED
Sent: Thursday, February 5, 2015 9:13 AM
Subject: Re: Riak Search Production Configuration

Hi Nirav,

About your last point: just yesterday I started playing with Search 2.0 (Solr) and Riak. Basho did a good job of integrating the Solr platform, but the docs are sometimes misleading. One thing I found out was that, using the default schema provided by Basho, if you are using CRDTs, your fields are suffixed with _register, _counter, _set. This link (http://docs.basho.com/riak/latest/dev/search/search-data-types/) has a good set of examples, but the best approach is to experiment. I ended up diving into the Solr data dir and grep'ing for parts of my field names to figure out what they actually were. When running queries, Solr/Riak will not tell you that field names are incorrect; it just doesn't have any data for those fields, so it returns no search results.

Good luck.


PS. Be sure to have search=on in riak.conf :)

On 2/5/15, 7:34 AM, "Nirav Shah" <niravishah@REDACTED> wrote:

Hi All,
Just wanted to check what kind of configuration settings everyone uses in a production clustered environment for Riak Search/AAE, and whether someone can share some experience with it. We currently have 2g of memory allocated to Solr and are otherwise just using the default parameters from riak.conf.

What we have seen so far is that there is data in Riak but, somehow, Solr/Riak Search does not return it. I am trying to find out what can cause this and whether I am missing some configuration settings.

Any response would be appreciated.

